Hey guys,
I recently realized that, since virtually the entire Vaadin UI is rendered by JavaScript in the browser, I need a workaround for search engine indexing.
Apparently Google's crawler is now able to crawl AJAX applications, although certain URI conditions have to be met first (see the Google FAQ, “Google Crawler - AJAX”). Google seems to be the only search engine that supports this for now, but given how quickly things move on the internet, this should spread to other engines soon enough.
Summary:
Basically what I get from the Google FAQ is that:
a) First we have to convert our “www.example.com/ajax.html#example” to “www.example.com/ajax.html#!example”. This seems simple enough once you set up URI fragment handling.
b) Second, and this part really confuses me, we have to provide an HTML snapshot of our page.
Whenever the Google bot sees a “#!” URI fragment, it converts it to “?_escaped_fragment_=”. Whenever this address is requested, our application should return an HTML page with flat content that can be indexed by the bot.
c) Finally, our server has to make sure that a request URL of the form “www.example.com/ajax.html?_escaped_fragment_=key=value” is mapped back to its original form: “www.example.com/ajax.html#!key=value”.
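To make the mapping in (a) and (c) concrete for myself, here is a minimal plain-Java sketch of the two directions. The class name CrawlUrls and both method names are made up for illustration and are not part of Vaadin or of anything Google provides. The set of characters escaped in the fragment is a simplified assumption, so check it against the Google FAQ before relying on it.

```java
import java.io.UnsupportedEncodingException;
import java.net.URLDecoder;

// Hypothetical helper, not part of Vaadin: converts between the "#!" form
// of a URL and the "?_escaped_fragment_=" form requested by the crawler.
class CrawlUrls {
    static final String PARAM = "_escaped_fragment_";

    // Percent-escape a few characters that would break a query string.
    // Assumption: this short list is a simplification of the real rules.
    static String escapeFragment(String fragment) {
        StringBuilder sb = new StringBuilder();
        for (char c : fragment.toCharArray()) {
            if (c == '%' || c == '#' || c == '&' || c == '+' || c == ' ') {
                sb.append('%').append(String.format("%02X", (int) c));
            } else {
                sb.append(c);
            }
        }
        return sb.toString();
    }

    // "page.html#!key=value" -> "page.html?_escaped_fragment_=key=value"
    static String toCrawlerUrl(String prettyUrl) {
        int i = prettyUrl.indexOf("#!");
        if (i < 0) {
            return prettyUrl; // no "#!" fragment, nothing to convert
        }
        String base = prettyUrl.substring(0, i);
        String fragment = prettyUrl.substring(i + 2);
        String separator = base.contains("?") ? "&" : "?";
        return base + separator + PARAM + "=" + escapeFragment(fragment);
    }

    // "page.html?_escaped_fragment_=key=value" -> "page.html#!key=value"
    static String toPrettyUrl(String crawlerUrl) {
        int q = crawlerUrl.indexOf('?');
        if (q < 0) {
            return crawlerUrl; // no query string at all
        }
        String base = crawlerUrl.substring(0, q);
        StringBuilder otherParams = new StringBuilder();
        String fragment = null;
        for (String pair : crawlerUrl.substring(q + 1).split("&")) {
            if (pair.startsWith(PARAM + "=")) {
                fragment = decode(pair.substring(PARAM.length() + 1));
            } else if (!pair.isEmpty()) {
                // Keep every other query parameter untouched
                otherParams.append(otherParams.length() == 0 ? "?" : "&").append(pair);
            }
        }
        if (fragment == null) {
            return crawlerUrl; // the special parameter was not present
        }
        return base + otherParams + "#!" + fragment;
    }

    private static String decode(String value) {
        try {
            return URLDecoder.decode(value, "UTF-8");
        } catch (UnsupportedEncodingException e) {
            throw new IllegalStateException(e); // UTF-8 is always available
        }
    }
}
```

With this sketch, CrawlUrls.toCrawlerUrl("www.example.com/ajax.html#!key=value") gives "www.example.com/ajax.html?_escaped_fragment_=key=value", and CrawlUrls.toPrettyUrl(...) on that result gives back the original "#!" address.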
=============================================================
It seems there is already a book example on how to implement this, but I have a hard time following it.
// Set the URI Fragment when menu selection changes
menu.addListener(new Property.ValueChangeListener() {
    public void valueChange(ValueChangeEvent event) {
        String itemid = (String) event.getProperty().getValue();

        // Set the fragment with the exclamation mark, which is
        // understood by the indexing engine
        urifu.setFragment("!" + itemid);
    }
});
This part I understand: the menu has a listener that fires whenever someone selects an item. It sets the URI fragment to the item ID prefixed with “!”, so the fragment in the URL always begins with “#!”.
// When the URI fragment is given, use it to set menu selection
urifu.addListener(new FragmentChangedListener() {
    public void fragmentChanged(FragmentChangedEvent source) {
        String fragment =
            source.getUriFragmentUtility().getFragment();
        if (fragment != null) {
            // Skip the exclamation mark
            if (fragment.startsWith("!"))
                fragment = fragment.substring(1);

            // Set the menu selection
            menu.select(fragment);

            // Display some content related to the item
            main.addComponent(new Label(getContent(fragment)));
        }
    }
});
This is a URI fragment listener: when the fragment changes, it selects the matching menu item. This is still just the regular Vaadin URI fragment handling.
// Store possible parameters here
main.addParameterHandler(new ParameterHandler() {
    public void handleParameters(Map<String, String[]> parameters) {
        // If the special escape parameter is included, store it
        if (parameters.containsKey("_escaped_fragment_"))
            fragment = parameters.get("_escaped_fragment_")[0];
        else
            fragment = null;
    }
});
Here we attach a parameter handler to the main Window that catches the parameters passed to our application on the server. If Google sees a URL with a “#!” fragment, it will send a request for the same URL with the fragment moved into the “_escaped_fragment_” query parameter. Is this correct?
// Handle the parameters here
main.addURIHandler(new URIHandler() {
    public DownloadStream handleURI(URL context, String relativeUri) {
        if (fragment != null) {
            // Got the fragment earlier, provide some HTML content
            // for the indexing engine
            String content = getContent(fragment);
            ByteArrayInputStream istream = new ByteArrayInputStream(
                ("<html><body><p>" + content +
                 "</p></body></html>").getBytes());
            return new DownloadStream(istream, "text/html", null);
        }
        return null;
    }
});
Finally, the last part: handling of “?_escaped_fragment_=”. It seems that whatever comes after “?_escaped_fragment_=” is stored in the fragment variable, so if our address ends in “?_escaped_fragment_=earth”, our fragment is “earth”. Here we manually build an HTML page whose content is just a textual representation of whatever the fragment refers to.
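While reading this part I noticed that the example calls getBytes() without a charset, so the bytes depend on the JVM's default encoding, and that the content is pasted into the HTML as-is. Below is a small sketch of how I would build the snapshot bytes instead. SnapshotBuilder and its methods are names I made up, and I am assuming that getContent() returns plain text rather than ready-made HTML. If it returns HTML, the escaping step should be dropped.

```java
import java.nio.charset.StandardCharsets;

// Hypothetical helper, not part of Vaadin: builds the flat HTML snapshot
// that is returned to the crawler for an "_escaped_fragment_" request.
class SnapshotBuilder {

    // Escape characters that have a special meaning in HTML.
    // Assumption: the content is plain text, not markup.
    static String escapeHtml(String text) {
        StringBuilder sb = new StringBuilder();
        for (char c : text.toCharArray()) {
            switch (c) {
                case '&': sb.append("&amp;"); break;
                case '<': sb.append("&lt;"); break;
                case '>': sb.append("&gt;"); break;
                case '"': sb.append("&quot;"); break;
                default:  sb.append(c);
            }
        }
        return sb.toString();
    }

    // Wrap the content in a minimal page and encode it explicitly as UTF-8,
    // so the result does not depend on the platform default charset.
    static byte[] buildSnapshot(String content) {
        String html = "<html><head><meta charset=\"UTF-8\"></head><body><p>"
                + escapeHtml(content)
                + "</p></body></html>";
        return html.getBytes(StandardCharsets.UTF_8);
    }
}
```

In the URI handler above, the byte array from SnapshotBuilder.buildSnapshot(getContent(fragment)) could then be passed to the ByteArrayInputStream in place of the inline string concatenation.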
================================================
Questions:
- How does the Google bot know what’s located on our website? Does it follow the links on its own? In the book example, we have our main page with a selectable list. If we provide the bot with the main link, i.e. “http://magi.virtuallypreinstalled.com/book-examples/indexing”, how will it see that “http://magi.virtuallypreinstalled.com/book-examples/indexing#!mars” or “http://magi.virtuallypreinstalled.com/book-examples/indexing#!earth” are even possible links, so that it can send the “?_escaped_fragment_=” request? Are we supposed to specify all the possible links within the HTML snapshot of our page?
- What happens with the HTML snapshot pages we create? Should we store them? In the book example, it seems like we are just creating a new istream every time a request with the parameter is made.
- Do we need to do anything on our server? The Google FAQ says that we should map “?_escaped_fragment_=” back to “#!” (summary, part c), but I’m not sure what they mean by this.
Thanks, and sorry for such a convoluted post. Allowing an application to be indexed is a big stepping stone for me.