Detect Search Engine

Hi,

I am looking for ways to do some kind of SEO with our Vaadin application. I already managed to provide some meta information by creating meta tags for title, description and keywords within a BootstrapListener. We would also like to provide some static HTML for the search engine (because the search engine does not support JavaScript). I read that you can do that by generating some HTML within the servlet. But my problem is: how can I detect that the request came from a search engine? Any idea?
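For reference, adding such meta tags in a BootstrapListener can look roughly like this. It is only a minimal sketch assuming Vaadin 7.1 or later; MyServlet and the description/keyword values are placeholders:

import javax.servlet.ServletException;
import com.vaadin.server.BootstrapFragmentResponse;
import com.vaadin.server.BootstrapListener;
import com.vaadin.server.BootstrapPageResponse;
import com.vaadin.server.SessionInitEvent;
import com.vaadin.server.SessionInitListener;
import com.vaadin.server.VaadinServlet;

public class MyServlet extends VaadinServlet {

    @Override
    protected void servletInitialized() throws ServletException {
        super.servletInitialized();
        getService().addSessionInitListener(new SessionInitListener() {
            public void sessionInit(SessionInitEvent event) {
                event.getSession().addBootstrapListener(new BootstrapListener() {
                    public void modifyBootstrapFragment(BootstrapFragmentResponse response) {
                        // nothing to change in the bootstrap fragment
                    }

                    public void modifyBootstrapPage(BootstrapPageResponse response) {
                        // append <meta> tags to the <head> of the host page
                        response.getDocument().head().appendElement("meta")
                                .attr("name", "description")
                                .attr("content", "A short description of the application");
                        response.getDocument().head().appendElement("meta")
                                .attr("name", "keywords")
                                .attr("content", "vaadin, seo, example");
                    }
                });
            }
        });
    }
}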

thanks in advance :smiley:

The purpose of the _escaped_fragment_ URL parameter is exactly that: it is how a search engine says that it is a search engine and asks for a plain HTML page, as described in the book. Sure, someone can use the parameter from a browser as well, but probably never by mistake.

The HTML page is not exactly “static”, as it’s generated dynamically by the servlet (it could load it from static content), but I suppose it is static in the sense that the page isn’t generated with JavaScript as with regular Vaadin/AJAX apps.
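To illustrate the convention from Google’s AJAX crawling scheme (example.com and the fragment value are just placeholders), the crawler rewrites a hash-bang URL into an _escaped_fragment_ request:

http://example.com/#!product=42  ->  http://example.com/?_escaped_fragment_=product=42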

I have already tried that approach, but Google never went for it (according to my log files). According to the Google documentation you have to provide some meta information with this tag: <meta name="fragment" content="!">. But it didn't help either.

And you used the Google Webmaster Tools or something to get Google to crawl the URLs?

If it didn’t even access the URL, there’s no point in adding a header…

The book example can be found with a Google search such as “Here is some crawlable venus”. I didn’t submit it with the Webmaster Tools, but I think Google found it just by indexing this forum.

Hi Dominik,

I managed to get Google to crawl us with Vaadin 6, so I do not know if that info is any good.
If it isn't, just ignore my message.

First of all I added a ParameterHandler to the window class and looked for _escaped_fragment_:



        // m_strFragment is a String field on this Window subclass
        this.addParameterHandler(new ParameterHandler()
        {
            private static final long serialVersionUID = 1L;

            public void handleParameters(Map<String, String[]> parameters)
            {
                // If the special escape parameter is included, store it
                if (parameters.containsKey("_escaped_fragment_"))
                {
                    m_strFragment = parameters.get("_escaped_fragment_")[0];
                }
                else
                {
                    m_strFragment = null;
                }
            }
        });

I also added a URIHandler:



        m_UriHandler = new URIHandler()
        {
            public DownloadStream handleURI(URL context, String relativeUri)
            {
                boolean bGoogle = m_strFragment != null;

                if (bGoogle)
                {
                    // Google-friendly content here...
                    String strContent = "<!DOCTYPE HTML PUBLIC \"-//W3C//DTD HTML 4.0 Transitional//EN\"><html><head><meta http-equiv=\"Expires\" content=\"Fri, Jan 01 1900 00:00:00 GMT\"><meta http-equiv=\"Pragma\" content=\"no-cache\"><meta http-equiv=\"Cache-Control\" content=\"no-cache\"><meta http-equiv=\"Content-Type\" content=\"text/html; charset=utf-8\"><title></title></head><body></body></html>";
                    ByteArrayInputStream istream = new ByteArrayInputStream(strContent.getBytes());
                    // Let the crawler's HTTP session expire after 10 seconds of inactivity
                    ((WebApplicationContext) getApplication().getContext()).getHttpSession().setMaxInactiveInterval(10);
                    return new DownloadStream(istream, "text/html", null);
                }
                return null;
            }
        };

The real code I use is a bit more complex, as I generate a dynamic result according to the URL Google crawls through, but I hope you get the idea.
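One thing the snippets do not show is registering the URI handler on the window; assuming the code above lives in a Vaadin 6 Window subclass, something like this is needed as well:

this.addURIHandler(m_UriHandler);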

More information about Google crawling web applications can be found here:
http://googlewebmastercentral.blogspot.nl/2009/10/proposal-for-making-ajax-crawlable.html

https://developers.google.com/webmasters/ajax-crawling/docs/getting-started

Thanks for the quick reply. Yes, I used the Google Webmaster Tools to force Google to crawl our website. When I examine the information Google received from our website, I only get the HTML code with “please enable JavaScript” etc…

@arnold: Thanks for your reply. I am using a similar approach; logic-wise it's the same. Actually it is the same as the example from the Book of Vaadin: https://vaadin.com/book/-/page/advanced.urifu.html

Here is the code from our servlet:


	@Override
	protected void service(HttpServletRequest request,
			HttpServletResponse response) throws ServletException, IOException {

		String fragment = request.getParameter("_escaped_fragment_");
		if (fragment != null) {
			// HTML content for search engines
			logger.info("application was visited by a search engine");
			response.setContentType("text/html");
			Writer writer = response.getWriter();
			writer.append(HTMLCONTENT);
		} else {
			....
		}
	}

The code works, because I get the correct HTML content when I make the request on my own. But somehow Google doesn't want to.

Ah, now I understand your problem. Google won't index any further because it has no idea what to crawl.
You should tell Google in your robots.txt file where you have your sitemap:

Sitemap: http://#your sitemap page here#

For example, in the sitemap you would set:

http://yourwebsite.com/#!catalog=14

Google will then crawl your page and present the link to the user, removing the #.
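To make that concrete, a minimal sitemap sketch listing such hash-bang URLs could look like this (the domain and catalog IDs are placeholders taken from the example above):

<?xml version="1.0" encoding="UTF-8"?>
<urlset xmlns="http://www.sitemaps.org/schemas/sitemap/0.9">
  <url>
    <loc>http://yourwebsite.com/#!catalog=14</loc>
  </url>
  <url>
    <loc>http://yourwebsite.com/#!catalog=15</loc>
  </url>
</urlset>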

It seems at this point Google just doesn't know what 'pages' to crawl besides your 'homepage'.

I hope this helped.
P.S. As most other crawlers do not understand the system, I show them the door right away (in the robots.txt):

User-agent: *
Disallow: /

User-agent: Googlebot
Allow: /

User-agent: Googlebot-Mobile
Allow: /

P.S. 2: Your web app has to be able to respond to the URL fragments, and for Google you have to show the alternative result.
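For completeness, reacting to the URL fragment inside the application could look roughly like the following. This is a minimal sketch using Vaadin 7's Page API; in Vaadin 6 (as in Arnold's code above) the UriFragmentUtility component plays the same role:

import com.vaadin.server.Page;
import com.vaadin.server.Page.UriFragmentChangedEvent;
import com.vaadin.server.Page.UriFragmentChangedListener;

// for example in the UI's init() method:
Page.getCurrent().addUriFragmentChangedListener(new UriFragmentChangedListener() {
    public void uriFragmentChanged(UriFragmentChangedEvent event) {
        String fragment = event.getUriFragment(); // e.g. "!catalog=14"
        // ...navigate to the view that corresponds to the fragment...
    }
});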

Good point. Maybe I should mention this in the book as well.