Posted to user@turbine.apache.org by Eliot <tu...@juxti.co.uk> on 2002/08/22 23:34:14 UTC

How to get Google to index your Turbine-generated pages (was: "Is Turbine-powered search engine success possible?")

If you want to make sure your Turbine-generated pages get indexed
by Google, then you'll be interested in this post.

I dug a bit deeper into Google's index and this _seems_ (I'm not
certain) to be the rule for certain Turbine-generated URLs *not*
being indexed:

Take the substring after the very last forward slash in the URL.

If that string contains a comma (,) or the URL-encoded form of a
comma (%2C) then the URL will *not* be indexed by Googlebot UNLESS
there is a query string on the end.

So taking a fictional example:

While,

http://www.mydom.com/index.html

is okay,

http://www.mydom.com/company,index.html

is NOT okay.  Google will not index this page UNLESS it has a
query string, e.g.

http://www.mydom.com/company,index.html?key=val
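
The rule as guessed above can be written down as a small check. This
is only a sketch of the hypothesis in this post, not anything Google
has documented; the class and method names (CommaRuleSketch,
wouldBeSkipped) are made up for illustration.

```java
import java.net.URI;
import java.util.Locale;

final class CommaRuleSketch {

    /**
     * Returns true if, under the rule hypothesised in this post,
     * the URL would NOT be indexed.
     */
    static boolean wouldBeSkipped(String url) {
        URI uri = URI.create(url);

        // Assumption from the post: any query string exempts the URL.
        if (uri.getRawQuery() != null) {
            return false;
        }

        String path = uri.getRawPath();
        if (path == null) {
            return false;
        }

        // Take the substring after the very last forward slash.
        String last = path.substring(path.lastIndexOf('/') + 1);

        // A literal comma or its URL-encoded form (%2C or %2c).
        return last.contains(",")
                || last.toUpperCase(Locale.ROOT).contains("%2C");
    }

    public static void main(String[] args) {
        String[] urls = {
            "http://www.mydom.com/index.html",
            "http://www.mydom.com/company,index.html",
            "http://www.mydom.com/company,index.html?key=val",
        };
        for (String url : urls) {
            System.out.println(url + " -> skipped? " + wouldBeSkipped(url));
        }
    }
}
```

Under this sketch a URL with trailing path info such as ".../home,Index.vm/a/z"
also passes, because the last segment ("z") no longer contains a comma.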

If you want to make sure Google indexes all the pages
generated by Turbine, make sure all your URLs either have a query
string or end with a name-value pair appended as PathInfo.

On my site I've made sure all URLs end with a redundant name-value
pair (a=z) appended as PathInfo, so all my URLs now end with the
string "/a/z", e.g.
http://www.mydom.com/cntxt/servlet/turbine/template/home,Index.vm/a/z

This was really simple to implement: I just extended
TemplateLink and overrode setPage(String t) as follows:

public TemplateLink setPage(String t) {
	super.setPage(t);
	// Append a redundant name-value pair so the URL ends with "/a/z"
	addPathInfo("a", "z");
	return this;
}

Then I set the link tool to my extended TemplateLink class in TR.props
(TurbineResources.properties).
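
To see the effect of that override without Turbine on the classpath,
here is a self-contained sketch. SimpleLink and IndexableLink are
hypothetical stand-ins invented for this illustration and are not
Turbine classes; only the shape of the override (call super.setPage,
append the a/z pair, return this) follows the snippet above.

```java
import java.util.ArrayList;
import java.util.List;

final class LinkSketch {

    /** Hypothetical stand-in for a link tool; NOT Turbine's TemplateLink. */
    static class SimpleLink {
        private final String base;
        private final List<String> pathInfo = new ArrayList<>();

        SimpleLink(String base) {
            this.base = base;
        }

        /** Starts a fresh link to the given template. */
        SimpleLink setPage(String template) {
            pathInfo.clear();
            return addPathInfo("template", template);
        }

        /** Appends a name-value pair as two path segments. */
        SimpleLink addPathInfo(String name, String value) {
            pathInfo.add(name);
            pathInfo.add(value);
            return this;
        }

        @Override
        public String toString() {
            return base + "/" + String.join("/", pathInfo);
        }
    }

    /** Same override pattern as in the post: always append a/z. */
    static class IndexableLink extends SimpleLink {
        IndexableLink(String base) {
            super(base);
        }

        @Override
        IndexableLink setPage(String template) {
            super.setPage(template);
            addPathInfo("a", "z"); // redundant pair, only there to end the URL with /a/z
            return this;
        }
    }

    public static void main(String[] args) {
        SimpleLink link = new IndexableLink("http://www.mydom.com/cntxt/servlet/turbine");
        // prints http://www.mydom.com/cntxt/servlet/turbine/template/home,Index.vm/a/z
        System.out.println(link.setPage("home,Index.vm"));
    }
}
```

Because the pair is added inside setPage, every link built through the
tool picks it up without any change to the templates.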

Hope this helps someone other than me...
Eliot

> Hi everyone,
> 
> Just been doing a bit of time-wasting research on Google to get
> an idea of how many Turbine/Velocity powered sites there are
> using searches like:
> 
> allinurl:template , .vm
> 
> As a big fan of turbine I was glad to see this search returned
> thousands of results.
> 
> What interested me though was that _none_ of the URLs that
> contained a comma ended with .vm.  The only indexed
> comma-containing URLs were those that had data appended after
> the .vm as a query string or pathInfo.
> 
> For instance, it appears to me that URLs like
> 
> www.somedomain.com/somecontext/servlet/turbine/template/foo,MyPage.vm
> 
> don't get indexed by Google.
> 
> However, URLs with pathInfo or a query string do get indexed,
> e.g.
> www.somedomain.com/somecontext/servlet/turbine/template/foo,MyPage.vm/action/FooAction
> or
> www.somedomain.com/somecontext/servlet/turbine/template/foo,MyPage.vm?myVar=myVal
> 
> I'm beginning to wonder if it's actually possible to have a
> Turbine-powered site that performs well in Google's
> results.
> 
> If anyone's had any success with getting a Turbine-powered web
> site to rank highly in relevant Google searches, how did you get
> around the problem above?  I imagine the simplest solution is
> to use the TemplateLinkWithSlash tool instead of the
> TemplateLink tool to eliminate commas from the URLs.
> 
> Please contribute your thoughts and experiences on this
> problem.
> 
> Eliot
> 
> --
> To unsubscribe, e-mail:  
> <ma...@jakarta.apache.org> For
> additional commands, e-mail:
> <ma...@jakarta.apache.org>
> 



Re: How to get Google to index your Turbine-generated pages (was: "Is Turbine-powered search engine success possible?")

Posted by Alexander Banthien <al...@questech.de>.
Hi,

What Eliot says about appending dummy data to the URL is a good idea for another
reason as well.

For a URL like
http://my.server.com/mycontext/servlet/turbine/template/path,page.jsp

the servlet engine has to decide whether to run the servlet or the JSP. I found
that Tomcat versions prior to Catalina (take this as only a rough indication)
prefer to try to run the JSP directly, which they obviously can't find.

This trouble is solved by appending dummy path-info as described by Eliot.
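
Why the dummy path info helps can be sketched as follows. This is not
Tomcat's code, and whether a given Tomcat version consults the "*.jsp"
pattern before the servlet's path mapping is only the observation
reported above. The sketch just models how an extension pattern looks
at the last path segment; ExtensionMatchSketch and matchesExtension
are names made up for illustration.

```java
final class ExtensionMatchSketch {

    /**
     * True if the last path segment ends with ".<extension>",
     * which is what a "*.jsp"-style pattern requires.
     */
    static boolean matchesExtension(String path, String extension) {
        String lastSegment = path.substring(path.lastIndexOf('/') + 1);
        return lastSegment.endsWith("." + extension);
    }

    public static void main(String[] args) {
        String plain = "/mycontext/servlet/turbine/template/path,page.jsp";
        String padded = plain + "/a/z";

        // The plain URL looks like a request for a JSP file.
        System.out.println(plain + " -> " + matchesExtension(plain, "jsp"));
        // With dummy path info the last segment is "z", so "*.jsp" no longer applies.
        System.out.println(padded + " -> " + matchesExtension(padded, "jsp"));
    }
}
```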

Regards, Alex


--

Kind regards
Alexander Banthien

_______________________________________
Questech GmbH
Schwarzwaldstr. 19
79199 Kirchzarten

Fon: +49 (0)7661 90 35-15
Fax: +49 (0)7661 90 35-20
www.questech.de

_______________________________________


