Posted to user@turbine.apache.org by Eliot <tu...@juxti.co.uk> on 2002/08/22 23:34:14 UTC
How to get Google to index your Turbine-generated pages (was: "Is Turbine-powered search engine success possible?")
If you want to make sure your Turbine-generated pages get indexed
by Google, you'll be interested in this post.
I dug a bit deeper into Google's index, and this _seems_ (I'm not
certain) to be the rule that decides which Turbine-generated URLs
are *not* indexed:
Take the substring after the very last forward slash in the URL.
If that string contains a comma (,) or the URL-encoded form of a
comma (%2C), the URL will *not* be indexed by Googlebot UNLESS
there is a query string on the end.
So, taking a fictional example, while
http://www.mydom.com/index.html
is okay,
http://www.mydom.com/company,index.html
is NOT okay. Google will not index this page UNLESS it has a
query string, e.g.
http://www.mydom.com/company,index.html?key=val
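Purely as an illustration, the rule above can be written as a small Java check. The class and method names (IndexHeuristic, likelySkipped) are hypothetical, and the check encodes only the observed rule, which is itself unconfirmed; it says nothing about how Googlebot actually decides what to crawl.

```java
import java.net.URI;
import java.util.Locale;

/**
 * Hypothetical check encoding the observed (unconfirmed) rule:
 * a URL whose last path segment contains ',' or "%2C" is likely
 * skipped, unless it carries a query string.
 */
final class IndexHeuristic {

    private IndexHeuristic() {}

    static boolean likelySkipped(String url) {
        URI uri = URI.create(url);
        // A query string appeared to make such URLs indexable again.
        if (uri.getRawQuery() != null) {
            return false;
        }
        String path = uri.getRawPath();
        if (path == null) {
            return false; // opaque URI, no path to inspect
        }
        // Substring after the very last '/'; "%2C" is matched case-insensitively.
        String last = path.substring(path.lastIndexOf('/') + 1);
        return last.contains(",") || last.toUpperCase(Locale.ROOT).contains("%2C");
    }
}
```

Against the examples above, index.html is not flagged, company,index.html is flagged, and company,index.html?key=val is not flagged because of its query string.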
If you want Google to index all the pages generated by Turbine,
make sure every URL either has a query string or ends with a
name-value pair appended as PathInfo.
On my site I've made sure all URLs end with a redundant name-value
pair (a=z) appended as PathInfo, so all my URLs now end with the
string "/a/z", e.g.
http://www.mydom.com/cntxt/servlet/turbine/template/home,Index.vm/a/z
It was really simple to implement: I just extended TemplateLink
and overrode setPage(String t) as follows:
    public TemplateLink setPage(String t) {
        super.setPage(t);
        // Append a redundant name-value pair so the URL ends in "/a/z"
        addPathInfo("a", "z");
        return this;
    }
Then I configured the link tool in TR.props to use my extended TemplateLink.
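Compiling the override above needs Turbine on the classpath. To try the pattern in isolation, here is a self-contained stand-in. SimpleLink and PaddedLink are hypothetical classes that only mimic the shape of the calls used above (setPage and addPathInfo); they are not Turbine's API, and rendering the URL as a base followed by /name/value pairs is a simplification based on the example URLs in this thread.

```java
import java.util.ArrayList;
import java.util.List;

/** Hypothetical stand-in for a Turbine-style link tool (not Turbine's API). */
class SimpleLink {
    private final String base;
    private final List<String[]> pathInfo = new ArrayList<>();

    SimpleLink(String base) {
        this.base = base;
    }

    SimpleLink addPathInfo(String name, String value) {
        pathInfo.add(new String[] {name, value});
        return this;
    }

    // The template name is stored as the "template" PathInfo pair.
    SimpleLink setPage(String t) {
        return addPathInfo("template", t);
    }

    @Override
    public String toString() {
        StringBuilder sb = new StringBuilder(base);
        for (String[] pair : pathInfo) {
            sb.append('/').append(pair[0]).append('/').append(pair[1]);
        }
        return sb.toString();
    }
}

/** The override from the post, applied to the stand-in. */
class PaddedLink extends SimpleLink {
    PaddedLink(String base) {
        super(base);
    }

    @Override
    PaddedLink setPage(String t) {
        super.setPage(t);
        addPathInfo("a", "z"); // redundant pair so the URL ends in "/a/z"
        return this;
    }
}
```

In this stand-in, new PaddedLink("http://www.mydom.com/cntxt/servlet/turbine").setPage("home,Index.vm") renders as the home,Index.vm/a/z URL shown above. Because the pair is added inside setPage, any PathInfo added afterwards still lands after it, so the last segment stays free of commas either way.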
Hope this helps someone other than me...
Eliot
> Hi everyone,
>
> Just been doing a bit of time-wasting research on Google to get
> an idea of how many Turbine/Velocity powered sites there are
> using searches like:
>
> allinurl:template , .vm
>
> As a big fan of Turbine, I was glad to see this search returned
> thousands of results.
>
> What interested me, though, was that _none_ of the URLs that
> contained a comma ended with .vm. The only indexed
> comma-containing URLs were those that had data appended after
> the .vm as a query string or pathInfo.
>
> For instance, it appears to me that a URL like
>
> www.somedomain.com/somecontext/servlet/turbine/template/foo,MyPage.vm
>
> doesn't get indexed by Google.
>
> However, URLs with pathInfo or a query string do get indexed,
> e.g.
> www.somedomain.com/somecontext/servlet/turbine/template/foo,MyPage.vm/action/FooAction
> or
> www.somedomain.com/somecontext/servlet/turbine/template/foo,MyPage.vm?myVar=myVal
>
> I'm beginning to wonder whether it's actually possible to have a
> Turbine-powered site that performs well in Google's
> results.
>
> If anyone has had any success getting a Turbine-powered web
> site to rank highly in relevant Google searches, how did you get
> around the problem above? I imagine the simplest solution is
> to use the TemplateLinkWithSlash tool instead of the
> TemplateLink tool to eliminate commas from the URLs.
>
> Please contribute your thoughts and experiences on this
> problem.
>
> Eliot
>
> --
> To unsubscribe, e-mail:
> <ma...@jakarta.apache.org> For
> additional commands, e-mail:
> <ma...@jakarta.apache.org>
>
Re: How to get Google to index your Turbine-generated pages (was: "Is Turbine-powered search engine success possible?")
Posted by Alexander Banthien <al...@questech.de>.
Hi,
What Eliot says about appending dummy data to the URL is a good idea for another
reason as well.
For a URL like
http://my.server.com/mycontext/servlet/turbine/template/path,page.jsp
the servlet engine has to decide whether to run the servlet or the JSP. I found
that versions of Tomcat prior to Catalina (take this only as a rough indication)
prefer to try to run the JSP directly, which they obviously can't find.
This problem is solved by appending dummy PathInfo as Eliot described.
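A deliberately simplified model of the behaviour described above, assuming a container that tries the *.jsp extension mapping before the /servlet/turbine/* prefix mapping. NaiveMapper is hypothetical and is not how Tomcat's mapper is implemented; later Servlet specifications give a longest path-prefix match precedence over an extension match, so a conforming container should route both URLs below to Turbine anyway. Paths are taken relative to the web application's context.

```java
/**
 * Simplified model (an assumption, not Tomcat's mapper) of a container
 * that checks the "*.jsp" extension mapping before the
 * "/servlet/turbine/*" prefix mapping.
 */
final class NaiveMapper {

    private NaiveMapper() {}

    static String route(String path) {
        // Extension mapping looks only at the final path segment.
        String last = path.substring(path.lastIndexOf('/') + 1);
        if (last.endsWith(".jsp")) {
            return "jsp";
        }
        if (path.startsWith("/servlet/turbine/")) {
            return "turbine";
        }
        return "default";
    }
}
```

With the dummy /a/z appended, the final segment is "z", so the extension check no longer fires and the request falls through to the Turbine prefix mapping.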
Regards, Alex
--
Best regards
Alexander Banthien
_______________________________________
Questech GmbH
Schwarzwaldstr. 19
79199 Kirchzarten
Phone: +49 (0)7661 90 35-15
Fax: +49 (0)7661 90 35-20
www.questech.de
_______________________________________