Posted to user@turbine.apache.org by Eliot <tu...@juxti.co.uk> on 2002/08/22 17:48:24 UTC

Is Turbine-powered search engine success possible?

Hi everyone,

Just been doing a bit of time-wasting research on Google to get
an idea of how many Turbine/Velocity powered sites there are
using searches like:

allinurl:template , .vm

As a big fan of Turbine I was glad to see this search returned
thousands of results.

What interested me though was that _none_ of the URLs that
contained a comma ended with .vm. The only indexed
comma-containing URLs were those that had data appended after the
.vm as a query string or pathInfo.

For instance, it appears to me that a URL like

www.somedomain.com/somecontext/servlet/turbine/template/foo,MyPage.vm

doesn't get indexed by Google.

However, URLs with pathInfo or a query string do get indexed,
e.g.
www.somedomain.com/somecontext/servlet/turbine/template/foo,MyPage.vm/action/FooAction or
www.somedomain.com/somecontext/servlet/turbine/template/foo,MyPage.vm?myVar=myVal

I'm beginning to wonder if it's actually possible to have a
Turbine-powered site that performs well in Google's results.

If anyone's had any success with getting a Turbine-powered web
site to rank highly in relevant Google searches, how did you get
around the problem above?  I imagine the simplest solution is to
use the TemplateLinkWithSlash tool instead of the TemplateLink
tool to eliminate commas from the URLs.

Please contribute your thoughts and experiences on this problem.

Eliot

--
To unsubscribe, e-mail:   <ma...@jakarta.apache.org>
For additional commands, e-mail: <ma...@jakarta.apache.org>


Re: How to get Google to index your Turbine-generated pages (was: "Is Turbine-powered search engine success possible?")

Posted by Alexander Banthien <al...@questech.de>.
Hi,

What Eliot says about appending dummy data to the URL is a good idea for another
reason as well.

For a URL like
http://my.server.com/mycontext/servlet/turbine/template/path,page.jsp

the servlet engine has to decide whether to run the servlet or the JSP. I found
that Tomcat prior to Catalina (take this as only a rough indication) decides to
try to run a JSP directly, which it obviously can't find.

This trouble is solved by appending dummy path-info as described by Eliot.

Regards, Alex

--

Kind regards
Alexander Banthien

_______________________________________
Questech GmbH
Schwarzwaldstr. 19
79199 Kirchzarten

Phone: +49 (0)7661 90 35-15
Fax: +49 (0)7661 90 35-20
www.questech.de

_______________________________________



How to get Google to index your Turbine-generated pages (was: "Is Turbine-powered search engine success possible?")

Posted by Eliot <tu...@juxti.co.uk>.
If you want to make sure your Turbine-generated pages get indexed
by Google then you'll be interested in this post.

I dug a bit deeper in Google's index and this _seems_ (I'm not
certain) to be the rule for certain Turbine-generated URLs *not*
being indexed:

Take the substring after the very last forward slash in the URL.

If that string contains a comma (,) or the URL-encoded form of a
comma (%2C) then the URL will *not* be indexed by Googlebot UNLESS
there is a query string on the end.

So taking a fictional example:

While,

http://www.mydom.com/index.html

is okay,

http://www.mydom.com/company,index.html

is NOT okay.  Google will not index this page UNLESS it has a
query string, e.g.

http://www.mydom.com/company,index.html?key=val
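
The rule above can be written down as a small predicate. This is only
a sketch of the heuristic as observed here, not anything Google has
documented. The class and method names (CommaUrlRule, likelyIndexed)
are made up for illustration, and it uses plain string handling rather
than a full URL parser.

```java
import java.util.Locale;

/** Sketch of the observed heuristic; NOT a documented Googlebot rule. */
final class CommaUrlRule {

    /** Returns true if, by the rule described above, the URL should get indexed. */
    static boolean likelyIndexed(String url) {
        // A query string on the end appears to lift the restriction.
        if (url.indexOf('?') >= 0) {
            return true;
        }
        // Take the substring after the very last forward slash.
        String lastSegment = url.substring(url.lastIndexOf('/') + 1);
        // Check for a literal comma or its URL-encoded form (%2C or %2c).
        String lower = lastSegment.toLowerCase(Locale.ROOT);
        return !(lower.contains(",") || lower.contains("%2c"));
    }

    public static void main(String[] args) {
        String[] urls = {
            "http://www.mydom.com/index.html",
            "http://www.mydom.com/company,index.html",
            "http://www.mydom.com/company,index.html?key=val"
        };
        for (String url : urls) {
            System.out.println(url + " -> " + likelyIndexed(url));
        }
    }
}
```

A check like this could be run over a site's generated links to spot
the ones that would probably be skipped.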

If you want to make sure Google indexes all the pages
generated by Turbine, make sure all your URLs either have a query
string or end with a name-value pair appended as PathInfo, so
that the comma is no longer in the last segment of the URL.

On my site I've made sure all URLs end with a redundant name-value
pair appended as PathInfo (a=z), so now all my URLs end with the
string "/a/z", e.g.
http://www.mydom.com/cntxt/servlet/turbine/template/home,Index.vm/a/z

It was really simple to implement this: I just extended
TemplateLink and overrode setPage(String t) as follows:

public TemplateLink setPage(String t) {
	super.setPage(t);
	// Append a redundant name-value pair so every URL ends with "/a/z".
	addPathInfo("a", "z");
	return this;
}

Then set the link tool to my extended TemplateLink in TR.props
(TurbineResources.properties).
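
To show the effect of the override without needing Turbine on the
classpath, here is a simplified stand-in. SketchLink and
SketchLinkWithDummyPathInfo are hypothetical classes written for this
example; they are NOT Turbine's TemplateLink and only mimic the way
name-value pairs end up as "/name/value" path segments.

```java
import java.util.LinkedHashMap;
import java.util.Map;

/** Simplified stand-in for a link tool; NOT the real Turbine TemplateLink. */
class SketchLink {
    private final String base;
    // Name-value pairs in insertion order, rendered as /name/value segments.
    private final Map<String, String> pathInfo = new LinkedHashMap<>();

    SketchLink(String base) {
        this.base = base;
    }

    SketchLink setPage(String template) {
        pathInfo.put("template", template);
        return this;
    }

    SketchLink addPathInfo(String name, String value) {
        pathInfo.put(name, value);
        return this;
    }

    @Override
    public String toString() {
        StringBuilder url = new StringBuilder(base);
        for (Map.Entry<String, String> e : pathInfo.entrySet()) {
            url.append('/').append(e.getKey()).append('/').append(e.getValue());
        }
        return url.toString();
    }
}

/** Mirrors the override above: every page link gets a trailing "/a/z". */
class SketchLinkWithDummyPathInfo extends SketchLink {
    SketchLinkWithDummyPathInfo(String base) {
        super(base);
    }

    @Override
    SketchLink setPage(String template) {
        super.setPage(template);
        addPathInfo("a", "z");
        return this;
    }
}
```

Calling setPage("home,Index.vm") on a SketchLinkWithDummyPathInfo built
with the base "http://www.mydom.com/cntxt/servlet/turbine" produces the
same shape of URL as the example above, with the comma no longer in the
last segment.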

Hope this helps someone other than me...
Eliot