Posted to users@wicket.apache.org by Dan Kaplan <dk...@citizenhawk.com> on 2008/04/03 02:20:35 UTC

Removing the jsessionid for SEO

victori_ provided this information on IRC and I just wanted to share it with
everyone else.  Googlebot and other crawlers don’t use cookies.  This means
that when they visit your site, the servlet container appends
;jsessionid=code to every URL they see.  When they revisit, they get a
different code, treat the result as a different URL with the same content,
and punish you for it.  So, for web crawlers, it’s very important to get rid
of this (perhaps it’s worthwhile to check this code in to the code base).

Here’s what you do in your Application:

    @Override
    protected WebResponse newWebResponse(final HttpServletResponse servletResponse) {
        return CleanWebResponse.getNew(this, servletResponse);
    }

Here's the CleanWebResponse class:
public class CleanWebResponse {
    public static WebResponse getNew(final Application app,
            final HttpServletResponse servletResponse) {
        return app.getRequestCycleSettings().getBufferResponse()
                ? new Buffered(servletResponse)
                : new Unbuffered(servletResponse);
    }

    static class Buffered extends BufferedWebResponse {
        public Buffered(final HttpServletResponse httpServletResponse) {
            super(httpServletResponse);
        }

        @Override
        public CharSequence encodeURL(final CharSequence url) {
            return url;
        }
    }

    static class Unbuffered extends WebResponse {
        public Unbuffered(final HttpServletResponse httpServletResponse) {
            super(httpServletResponse);
        }

        @Override
        public CharSequence encodeURL(final CharSequence url) {
            return url;
        }
    }
}

Note, I haven't tested this myself yet but I plan to tonight.  Hope this was
helpful.  
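
A note on the class above: it switches off URL rewriting for every client, so
it also affects human visitors who have cookies disabled.  A variant suggested
later in this thread only skips the rewriting when the User-Agent header looks
like a crawler.  Below is a minimal sketch of that check in plain Java, with no
Wicket or servlet classes involved.  The class name CrawlerDetector and the
shortened token list are assumptions made for illustration; the tokens are a
subset of the list posted later in the thread, and the null check covers
requests that send no User-Agent header at all.

```java
import java.util.Locale;

/** Null-safe User-Agent check. Class name and token list are illustrative assumptions. */
final class CrawlerDetector {

    // A few substrings taken from the longer list posted in this thread; extend as needed.
    private static final String[] BOT_TOKENS = {
        "googlebot", "slurp", "msnbot", "ia_archiver", "robot", "crawl", "spider"
    };

    private CrawlerDetector() {
    }

    /** Returns true if the User-Agent value contains one of the known crawler tokens. */
    static boolean isCrawler(String userAgent) {
        if (userAgent == null) {
            // No header sent: treat the client as an ordinary visitor.
            return false;
        }
        String agent = userAgent.toLowerCase(Locale.ROOT);
        for (String token : BOT_TOKENS) {
            if (agent.contains(token)) {
                return true;
            }
        }
        return false;
    }
}
```

Inside an encodeURL override one could then return the url unchanged when
isCrawler(...) is true and fall back to super.encodeURL(url) otherwise.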


---------------------------------------------------------------------
To unsubscribe, e-mail: users-unsubscribe@wicket.apache.org
For additional commands, e-mail: users-help@wicket.apache.org


RE: Removing the jsessionid for SEO

Posted by Dan Kaplan <dk...@citizenhawk.com>.
Sorry, I'm going to reply again, to differentiate between my comments and
yours:

> -----Original Message-----
> From: Martijn Dashorst [mailto:martijn.dashorst@gmail.com]
> Sent: Thursday, April 03, 2008 3:36 PM
> To: users@wicket.apache.org
> Subject: Re: Removing the jsessionid for SEO
> 
> On 4/4/08, Dan Kaplan <dk...@citizenhawk.com> wrote:
> > Regardless, at the very least this makes your site look "weird" and
> >  unprofessional when google puts a jsessionid on your url.
> 
> 0.5% of your users care about the URL that is displayed in a google
> search result. It doesn't look weird or unprofessional. It is not like
> your URL ends in .php or *gawk* .asp is it? It brings the
> sophistication of Java to your users.

My URL ends with ;jsessionid=an7goabg0az (my actual situation).  I
personally think that looks weirder than .php or .asp.  

Where did you get that 0.5% statistic?  Regardless, my users won't see ANY
url if my site is on the 50th page of the search.  That's the important
issue here.  

> 
> >  There has got to
> >  be some negative effect when google visits it the second time and the
> >  jsessionid has changed but it sees the same exact content.  Worst case,
> >  it'll think you're trying to trick it.
> 
> I think you need to give the google engineers *some* credit. I
> seriously doubt they are *THAT* stupid.
> 
> Martijn

These links suggest otherwise:
http://www.webmasterworld.com/google/3238326.htm
http://www.webmasterworld.com/forum3/5624.htm
http://www.webmasterworld.com/forum3/5479.htm
http://randomcoder.com/articles/jsessionid-considered-harmful


Google "jsessionid SEO" for more.  Most of the results tell you to get rid
of the jsessionid.  Granted, it doesn't seem Google has specifically
addressed this either way, so all these comments are rumors.  But the fact of
the matter is Google *DOES* index your URLs with the jsessionid still in them.
You'd think they'd be smart enough to remove that, right?  If they can't get
that much right, I wouldn't want to make any other assumptions about their
abilities on similar matters.  

> 
> --
> Buy Wicket in Action: http://manning.com/dashorst
> Apache Wicket 1.3.2 is released
> Get it now: http://www.apache.org/dyn/closer.cgi/wicket/1.3.2
> 


Re: Removing the jsessionid for SEO

Posted by Martijn Dashorst <ma...@gmail.com>.
On 4/4/08, Dan Kaplan <dk...@citizenhawk.com> wrote:
> My URL ends with ;jsessionid=an7goabg0az (my actual situation).  I
> personally think that looks weirder than .php or .asp.

Nah, it shows that you are using Java. Much more sophisticated!

>  Where did you get that 0.5% statistic?  Regardless, my users won't see ANY
>  url if my site is on the 50th page of the search.  That's the important
>  issue here.

I made the 0.5% statistic up. Developers are notoriously anal about
URLs, whereas John and Jane Doe typically just use the Google search box
as their URL bar. Did you ever look at the URLs of Amazon? They are
not pretty, and you'd need to have a very weird jsessionid to
beat Amazon's URL scheme on the ugly scale.

Where's the proof that Google punishes you for having a jsessionid in the URL?

>>  I think you need to give the google engineers *some* credit. I
>>  seriously doubt they are *THAT* stupid.
> These links suggest otherwise:
>  http://www.webmasterworld.com/google/3238326.htm
>  http://www.webmasterworld.com/forum3/5624.htm
>  http://www.webmasterworld.com/forum3/5479.htm
>  http://randomcoder.com/articles/jsessionid-considered-harmful

These links are from 2002 (over 5 years ago). Wicket wasn't even born
then. I surely hope that technology has evolved since then.

Anyway, I'm glad I don't have to build apps that require SEO or public
bots that navigate our sites. In fact, if that ever happened, I think
our company would instantly be very famous (we deal with privacy-sensitive
information that should stay out of the Google/Yahoo/Live Search
indexes).

Martijn



RE: Removing the jsessionid for SEO

Posted by John Patterson <jd...@gmail.com>.


Dan Kaplan-3 wrote:
> 
> Google "jsessionid SEO" for more.  Most of the results tell you to get rid
> of the jsessionid.  Granted, it doesn't seem google has specifically
> mentioned this either way so all these comments are rumors.  But the fact
> of the matter is Google *DOES* index your urls with the jsessionid still in
> it.  You'd think they'd be smart enough to remove that, right?  If they
> can't get that much right, I wouldn't want to make any other assumptions
> about their abilities on similar matters.
> 

Search Matt Cutts's blog for "session id".  He specifically suggests not even
including query string parameters that "look" like session ids.  From what I
remember, Google can and does index pages with session ids, BUT to a reduced
degree.


RE: Removing the jsessionid for SEO

Posted by Dan Kaplan <dk...@citizenhawk.com>.

-----Original Message-----
From: Martijn Dashorst [mailto:martijn.dashorst@gmail.com] 
Sent: Thursday, April 03, 2008 3:36 PM
To: users@wicket.apache.org
Subject: Re: Removing the jsessionid for SEO

On 4/4/08, Dan Kaplan <dk...@citizenhawk.com> wrote:
> Regardless, at the very least this makes your site look "weird" and
>  unprofessional when google puts a jsessionid on your url.

0.5% of your users care about the URL that is displayed in a google
search result. It doesn't look weird or unprofessional. It is not like
your URL ends in .php or *gawk* .asp is it? It brings the
sophistication of Java to your users.

My URL ends with ;jsessionid=an7goabg0az (my actual situation).  I
personally think that looks weirder than .php or .asp.  

Where did you get that 0.5% statistic?  Regardless, my users won't see ANY
url if my site is on the 50th page of the search.  That's the important
issue here.  

>  There has got to
>  be some negative effect when google visits it the second time and the
>  jsessionid has changed but it sees the same exact content.  Worst case,
>  it'll think you're trying to trick it.

I think you need to give the google engineers *some* credit. I
seriously doubt they are *THAT* stupid.

Martijn

These links suggest otherwise:
http://www.webmasterworld.com/google/3238326.htm
http://www.webmasterworld.com/forum3/5624.htm
http://www.webmasterworld.com/forum3/5479.htm
http://randomcoder.com/articles/jsessionid-considered-harmful


Google "jsessionid SEO" for more.  Most of the results tell you to get rid
of the jsessionid.  Granted, it doesn't seem Google has specifically
addressed this either way, so all these comments are rumors.  But the fact of
the matter is Google *DOES* index your URLs with the jsessionid still in them.
You'd think they'd be smart enough to remove that, right?  If they can't get
that much right, I wouldn't want to make any other assumptions about their
abilities on similar matters.  



-- 
Buy Wicket in Action: http://manning.com/dashorst
Apache Wicket 1.3.2 is released
Get it now: http://www.apache.org/dyn/closer.cgi/wicket/1.3.2



Re: Removing the jsessionid for SEO

Posted by Martijn Dashorst <ma...@gmail.com>.
On 4/4/08, Dan Kaplan <dk...@citizenhawk.com> wrote:
> Regardless, at the very least this makes your site look "weird" and
>  unprofessional when google puts a jsessionid on your url.

0.5% of your users care about the URL that is displayed in a google
search result. It doesn't look weird or unprofessional. It is not like
your URL ends in .php or *gawk* .asp is it? It brings the
sophistication of Java to your users.

>  There has got to
>  be some negative effect when google visits it the second time and the
>  jsessionid has changed but it sees the same exact content.  Worst case,
>  it'll think you're trying to trick it.

I think you need to give the google engineers *some* credit. I
seriously doubt they are *THAT* stupid.

Martijn

-- 
Buy Wicket in Action: http://manning.com/dashorst
Apache Wicket 1.3.2 is released
Get it now: http://www.apache.org/dyn/closer.cgi/wicket/1.3.2



RE: Removing the jsessionid for SEO

Posted by Dan Kaplan <dk...@citizenhawk.com>.
Clarifications:

When I said "About those 404s", I was talking about what happens if you use
the fix I provided and turn off cookies in your browser.

When I said "If I ban cookies", I meant to say "If I require cookies".

-----Original Message-----
From: Dan Kaplan [mailto:dkaplan@citizenhawk.com] 
Sent: Thursday, April 03, 2008 3:22 PM
To: users@wicket.apache.org
Subject: RE: Removing the jsessionid for SEO

Regardless, at the very least this makes your site look "weird" and
unprofessional when google puts a jsessionid on your url.  There has got to
be some negative effect when google visits it the second time and the
jsessionid has changed but it sees the same exact content.  Worst case,
it'll think you're trying to trick it.

About those 404s, I'm finding that with the fix I provided I don't get a
404, but the links refresh the page I'm already on.  IE: If I'm on A, and a
link to B is non-bookmarkable, clicking B refreshes A.  

This issue is very disconcerting to me.  It's one of the reasons I wish that
DataView had an option to work in stateless mode.  Cause if I ban cookies
and Googlebot visits my home page (with a navigator on it), it'll try to
follow all these page links and from its perspective, they all lead back to
the first page.  So it's kinda a catch-22: Include the jsessionid in the
urls and get bad SEO or remove the jsessionid and get bad SEO :(
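
The way out of this catch-22 that comes up elsewhere in the thread is to make
the navigator links bookmarkable, which means the page number has to travel in
the URL instead of living in the session.  A rough sketch of that round trip in
plain Java follows.  PagingUrls and the parameter name "page" are made up for
this example and are not Wicket API; in a real application the framework's
bookmarkable links and page parameters would do this job.

```java
/** Carries a paging index in the URL. All names here are hypothetical, not Wicket API. */
final class PagingUrls {

    private static final String PARAM = "page";

    private PagingUrls() {
    }

    /** Builds a link such as "/home?page=2" that can be resolved without a session. */
    static String linkTo(String path, int pageIndex) {
        if (pageIndex < 0) {
            throw new IllegalArgumentException("pageIndex must be >= 0");
        }
        return path + "?" + PARAM + "=" + pageIndex;
    }

    /** Reads the page index back from a raw query string; 0 when absent or malformed. */
    static int pageIndexFrom(String query) {
        if (query == null) {
            return 0;
        }
        for (String pair : query.split("&")) {
            int eq = pair.indexOf('=');
            if (eq > 0 && PARAM.equals(pair.substring(0, eq))) {
                try {
                    // Negative values are clamped to the first page.
                    return Math.max(Integer.parseInt(pair.substring(eq + 1)), 0);
                } catch (NumberFormatException e) {
                    return 0;
                }
            }
        }
        return 0;
    }
}
```

Because the link carries everything needed to rebuild the view, a crawler that
revisits it later does not depend on a session that has since expired.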

Perhaps the answer to my prayers is a combination of the noindex/nofollow
meta tag with a sitemap.xml.  I'm thinking I can put a nofollow on the home
page (so googlebot doesn't try to follow the navigator links) and use the
sitemap.xml to point out the individual pages I want it to index.  


Matej: can you go into more detail about your hybrid URL statement?  Won't
google index, for example, /home and /home.1 if I use it?  When it follows
the next page, won't the url become /home.1.2 or something?  That .2 is a
page version: If google indexes that and tries to visit it again, won't it
report about an invalid session?  
 
-----Original Message-----
From: Matej Knopp [mailto:matej.knopp@gmail.com] 
Sent: Thursday, April 03, 2008 11:10 AM
To: users@wicket.apache.org
Subject: Re: Removing the jsessionid for SEO

On the other hand, crawling non-bookmarkable pages is not very useful
anyway, since a ?wicket:interface url will always get page expired when
you click on the result.

However, preserving the session makes a lot of sense with hybrid urls. Google
remembers the original url (without page instance) while indexing the
real page (after redirect).

I think, though, that the crawler is quite advanced. I would think it
supports cookies (at least JSESSIONID), and that it evaluates some of
the javascript on the page.

-Matej

On Thu, Apr 3, 2008 at 6:56 PM, Igor Vaynberg <ig...@gmail.com> wrote:
> right. if you strip sessionid then all your nonbookmarkable urls will
>  resolve to a 404. that will probably drop your rank a lot faster....
>
>  -igor
>
>  On Thu, Apr 3, 2008 at 9:16 AM, Johan Compagner <jc...@gmail.com> wrote:
>  > the problem is that then you have to have all stateless pages. Else google
>  >  can't crawl your website.
>  >  And if that is the case then you could be completely stateless so you dont
>  >  have a session (id) to worry about at all.
>  >
>  >  johan
>  >
>  >  On Thu, Apr 3, 2008 at 4:54 PM, Zappaterrini, Larry <
>  >  Larry.Zappaterrini@fnis.com> wrote:
>  >
>  >  > When Google asks to not have special treatment for their bot, they are
>  >  > referring to content more than anything. Regarding the session id being
>  >  > coded in the URL, see the Technical guidelines section of Google's
>  >  > Webmaster Guidelines -
>  >  > http://www.google.com/support/webmasters/bin/answer.py?answer=35769#design
>  >  >
>  >  > It specifically recommends "allow(ing) search bots to crawl your sites
>  >  > without session IDs or arguments that track their path through the
>  >  > site."
>  >  >
>  >  > -----Original Message-----
>  >  > From: Johan Compagner [mailto:jcompagner@gmail.com]
>  >  > Sent: Thursday, April 03, 2008 7:35 AM
>  >  > To: users@wicket.apache.org
>  >  > Subject: Re: Removing the jsessionid for SEO
>  >  >
>  >  > isnt google always saying that you shouldn't alter behavior of your site
>  >  > depending of it is there bot or not?
>  >  >
>  >  > On Thu, Apr 3, 2008 at 1:00 PM, Artur W. <a_...@gazeta.pl> wrote:
>  >  >
>  >  > >
>  >  > > Hi!
>  >  > >
>  >  > >
>  >  > > igor.vaynberg wrote:
>  >  > > >
>  >  > > > also by doing what you have done users with cookies disabled wont be
>  >  > > > able to use your site...
>  >  > > >
>  >  > >
>  >  > > In my opinion session id is a problem. Google index the same page again
>  >  > > and again.
>  >  > >
>  >  > > About the users without cookies we can do like this:
>  >  > >
>  >  > >     static class Unbuffered extends WebResponse {
>  >  > >
>  >  > >         private static final String[] botAgents = {
>  >  > >             "onetszukaj", "googlebot", "appie", "architext",
>  >  > >             "jeeves", "bjaaland", "ferret", "gulliver", "harvest", "htdig",
>  >  > >             "linkwalker", "lycos_", "moget", "muscatferret", "myweb", "nomad",
>  >  > >             "scooter", "yahoo!\\sslurp\\schina", "slurp", "weblayers",
>  >  > >             "antibot", "bruinbot", "digout4u", "echo!", "ia_archiver",
>  >  > >             "jennybot", "mercator", "netcraft", "msnbot", "petersnews",
>  >  > >             "unlost_web_crawler", "voila", "webbase", "webcollage", "cfetch",
>  >  > >             "zyborg", "wisenutbot", "robot", "crawl", "spider" }; /* and so on... */
>  >  > >
>  >  > >         public Unbuffered(final HttpServletResponse res) {
>  >  > >             super(res);
>  >  > >         }
>  >  > >
>  >  > >         @Override
>  >  > >         public CharSequence encodeURL(final CharSequence url) {
>  >  > >             return isAgent() ? url : super.encodeURL(url);
>  >  > >         }
>  >  > >
>  >  > >         private static boolean isAgent() {
>  >  > >
>  >  > >             String agent = ((WebRequest)RequestCycle.get().getRequest())
>  >  > >                     .getHttpServletRequest().getHeader("User-Agent");
>  >  > >
>  >  > >             for(String bot : botAgents) {
>  >  > >                 if (agent.toLowerCase().indexOf(bot) != -1) {
>  >  > >                     return true;
>  >  > >                 }
>  >  > >             }
>  >  > >
>  >  > >             return false;
>  >  > >         }
>  >  > >     }
>  >  > >
>  >  > >
>  >  > > I didn't test this code but I do similar thing in my old application in
>  >  > > Spring and it works.
>  >  > >
>  >  > > Take care,
>  >  > > Artur



-- 
Resizable and reorderable grid components.
http://www.inmethod.com



Re: Removing the jsessionid for SEO

Posted by Rüdiger Schulz <ru...@googlemail.com>.
I'll wrap something up in the course of this week, and post it on my blog.
(so little time a.t.m.)

greetings,


Rüdiger

2008/4/14, Erik van Oosten <e....@grons.nl>:
>
> Hi Rüdiger,
>
> I would be very interested in the code.
> If you can not find a suitable repository, could you just do something
> simple like linking to a  zip from a blog post?
>
> Regards,
>
>     Erik.
>
>
>
>
> Rüdiger Schulz wrote:
> > Hello everybody,
> >
> > I just want to add my 2 cents to this discussion.
> >
> > At IndyPhone we too wanted to get rid of jsessionid-URLs in google's
> > index. Yeah, it would be nice if the google bot would be as clever as the
> > one from yahoo, and just remove them himself. But he doesn't.
> >
> > So I implemented a Servlet-Filter which checks the user agent header for
> > google bot, and skips the url rewriting just for those clients. As this
> > will generate lots of new sessions, the filter invalidates the session
> > right after the request. Also, if a crawler is doing a request containing
> > a jsessionid (which he stored before the filter was implemented), the
> > filter redirects the crawler to the same URL, just without the jsessionid
> > parameter. That way, the index will be updated for those old URLs.
> >
> > Now we have almost none of those URLs in google's index.
> >
> > If anyone is interested in the code, I'd be willing to publish this. As
> > it is not wicket specific, I could share it with some generic servlet
> > tools OS project - is there something like that on apache or elsewhere?
> >
> > But maybe Google is smarter by now, and it is not required anymore?
> >
> >


-- 
greetings from Berlin,

Rüdiger Schulz

www.2rue.de
www.indyphone.de - Coole Handy Logos einfach selber bauen
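
The redirect step described above (a crawler asks for an old URL that still
carries a jsessionid and is sent to the same URL without it) comes down to a
small string rewrite.  The filter itself was not posted in this thread, so what
follows is only a sketch under assumptions: the class name SessionIdStripper is
hypothetical, and it assumes the container's default path parameter name,
jsessionid.

```java
import java.util.regex.Pattern;

/** Removes the ";jsessionid=..." path parameter from a URL. Hypothetical helper, not the filter from the thread. */
final class SessionIdStripper {

    // Matches ";jsessionid=<value>" up to the next '?', '#', ';' or '/', ignoring case.
    private static final Pattern JSESSIONID =
            Pattern.compile(";jsessionid=[^?#;/]*", Pattern.CASE_INSENSITIVE);

    private SessionIdStripper() {
    }

    /** Returns true if the URL carries a jsessionid path parameter. */
    static boolean hasSessionId(String url) {
        return JSESSIONID.matcher(url).find();
    }

    /** Returns the URL without the jsessionid; query string and fragment are kept. */
    static String strip(String url) {
        return JSESSIONID.matcher(url).replaceAll("");
    }
}
```

A filter could call hasSessionId on the request URI of a crawler request and,
when it returns true, answer with a redirect to strip(uri).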

Re: Removing the jsessionid for SEO

Posted by Jeremy Levy <je...@meetmoi.com>.
To clarify my message below:  this is with a CryptedUrlWebRequestCodingStrategy
and a lot of BookmarkablePages.

On Thu, Apr 3, 2008 at 9:16 PM, Jeremy Levy <je...@meetmoi.com> wrote:

> We have a similar issue, and are trying the following out right now..
>
> http://www.google.com/support/webmasters/bin/answer.py?hl=en&answer=40367
>
> User-agent: *
> Disallow: /*?
>
>
>
>
> On Thu, Apr 3, 2008 at 9:09 PM, Dan Kaplan <dk...@citizenhawk.com> wrote:
>
> > Ok, at least I'm not missing anything.  I understand the benefits it's
> > providing with its stateful framework.  Developing a site with Wicket is
> > easier than with any other framework I've used.  But this statefulness,
> > which makes websites so easy to develop, seems to be counter productive
> > to SEO:
> >
> > GoogleBot will follow and index stateful links.  Worst case scenario,
> > these actually become visible to google users and when they click the
> > link it takes them to an "invalid session" page.  They think, "This site
> > is broken" and move on to the next link of their search result.
> >
> > Another approach to solving this is to block all the stateful pages in
> > my robots.txt file.  But how can I block these links in robots.txt since
> > they change per session?  Is there any way to know what the url will
> > resolve to when googlebot tries to visit my site so I can tell it to
> > disallow: /?wicket:interface=:10:1::: and ?wicket:interface=:0:1::: and ...?
> >
> >
> > > -----Original Message-----
> > > From: Igor Vaynberg [mailto:igor.vaynberg@gmail.com]
> > > Sent: Thursday, April 03, 2008 5:45 PM
> > > To: users@wicket.apache.org
> > > Subject: Re: Removing the jsessionid for SEO
> > >
> > > On Thu, Apr 3, 2008 at 5:31 PM, Dan Kaplan <dk...@citizenhawk.com> wrote:
> > > > Ok I did a little preliminary research on this.  Right now
> > > >  PagingNavigator uses PagingNavigationLink's to represent its page.
> > > >  This extends Link.  I'm supposed to override PagingNavigator's
> > > >  newPagingNavigationLink() method to accomplish this (I think) but past
> > > >  that, this isn't very straightforward to me.
> > > >
> > > >  Do I need to create my own BookmarkablePagingNavigationLink?  When I
> > > >  do... what next?  I really don't know enough about
> > > >  bookmarkablePageLinks to do this.  Right now, all the magic happens
> > > >  inside PagingNavigationLink.  Won't I have to move all that logic into
> > > >  the WebPage that I'm passing into BookmarkablePagingNavigationLink?
> > > >  This seems like a lot of work.  Am I missing something critical?
> > >
> > > no, you are not missing anything. you see, when you go stateless, like
> > > what you want, then you have to recreate all the magic stuff that
> > > makes stateful links Just Work. Without state you are back to the
> > > servlet/mvc programming model: you have to encode the state that you
> > > want into the link, then on the trip back decode it, recreate
> > > something from it, and then apply that something onto the components.
> > > This is the crapwork that wicket does for you usually.
> > >
> > > -igor
> > >
> > >
> > > >
> > > >
> > > >  > -----Original Message-----
> > > >  > From: Igor Vaynberg [mailto:igor.vaynberg@gmail.com]
> > > >  > Sent: Thursday, April 03, 2008 3:40 PM
> > > >  > To: users@wicket.apache.org
> > > >  > Subject: Re: Removing the jsessionid for SEO
> > > >  >
> > > >  > you subclass the pagenavigator and make it use bookmarkable links
> > > >  > also. it has factory methods for all the links it uses.
> > > >  >
> > > >  > -igor
> > > >  >
> > > >  >
> > > >  > On Thu, Apr 3, 2008 at 3:36 PM, Dan Kaplan <dkaplan@citizenhawk.com> wrote:
> > > >  > > I wasn't talking about the links that are on the list (I already make
> > > >  > >  those bookmarkable).  I'm talking about the links that the Navigator
> > > >  > >  generates.  How do I make it so page 2 is bookmarkable?
> > > >  > >
> > > >  > >
> > > >  > >  -----Original Message-----
> > > >  > >  From: Igor Vaynberg [mailto:igor.vaynberg@gmail.com]
> > > >  > >  Sent: Thursday, April 03, 2008 3:30 PM
> > > >  > >  To: users@wicket.apache.org
> > > >  > >  Subject: Re: Removing the jsessionid for SEO
> > > >  > >
> > > >  > >  instead of
> > > >  > >
> > > >  > >  item.add(new link("foo") { onclick() });
> > > >  > >
> > > >  > >  do
> > > >  > >
> > > >  > >  item.add(new bookmarkablepagelink("foo", page.class));
> > > >  > >
> > > >  > >  -igor
> > > >  > >
> > > >  > >
> > > >  > >  On Thu, Apr 3, 2008 at 3:28 PM, Dan Kaplan <dk...@citizenhawk.com> wrote:
> > > >  > >  > How?  I asked how to do it before and nobody suggested this as a
> > > >  > >  >  possibility.
> > > >  > >  >
> > > >  > >  >
> > > >  > >  >
> > > >  > >  >  -----Original Message-----
> > > >  > >  >  From: Igor Vaynberg [mailto:igor.vaynberg@gmail.com]
> > > >  > >  >  Sent: Thursday, April 03, 2008 3:26 PM
> > > >  > >  >  To: users@wicket.apache.org
> > > >  > >  >  Subject: Re: Removing the jsessionid for SEO
> > > >  > >  >
> > > >  > >  >  dataview can work in a stateless mode, just use bookmarkable links
> > > >  > >  >  inside it
> > > >  > >  >
> > > >  > >  >  -igor
> > > >  > >  >
> > > >  > >  >
> > > >  > >  >  On Thu, Apr 3, 2008 at 3:22 PM, Dan Kaplan
> > > <dk...@citizenhawk.com>
> > > >  > >  wrote:
> > > >  > >  >  > Regardless, at the very least this makes your site look
> > > "weird"
> > > >  > and
> > > >  > >  >  >  unprofessional when google puts a jsessionid on your
> > url.
> > > There
> > > >  > has
> > > >  > >  got
> > > >  > >  >  to
> > > >  > >  >  >  be some negative effect when google visits it the second
> > > time and
> > > >  > the
> > > >  > >  >  >  jsessionid has changed but it sees the same exact
> > content.
> > > Worst
> > > >  > >  case,
> > > >  > >  >  >  it'll think you're trying to trick it.
> > > >  > >  >  >
> > > >  > >  >  >  About those 404s, I'm finding that with the fix I
> > provided I
> > > >  > don't get
> > > >  > >  a
> > > >  > >  >  >  404, but the links refresh the page I'm already on.  IE:
> > If
> > > I'm
> > > >  > on A,
> > > >  > >  and
> > > >  > >  >  a
> > > >  > >  >  >  link to B is non-bookmarkable, clicking B refreshes A.
> > > >  > >  >  >
> > > >  > >  >  >  This issue is very disconcerting to me.  It's one of the
> > > reasons
> > > >  > I
> > > >  > >  wish
> > > >  > >  >  that
> > > >  > >  >  >  DataView had an option to work in stateless mode.  Cause
> > if
> > > I ban
> > > >  > >  cookies
> > > >  > >  >  >  and Googlebot visits my home page (with a navigator on
> > it),
> > > it'll
> > > >  > try
> > > >  > >  to
> > > >  > >  >  >  follow all these page links and from its perspective,
> > they
> > > all
> > > >  > lead
> > > >  > >  back
> > > >  > >  >  to
> > > >  > >  >  >  the first page.  So it's kinda a catch-22: Include the
> > > jsessionid
> > > >  > in
> > > >  > >  the
> > > >  > >  >  >  urls and get bad SEO or remove the jsessionid and get
> > bad
> > > SEO :(
> > > >  > >  >  >
> > > >  > >  >  >  Perhaps the answer to my prayers is a combination of the
> > > >  > >  noindex/nofollow
> > > >  > >  >  >  meta tag with a sitemap.xml.  I'm thinking I can put a
> > > nofollow
> > > >  > on the
> > > >  > >  >  home
> > > >  > >  >  >  page (so googlebot doesn't try to follow the navigator
> > > links) and
> > > >  > use
> > > >  > >  the
> > > >  > >  >  >  sitemap.xml to point out the individual pages I want it
> > to
> > > index.
> > > >  > >  >  >
> > > >  > >  >  >
> > > >  > >  >  >  Matej: can you go into more detail about your hybrid URL
> > > >  > statement?
> > > >  > >  >  Won't
> > > >  > >  >  >  google index, for example, /home and /home.1 if I use
> > it?
> > > When
> > > >  > it
> > > >  > >  >  follows
> > > >  > >  >  >  the next page, won't the url become /home.1.2 or
> > something?
> > > That
> > > >  > .2
> > > >  > >  is a
> > > >  > >  >  >  page version: If google indexes that and tries to visit
> > it
> > > again,
> > > >  > >  won't
> > > >  > >  >  it
> > > >  > >  >  >  report about an invalid session?
> > > >  > >  >  >
> > > >  > >  >  >
> > > >  > >  >  >
> > > >  > >  >  >  -----Original Message-----
> > > >  > >  >  >  From: Matej Knopp [mailto:matej.knopp@gmail.com]
> > > >  > >  >  >  Sent: Thursday, April 03, 2008 11:10 AM
> > > >  > >  >  >  To: users@wicket.apache.org
> > > >  > >  >  >  Subject: Re: Removing the jsessionid for SEO
> > > >  > >  >  >
> > > >  > >  >  >  On the other hand, crawling non-bookmarkable pages is not very
> > > >  > >  >  >  useful anyway, since a ?wicket:interface url will always get page
> > > >  > >  >  >  expired when you click on the result.
> > > >  > >  >  >
> > > >  > >  >  >  However, preserving the session makes a lot of sense with hybrid
> > > >  > >  >  >  urls.  Google remembers the original url (without page instance)
> > > >  > >  >  >  while indexing the real page (after redirect).
> > > >  > >  >  >
> > > >  > >  >  >  I think though that the crawler is quite advanced.  I would think
> > > >  > >  >  >  it supports cookies (at least JSESSIONID) and that it evaluates
> > > >  > >  >  >  some of the javascript on the page.
> > > >  > >  >  >
> > > >  > >  >  >  -Matej
> > > >  > >  >  >
> > > >  > >  >  >  On Thu, Apr 3, 2008 at 6:56 PM, Igor Vaynberg <ig...@gmail.com> wrote:
> > > >  > >  >  >  > right. if you strip sessionid then all your nonbookmarkable urls
> > > >  > >  >  >  >  will resolve to a 404. that will probably drop your rank a lot
> > > >  > >  >  >  >  faster....
> > > >  > >  >  >  >
> > > >  > >  >  >  >  -igor
> > > >  > >  >  >  >
> > > >  > >  >  >  >  On Thu, Apr 3, 2008 at 9:16 AM, Johan Compagner <jc...@gmail.com> wrote:
> > > >  > >  >  >  >  > the problem is that then you have to have all stateless pages.
> > > >  > >  >  >  >  >  Else google can't crawl your website.
> > > >  > >  >  >  >  >  And if that is the case then you could be completely stateless
> > > >  > >  >  >  >  >  so you don't have a session (id) to worry about at all.
> > > >  > >  >  >  >  >
> > > >  > >  >  >  >  >  johan
> > > >  > >  >  >  >  >
> > > >  > >  >  >  >  >  On Thu, Apr 3, 2008 at 4:54 PM, Zappaterrini, Larry <Larry.Zappaterrini@fnis.com> wrote:
> > > >  > >  >  >  >  >
> > > >  > >  >  >  >  >  > When Google asks to not have special treatment for their bot,
> > > >  > >  >  >  >  >  > they are referring to content more than anything. Regarding the
> > > >  > >  >  >  >  >  > session id being coded in the URL, see the Technical guidelines
> > > >  > >  >  >  >  >  > section of Google's Webmaster Guidelines -
> > > >  > >  >  >  >  >  > http://www.google.com/support/webmasters/bin/answer.py?answer=35769#design
> > > >  > >  >  >  >  >  >
> > > >  > >  >  >  >  >  > It specifically recommends "allow(ing) search bots to crawl your
> > > >  > >  >  >  >  >  > sites without session IDs or arguments that track their path
> > > >  > >  >  >  >  >  > through the site."
> > > >  > >  >  >  >  >  >
> > > >  > >  >  >  >  >  > -----Original Message-----
> > > >  > >  >  >  >  >  > From: Johan Compagner [mailto:jcompagner@gmail.com]
> > > >  > >  >  >  >  >  > Sent: Thursday, April 03, 2008 7:35 AM
> > > >  > >  >  >  >  >  > To: users@wicket.apache.org
> > > >  > >  >  >  >  >  > Subject: Re: Removing the jsessionid for SEO
> > > >  > >  >  >  >  >  >
> > > >  > >  >  >  >  >  > isn't google always saying that you shouldn't alter the behavior
> > > >  > >  >  >  >  >  > of your site depending on whether it is their bot or not?
> > > >  > >  >  >  >  >  >
> > > >  > >  >  >  >  >  > On Thu, Apr 3, 2008 at 1:00 PM, Artur W. <a_...@gazeta.pl> wrote:
> > > >  > >  >  >  >  >  >
> > > >  > >  >  >  >  >  > > Hi!
> > > >  > >  >  >  >  >  > >
> > > >  > >  >  >  >  >  > > igor.vaynberg wrote:
> > > >  > >  >  >  >  >  > > >
> > > >  > >  >  >  >  >  > > > also by doing what you have done users with cookies disabled
> > > >  > >  >  >  >  >  > > > won't be able to use your site...
> > > >  > >  >  >  >  >  > > >
> > > >  > >  >  >  >  >  > >
> > > >  > >  >  >  >  >  > > In my opinion the session id is a problem. Google indexes the
> > > >  > >  >  >  >  >  > > same page again and again.
> > > >  > >  >  >  >  >  > >
> > > >  > >  >  >  >  >  > > For users without cookies, we can do something like this:
> > > >  > >  >  >  >  >  > >
> > > >  > >  >  >  >  >  > > static class Unbuffered extends WebResponse {
> > > >  > >  >  >  >  >  > >
> > > >  > >  >  >  >  >  > >     // Lower-case fragments of known crawler User-Agent strings.
> > > >  > >  >  >  >  >  > >     // indexOf() matches these literally, not as regular expressions.
> > > >  > >  >  >  >  >  > >     private static final String[] botAgents = {
> > > >  > >  >  >  >  >  > >             "onetszukaj", "googlebot", "appie", "architext", "jeeves",
> > > >  > >  >  >  >  >  > >             "bjaaland", "ferret", "gulliver", "harvest", "htdig",
> > > >  > >  >  >  >  >  > >             "linkwalker", "lycos_", "moget", "muscatferret", "myweb",
> > > >  > >  >  >  >  >  > >             "nomad", "scooter", "yahoo!\\sslurp\\schina", "slurp",
> > > >  > >  >  >  >  >  > >             "weblayers", "antibot", "bruinbot", "digout4u", "echo!",
> > > >  > >  >  >  >  >  > >             "ia_archiver", "jennybot", "mercator", "netcraft", "msnbot",
> > > >  > >  >  >  >  >  > >             "petersnews", "unlost_web_crawler", "voila", "webbase",
> > > >  > >  >  >  >  >  > >             "webcollage", "cfetch", "zyborg", "wisenutbot", "robot",
> > > >  > >  >  >  >  >  > >             "crawl", "spider"
> > > >  > >  >  >  >  >  > >     }; /* and so on... */
> > > >  > >  >  >  >  >  > >
> > > >  > >  >  >  >  >  > >     public Unbuffered(final HttpServletResponse res) {
> > > >  > >  >  >  >  >  > >         super(res);
> > > >  > >  >  >  >  >  > >     }
> > > >  > >  >  >  >  >  > >
> > > >  > >  >  >  >  >  > >     // Crawlers get the plain URL; everyone else keeps URL rewriting.
> > > >  > >  >  >  >  >  > >     @Override
> > > >  > >  >  >  >  >  > >     public CharSequence encodeURL(final CharSequence url) {
> > > >  > >  >  >  >  >  > >         return isAgent() ? url : super.encodeURL(url);
> > > >  > >  >  >  >  >  > >     }
> > > >  > >  >  >  >  >  > >
> > > >  > >  >  >  >  >  > >     private static boolean isAgent() {
> > > >  > >  >  >  >  >  > >         String agent = ((WebRequest) RequestCycle.get().getRequest())
> > > >  > >  >  >  >  >  > >                 .getHttpServletRequest().getHeader("User-Agent");
> > > >  > >  >  >  >  >  > >         // Some clients send no User-Agent header at all.
> > > >  > >  >  >  >  >  > >         if (agent == null) {
> > > >  > >  >  >  >  >  > >             return false;
> > > >  > >  >  >  >  >  > >         }
> > > >  > >  >  >  >  >  > >         agent = agent.toLowerCase();
> > > >  > >  >  >  >  >  > >         for (String bot : botAgents) {
> > > >  > >  >  >  >  >  > >             if (agent.indexOf(bot) != -1) {
> > > >  > >  >  >  >  >  > >                 return true;
> > > >  > >  >  >  >  >  > >             }
> > > >  > >  >  >  >  >  > >         }
> > > >  > >  >  >  >  >  > >         return false;
> > > >  > >  >  >  >  >  > >     }
> > > >  > >  >  >  >  >  > > }
> > > >  > >  >  >  >  >  > >
> > > >  > >  >  >  >  >  > > I didn't test this code, but I did a similar thing in my old
> > > >  > >  >  >  >  >  > > Spring application and it works.
> > > >  > >  >  >  >  >  > >
> > > >  > >  >  >  >  >  > > Take care,
> > > >  > >  >  >  >  >  > > Artur

Re: Removing the jsessionid for SEO

Posted by Rüdiger Schulz <ru...@googlemail.com>.
Hello,

I still haven't found the time to write a blog post about this, so I just
put the code on pastebin:

http://pastebin.org/31242

I'm looking forward to your feedback :)

I tested this filter on Jetty and Tomcat (with Firefox's User Agent
Switcher), where it worked fine. However, as stated in the code, some app
servers might behave a little differently, so YMMV.


greetings,

Rüdiger



On Monday, 14.04.2008, at 16:37 +0200, Korbinian Bachl - privat wrote:
> Yeah, it's quite a shame that Google doesn't open source their logic ;)
> 
> It would be nice if you could give us the code, however, so we could have
> a look at it :)
> 


Re: Removing the jsessionid for SEO

Posted by Korbinian Bachl - privat <ko...@whiskyworld.de>.
Yeah, it's quite a shame that Google doesn't open source their logic ;)

It would be nice if you could give us the code, however, so we could have
a look at it :)

Rüdiger Schulz wrote:
> Hm, SEO is really a little bit like black magic sometimes *g*
> 
> This (German) article states that SID cloaking would be OK for Google:
> http://www.trafficmaxx.de/blog/google/gutes-cloaking-schlechtes-cloaking
> 
> Some more googling, and here someone seems to confirm this:
> http://www.webmasterworld.com/cloaking/3201743.htm
> "I was actually at SMX West and Matt Cutts specifically said that this is
> OK"
> 
> All I can say in our case is that I added this filter several months ago,
> and I can't see any negative effects so far.
> 
> 
> greetings,
> 
> Rüdiger
> 
> 


Re: Removing the jsessionid for SEO

Posted by Rüdiger Schulz <ru...@googlemail.com>.
Hm, SEO is really a little bit like black magic sometimes *g*

This (German) article states that SID cloaking would be OK for Google:
http://www.trafficmaxx.de/blog/google/gutes-cloaking-schlechtes-cloaking

Some more googling, and here someone seems to confirm this:
http://www.webmasterworld.com/cloaking/3201743.htm
"I was actually at SMX West and Matt Cutts specifically said that this is
OK"

All I can say in our case is that I added this filter several months ago,
and I can't see any negative effects so far.


greetings,

Rüdiger


2008/4/14, Korbinian Bachl - privat <ko...@whiskyworld.de>:
>
> Hi Rüdiger,
>
> AFAIK this could lead to some punishment by Google, as it crawls the site
> multiple times using different agents and origin IPs, and if it sees
> different behaviour it suspects cloaking / prepared content and will act
> accordingly.
>
> This is usually noticed after the regular Google index refreshes that
> happen a few times a year - you should keep an eye on this.
>
> Best,
>
> Korbinian
>


-- 
greetings from Berlin,

Rüdiger Schulz

www.2rue.de
www.indyphone.de - Build cool mobile phone logos yourself

Re: Removing the jsessionid for SEO

Posted by Korbinian Bachl - privat <ko...@whiskyworld.de>.
Hi Rüdiger,

AFAIK this could lead to some punishment by Google, as it crawls the
site multiple times using different agents and origin IPs, and if it
sees different behaviour it suspects cloaking / prepared content and
will act accordingly.

This is usually noticed after the regular Google index refreshes that
happen a few times a year - you should keep an eye on this.

Best,

Korbinian

Rüdiger Schulz wrote:
> Hello everybody,
> 
> I just want to add my 2 cents to this discussion.
> 
> At IndyPhone we too wanted to get rid of jsessionid URLs in Google's index.
> Yeah, it would be nice if the Google bot were as clever as the one from
> Yahoo, and just removed them itself. But it doesn't.
> 
> So I implemented a servlet filter which checks the user agent header for
> the Google bot, and skips the URL rewriting just for those clients. As this
> will generate lots of new sessions, the filter invalidates the session right
> after the request. Also, if a crawler makes a request containing a
> jsessionid (which it stored before the filter was implemented), the filter
> redirects the crawler to the same URL, just without the jsessionid
> parameter. That way, the index will be updated for those old URLs.
> 
> Now we have almost none of those URLs in Google's index.
> 
> If anyone is interested in the code, I'd be willing to publish this. As it
> is not Wicket-specific, I could share it with some generic open-source
> servlet tools project - is there something like that at Apache or elsewhere?
> 
> But maybe Google is smarter by now, and it is not required anymore?
> 



Re: Removing the jsessionid for SEO

Posted by Erik van Oosten <e....@grons.nl>.
Hi Rüdiger,

I would be very interested in the code.
If you cannot find a suitable repository, could you just do something
simple like linking to a zip from a blog post?

Regards,
    Erik.



Rüdiger Schulz wrote:
> Hello everybody,
>
> I just want to add my 2 cents to this discussion.
>
> At IndyPhone we too wanted to get rid of jsessionid URLs in Google's index.
> Yeah, it would be nice if the Google bot were as clever as the one from
> Yahoo, and just removed them itself. But it doesn't.
>
> So I implemented a servlet filter which checks the user agent header for
> the Google bot, and skips the URL rewriting just for those clients. As this
> will generate lots of new sessions, the filter invalidates the session right
> after the request. Also, if a crawler makes a request containing a
> jsessionid (which it stored before the filter was implemented), the filter
> redirects the crawler to the same URL, just without the jsessionid
> parameter. That way, the index will be updated for those old URLs.
>
> Now we have almost none of those URLs in Google's index.
>
> If anyone is interested in the code, I'd be willing to publish this. As it
> is not Wicket-specific, I could share it with some generic open-source
> servlet tools project - is there something like that at Apache or elsewhere?
>
> But maybe Google is smarter by now, and it is not required anymore?
>
>   



Re: Removing the jsessionid for SEO

Posted by Rüdiger Schulz <ru...@googlemail.com>.
Hello everybody,

I just want to add my 2 cents to this discussion.

At IndyPhone we too wanted to get rid of jsessionid URLs in Google's index.
Yeah, it would be nice if the Google bot were as clever as the one from
Yahoo, and just removed them itself. But it doesn't.

So I implemented a servlet filter which checks the user agent header for
the Google bot, and skips the URL rewriting just for those clients. As this
will generate lots of new sessions, the filter invalidates the session right
after the request. Also, if a crawler makes a request containing a
jsessionid (which it stored before the filter was implemented), the filter
redirects the crawler to the same URL, just without the jsessionid
parameter. That way, the index will be updated for those old URLs.
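
A minimal sketch of the two decisions such a filter has to make, not the
actual filter described here: whether the User-Agent header looks like a
crawler, and how to strip the jsessionid path parameter before redirecting.
The class name CrawlerUrlHelper, its method names and the short agent list
are hypothetical; in a real servlet filter these helpers would be called
from doFilter().

```java
import java.util.Locale;

final class CrawlerUrlHelper {

    // Hypothetical, deliberately short list of lower-case User-Agent fragments.
    private static final String[] BOT_AGENTS = {"googlebot", "slurp", "msnbot"};

    /** Returns true if the User-Agent header contains a known crawler fragment. */
    static boolean isCrawler(String userAgent) {
        if (userAgent == null) {
            return false; // no header sent: treat as a normal client
        }
        String agent = userAgent.toLowerCase(Locale.ROOT);
        for (String bot : BOT_AGENTS) {
            if (agent.contains(bot)) {
                return true;
            }
        }
        return false;
    }

    /** Removes a ";jsessionid=..." path parameter, keeping query string and fragment. */
    static String stripJsessionId(String url) {
        return url.replaceAll("(?i);jsessionid=[^?#;]*", "");
    }

    public static void main(String[] args) {
        String agent = "Mozilla/5.0 (compatible; Googlebot/2.1)";
        String url = "/home;jsessionid=ABC123?page=2";
        if (isCrawler(agent) && !url.equals(stripJsessionId(url))) {
            // A filter would send a redirect to this URL instead of printing it.
            System.out.println("redirect to " + stripJsessionId(url));
        }
    }
}
```

For crawler requests without a jsessionid in the URL, the filter would
instead leave URL rewriting switched off and invalidate the session after
the request, as described above.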

Now we have almost none of those URLs in google's index.

If anyone is interested in the code, I'd be willing to publish this. As it
is not Wicket-specific, I could share it with some generic open-source
servlet tools project - is there something like that at Apache or elsewhere?

But maybe Google is smarter by now, and it is not required anymore?

-- 
greetings from Berlin,

Rüdiger Schulz

www.2rue.de
www.indyphone.de - Build cool mobile phone logos yourself

Re: Removing the jsessionid for SEO

Posted by Korbinian Bachl - privat <ko...@whiskyworld.de>.
Hi Jeremy,

You're absolutely right; nearly all spiders today can handle the default
sessions, be it Java, PHP, .NET etc.; those guys at Google and
Microsoft aren't beginners!

It's also important to understand that a URL in Wicket is, for the most
part, nothing more than a plain string that can be manipulated the
way you like. Wicket (a.k.a. your page) needs 1 or maybe 2 parameters,
and it has to know how to find them; the rest of the URL can be changed
to fit your needs, as long as you can guarantee a unique
content-to-URL mapping so you don't get marked as a duplicate-content
spammer.
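
A rough sketch of that idea, using the forum URLs quoted below as examples:
only the first path segment after the mount point identifies the page, and
the keyword-rich remainder is ignored. ForumUrlParser and its Kind and
Target types are hypothetical names for illustration; they are not part of
Wicket and not the code used on the site mentioned below.

```java
final class ForumUrlParser {

    enum Kind { CATEGORY, POST, UNKNOWN }

    /** What a URL resolves to: the page kind plus its numeric id. */
    static final class Target {
        final Kind kind;
        final int id;

        Target(Kind kind, int id) {
            this.kind = kind;
            this.id = id;
        }
    }

    /**
     * Decodes the first path segment after the mount point, for example
     * "cat-53" or "18395". Everything after it is treated as SEO text only.
     */
    static Target parse(String pathAfterMount) {
        String trimmed = pathAfterMount.startsWith("/")
                ? pathAfterMount.substring(1)
                : pathAfterMount;
        String first = trimmed.split("/", 2)[0];
        try {
            if (first.startsWith("cat-")) {
                return new Target(Kind.CATEGORY, Integer.parseInt(first.substring(4)));
            }
            return new Target(Kind.POST, Integer.parseInt(first));
        } catch (NumberFormatException e) {
            // Neither "cat-<number>" nor a plain number.
            return new Target(Kind.UNKNOWN, -1);
        }
    }

    public static void main(String[] args) {
        Target t = parse("/cat-53/Let-s-Talk-Texas-Outdoors-Classifieds");
        System.out.println(t.kind + " " + t.id); // prints "CATEGORY 53"
    }
}
```

Because the slug never takes part in the lookup, it can be rewritten freely
for keywords; keeping one canonical slug per id avoids the duplicate-content
problem mentioned above.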

Best,


Korbinian

Jeremy Thomerson wrote:
> If I understood you correctly, the first page is bookmarkable, the second is
> a wicket URL, tied to the session.  That'd be bad for SEO - search engines
> couldn't see page 2, or they would, but the URL is tied to their session, so
> even if a user visited that URL, they wouldn't get that page.  This means
> that any content past page one is unreachable from a search engine.  I had
> another thread going about a problem I was having with sessions, which
> turned up some interesting data.  I have over 31,000 pages indexed by
> Google, they are visiting bookmarkable URLs that DO have jsessionid in them,
> but only two pages in their index have a jsessionid in them.  They obviously
> handle jsessionid fine these days, or at least they do for me.
> 
> If you need all of your content to be indexed, you really need to concern
> yourself with making every page bookmarkable.  Take a look at Korbinian's
> comments above - it looks like he is doing it well.  Or have a look at my
> comments or my site http://www.texashuntfish.com.
> 
> You should specifically look at http://www.texashuntfish.com/thf/app/forum -
> I am using DataTables there, but every link (including sort, etc) is
> bookmarkable.  So, you may go into a category and get a URL like
> http://www.texashuntfish.com/thf/app/forum/cat-53/Let-s-Talk-Texas-Outdoors-Classifieds-Buy-Sell-Trade
> or
> http://www.texashuntfish.com/thf/app/forum/18395/Winchester-22-model-61-for-sell.
> The "cat-53" or the "/18395/" is the only thing that matters.  I have a
> strategy mounted on "/forum" that will take the first parameter and use it
> to decode what kind of page is being requested - a category page, or a
> specific post, etc.  Everything after that first parameter is specifically
> for SEO.
> 
> Putting good keywords in the URL like that, and putting the subject of every
> article / calendar event / news or forum thread is what shot us up in the
> rankings of multiple search engines.  Migrating the app from what it was
> before somerandomscript.cfm?foo=123123&bar=12321 to this made a HUGE
> difference.  It wasn't without work - Wicket is super easy if you don't have
> to worry about URLs - but they also make it easy to totally customize all of
> your URLs, too.
> 
> Shoot back any questions you have.  Hopefully I can share more information,
> or even some code later.  Maybe Korbinian and I should put some information
> on the Wiki about pretty URLs and SEO.
> 
> Jeremy
> 
> On Fri, Apr 4, 2008 at 1:09 PM, Dan Kaplan <dk...@citizenhawk.com> wrote:
> 
>> Thanks,
>>
>> That's kinda the route I've already taken.  On my site, www.startfound.com
>> ,
>> if you click on any company to see more details it goes to a bookmarkable
>> page.  Same with any tag.  Maybe if I've already got that much, I
>> shouldn't
>> concern myself with the fact that page 2 of my list is not bookmarkable
>> but
>> reachable by google bot.  Or maybe I should just add a noindex meta tag on
>> every page that's not page 1.
>>
>> It'd be kinda ridiculous to require login to see past page 1.  That may be
>> good for SEO but it'll drive people away.
>>
>>> -----Original Message-----
>>> From: jeremythomerson@gmail.com [mailto:jeremythomerson@gmail.com] On
>>> Behalf Of Jeremy Thomerson
>>> Sent: Thursday, April 03, 2008 10:00 PM
>>> To: users@wicket.apache.org
>>> Subject: Re: Removing the jsessionid for SEO
>>>
>>> I've been building a community-driven hunting and fishing site in Texas
>>> for
>>> the past year and a half.  Since I have converted it to Wicket from
>>> ColdFusion, our search engine rankings have gone WAY UP.  That's right,
>>> we're on the first page for tons of searches.  Search for "texas
>> hunting"
>>> -
>>> we're second under only the Texas Parks and Wildlife Association.
>>>
>>> How?  With Wicket?  Yes - it requires a little more work.  What I do is
>>> that
>>> for any link that I want Google to be able to follow, I have a subclass
>> of
>>> Link specific to that.  For instance, ViewThreadLink, which takes the ID
>>> for
>>> the link and a model (detachable) of the thread.  Then I mount an
>>> IRequestTargetUrlCodingStrategy for each big category of things in my
>>> webapp.  I've made several strategies that I use over and over, just
>>> giving
>>> them a different mount path and a different parameter to tell it what
>> kind
>>> of article, etc, that it will match to.  This is made easier because
>> over
>>> 75% of the objects in our site are all similar enough that the extend
>> from
>>> a
>>> base class that provides the basic functionality for an article / thread
>> /
>>> etc that has a title, text, pictures, comments, the standard stuff.
>>>
>>> So, yes, it takes work.  But that's okay - SEO always takes work.  I
>> also
>>> have given a lot of care to use good page titles, good semantic HTML and
>>> stuff things into the URL that don't have anything to do with locating
>> the
>>> resource, but give the search engines a clue as to what the content is.
>>>
>>> Yes, some pages end up with a jsessionid - and I don't like it (example:
>>> http://www.google.com/search?hl=en&client=firefox-a&rls=com.ubuntu%3Aen-
>>> US%3Aofficial&q=%22south+texas+management+buck%22&btnG=Search).
>>> But, most don't because almost all of my links are bookmarkable.  When
>> the
>>> user clicks something that they can only do as a signed-in user, then it
>>> redirects them to the sign in page, they sign in, and are taken back to
>>> the
>>> page they were on.  Then they can pick up, and I don't worry about
>>> bookmarkable URLs for anything that requires user-authentication
>> (wizards
>>> to
>>> post a new listing, story, admin links, etc).
>>>
>>> Jeremy Thomerson
>>> TexasHuntFish.com
>>>
>>> On Thu, Apr 3, 2008 at 8:09 PM, Dan Kaplan <dk...@citizenhawk.com>
>>> wrote:
>>>
>>>> Ok, at least I'm not missing anything.  I understand the benefits it's
>>>> providing with its stateful framework.  Developing a site with Wicket
>> is
>>>> easier than with any other framework I've used.  But this
>> statefulness,
>>>> which makes websites so easy to develop, seems to be counter
>> productive
>>> to
>>>> SEO:
>>>>
>>>> GoogleBot will follow and index stateful links.  Worst case scenario,
>>>> these
>>>> actually become visible to google users and when they click the link
>> it
>>>> takes them to an "invalid session" page.  They think, "This site is
>>>> broken"
>>>> and move on to the next link of their search result.
>>>>
>>>> Another approach to solving this is to block all the stateful pages in
>>> my
>>>> robots.txt file.  But how can I block these links in robots.txt since
>>> they
>>>> change per session?  Is there any way to know what the url will
>> resolve
>>> to
>>>> when googlebot tries to visit my site so I can tell it to disallow:
>>>> /?wicket:interface=:10:1::: and ?wicket:interface=:0:1::: and ...?
>>>>
>>>>
>>>>> -----Original Message-----
>>>>> From: Igor Vaynberg [mailto:igor.vaynberg@gmail.com]
>>>>> Sent: Thursday, April 03, 2008 5:45 PM
>>>>> To: users@wicket.apache.org
>>>>> Subject: Re: Removing the jsessionid for SEO
>>>>>
>>>>> On Thu, Apr 3, 2008 at 5:31 PM, Dan Kaplan <dk...@citizenhawk.com>
>>>>> wrote:
>>>>>> Ok I did a little preliminary research on this.  Right now
>>>>> PagingNavigator
>>>>>>  uses PagingNavigationLink's to represent its page.  This extends
>>>> Link.
>>>>> I'm
>>>>>>  supposed to override PagingNavigator's newPagingNavigationLink()
>>>> method
>>>>> to
>>>>>>  accomplish this (I think) but past that, this isn't very
>>>>> straightforward to
>>>>>>  me.
>>>>>>
>>>>>>  Do I need to create my own BookmarkablePagingNavigationLink?
>>  When
>>> I
>>>>> do...
>>>>>>  what next?  I really don't know enough about
>> bookmarkablePageLinks
>>> to
>>>>> do
>>>>>>  this.  Right now, all the magic happens inside
>>> PagingNavigationLink.
>>>>> Won't
>>>>>>  I have to move all that logic into the WebPage that I'm passing
>>> into
>>>>>>  BookmarkablePagingNavigationLink?  This seems like a lot of work.
>>> Am
>>>> I
>>>>>>  missing something critical?
>>>>> no, you are not missing anything. you see, when you go stateless,
>> like
>>>>> what you want, then you have to recreate all the magic stuff that
>>>>> makes stateful links Just Work. Without state you are back to the
>>>>> servlet/mvc programming model: you have to encode the state that you
>>>>> want into the link, then on the trip back decode it, recreate
>>>>> something from it, and then apply that something onto the
>> components.
>>>>> This is the crapwork that wicket does for you usually.
>>>>>
>>>>> -igor
>>>>>
>>>>>
>>>>>>
>>>>>>  > -----Original Message-----
>>>>>>  > From: Igor Vaynberg [mailto:igor.vaynberg@gmail.com]
>>>>>>
>>>>>>
>>>>>>> Sent: Thursday, April 03, 2008 3:40 PM
>>>>>>  > To: users@wicket.apache.org
>>>>>>  > Subject: Re: Removing the jsessionid for SEO
>>>>>>  >
>>>>>>  > you subclass the pagenavigator and make it use bookmarkable
>> links
>>>>>>  > also. it has factory methods for all the links it uses.
>>>>>>  >
>>>>>>  > -igor
>>>>>>  >
>>>>>>  >
>>>>>>  > On Thu, Apr 3, 2008 at 3:36 PM, Dan Kaplan
>>> <dkaplan@citizenhawk.com
>>>>>>  > wrote:
>>>>>>  > > I wasn't talking about the links that are on the list (I
>>> already
>>>>> make
>>>>>>  > those
>>>>>>  > >  bookmarkable).  I'm talking about the links that the
>> Navigator
>>>>>>  > generates.
>>>>>>  > >  How do I make it so page 2 is bookmarkable?
>>>>>>  > >
>>>>>>  > >
>>>>>>  > >  -----Original Message-----
>>>>>>  > >  From: Igor Vaynberg [mailto:igor.vaynberg@gmail.com]
>>>>>>  > >
>>>>>>  > >
>>>>>>  > > Sent: Thursday, April 03, 2008 3:30 PM
>>>>>>  > >  To: users@wicket.apache.org
>>>>>>  > >  Subject: Re: Removing the jsessionid for SEO
>>>>>>  > >
>>>>>>  > >  instead of
>>>>>>  > >
>>>>>>  > >  item.add(new link("foo") { onclick() });
>>>>>>  > >
>>>>>>  > >  do
>>>>>>  > >
>>>>>>  > >  item.add(new bookmarkablepagelink("foo", page.class));
>>>>>>  > >
>>>>>>  > >  -igor
>>>>>>  > >
>>>>>>  > >
>>>>>>  > >  On Thu, Apr 3, 2008 at 3:28 PM, Dan Kaplan
>>>>> <dk...@citizenhawk.com>
>>>>>>  > wrote:
>>>>>>  > >  > How?  I asked how to do it before and nobody suggested
>> this
>>> as
>>>> a
>>>>>>  > >  >  possibility.
>>>>>>  > >  >
>>>>>>  > >  >
>>>>>>  > >  >
>>>>>>  > >  >  -----Original Message-----
>>>>>>  > >  >  From: Igor Vaynberg [mailto:igor.vaynberg@gmail.com]
>>>>>>  > >  >  Sent: Thursday, April 03, 2008 3:26 PM
>>>>>>  > >  >  To: users@wicket.apache.org
>>>>>>  > >  >  Subject: Re: Removing the jsessionid for SEO
>>>>>>  > >  >
>>>>>>  > >  >  dataview can work in a stateless mode, just use
>>> bookmarkable
>>>>> links
>>>>>>  > inside
>>>>>>  > >  it
>>>>>>  > >  >
>>>>>>  > >  >  -igor
>>>>>>  > >  >
>>>>>>  > >  >
>>>>>>  > >  >  On Thu, Apr 3, 2008 at 3:22 PM, Dan Kaplan
>>>>> <dk...@citizenhawk.com>
>>>>>>  > >  wrote:
>>>>>>  > >  >  > Regardless, at the very least this makes your site look
>>>>> "weird"
>>>>>>  > and
>>>>>>  > >  >  >  unprofessional when google puts a jsessionid on your
>>> url.
>>>>> There
>>>>>>  > has
>>>>>>  > >  got
>>>>>>  > >  >  to
>>>>>>  > >  >  >  be some negative effect when google visits it the
>> second
>>>>> time and
>>>>>>  > the
>>>>>>  > >  >  >  jsessionid has changed but it sees the same exact
>>> content.
>>>>> Worst
>>>>>>  > >  case,
>>>>>>  > >  >  >  it'll think you're trying to trick it.
>>>>>>  > >  >  >
>>>>>>  > >  >  >  About those 404s, I'm finding that with the fix I
>>> provided
>>>> I
>>>>>>  > don't get
>>>>>>  > >  a
>>>>>>  > >  >  >  404, but the links refresh the page I'm already on.
>>  IE:
>>>> If
>>>>> I'm
>>>>>>  > on A,
>>>>>>  > >  and
>>>>>>  > >  >  a
>>>>>>  > >  >  >  link to B is non-bookmarkable, clicking B refreshes A.
>>>>>>  > >  >  >
>>>>>>  > >  >  >  This issue is very disconcerting to me.  It's one of
>> the
>>>>> reasons
>>>>>>  > I
>>>>>>  > >  wish
>>>>>>  > >  >  that
>>>>>>  > >  >  >  DataView had an option to work in stateless mode.
>>  Cause
>>>> if
>>>>> I ban
>>>>>>  > >  cookies
>>>>>>  > >  >  >  and Googlebot visits my home page (with a navigator on
>>>> it),
>>>>> it'll
>>>>>>  > try
>>>>>>  > >  to
>>>>>>  > >  >  >  follow all these page links and from its perspective,
>>> they
>>>>> all
>>>>>>  > lead
>>>>>>  > >  back
>>>>>>  > >  >  to
>>>>>>  > >  >  >  the first page.  So it's kinda a catch-22: Include the
>>>>> jsessionid
>>>>>>  > in
>>>>>>  > >  the
>>>>>>  > >  >  >  urls and get bad SEO or remove the jsessionid and get
>>> bad
>>>>> SEO :(
>>>>>>  > >  >  >
>>>>>>  > >  >  >  Perhaps the answer to my prayers is a combination of
>> the
>>>>>>  > >  noindex/nofollow
>>>>>>  > >  >  >  meta tag with a sitemap.xml.  I'm thinking I can put a
>>>>> nofollow
>>>>>>  > on the
>>>>>>  > >  >  home
>>>>>>  > >  >  >  page (so googlebot doesn't try to follow the navigator
>>>>> links) and
>>>>>>  > use
>>>>>>  > >  the
>>>>>>  > >  >  >  sitemap.xml to point out the individual pages I want
>> it
>>> to
>>>>> index.
>>>>>>  > >  >  >
>>>>>>  > >  >  >
>>>>>>  > >  >  >  Matej: can you go into more detail about your hybrid
>> URL
>>>>>>  > statement?
>>>>>>  > >  >  Won't
>>>>>>  > >  >  >  google index, for example, /home and /home.1 if I use
>>> it?
>>>>> When
>>>>>>  > it
>>>>>>  > >  >  follows
>>>>>>  > >  >  >  the next page, won't the url become /home.1.2 or
>>>> something?
>>>>> That
>>>>>>  > .2
>>>>>>  > >  is a
>>>>>>  > >  >  >  page version: If google indexes that and tries to
>> visit
>>> it
>>>>> again,
>>>>>>  > >  won't
>>>>>>  > >  >  it
>>>>>>  > >  >  >  report about an invalid session?
>>>>>>  > >  >  >
>>>>>>  > >  >  >
>>>>>>  > >  >  >
>>>>>>  > >  >  >  -----Original Message-----
>>>>>>  > >  >  >  From: Matej Knopp [mailto:matej.knopp@gmail.com]
>>>>>>  > >  >  >  Sent: Thursday, April 03, 2008 11:10 AM
>>>>>>  > >  >  >  To: users@wicket.apache.org
>>>>>>  > >  >  >  Subject: Re: Removing the jsessionid for SEO
>>>>>>  > >  >  >
>>>>>>  > >  >  >  On the other hand, crawling non-bookmarkable pages is
>>> not
>>>>> very
>>>>>>  > useful
>>>>>>  > >  >  >  anyway, since ?wicket:interface url will always get
>> page
>>>>> expired
>>>>>>  > when
>>>>>>  > >  >  >  you click on the result.
>>>>>>  > >  >  >
>>>>>>  > >  >  >  However, preserving session makes lot of sense with
>>> hybrid
>>>>> url.
>>>>>>  > Google
>>>>>>  > >  >  >  remembers the original url (without page instance)
>> while
>>>>> indexing
>>>>>>  > the
>>>>>>  > >  >  >  real page (after redirect).
>>>>>>  > >  >  >
>>>>>>  > >  >  >  I think though that the crawler is quite advanced. I'm
>>>> would
>>>>>>  > think  it
>>>>>>  > >  >  >  supports cookies (at least JSESSIONID) as well as it
>>>>> evaluates
>>>>>>  > some of
>>>>>>  > >  >  >  the javascript on page.
>>>>>>  > >  >  >
>>>>>>  > >  >  >  -Matej
>>>>>>  > >  >  >
>>>>>>  > >  >  >  On Thu, Apr 3, 2008 at 6:56 PM, Igor Vaynberg
>>>>>>  > >  <ig...@gmail.com>
>>>>>>  > >  >  >  wrote:
>>>>>>  > >  >  >  > right. if you strip sessionid then all your
>>>>> nonbookmarkable
>>>>>>  > urls
>>>>>>  > >  will
>>>>>>  > >  >  >  >  resolve to a 404. that will probably drop your rank
>> a
>>>> lot
>>>>>>  > >  faster....
>>>>>>  > >  >  >  >
>>>>>>  > >  >  >  >  -igor
>>>>>>  > >  >  >  >
>>>>>>  > >  >  >  >
>>>>>>  > >  >  >  >
>>>>>>  > >  >  >  >
>>>>>>  > >  >  >  >  On Thu, Apr 3, 2008 at 9:16 AM, Johan Compagner
>>>>>>  > >  <jc...@gmail.com>
>>>>>>  > >  >  >  wrote:
>>>>>>  > >  >  >  >  > the problem is that then you have to have all
>>>> stateless
>>>>>>  > pages.
>>>>>>  > >  Else
>>>>>>  > >  >  >  google
>>>>>>  > >  >  >  >  >  can't crawl your website.
>>>>>>  > >  >  >  >  >  And if that is the case then you could be
>>> completely
>>>>>>  > stateless
>>>>>>  > >  so
>>>>>>  > >  >  you
>>>>>>  > >  >  >  dont
>>>>>>  > >  >  >  >  >  have a session (id) to worry about at all.
>>>>>>  > >  >  >  >  >
>>>>>>  > >  >  >  >  >  johan
>>>>>>  > >  >  >  >  >
>>>>>>  > >  >  >  >  >
>>>>>>  > >  >  >  >  >
>>>>>>  > >  >  >  >  >
>>>>>>  > >  >  >  >  >
>>>>>>  > >  >  >  >  >
>>>>>>  > >  >  >  >  >  On Thu, Apr 3, 2008 at 4:54 PM, Zappaterrini,
>>> Larry
>>>> <
>>>>>>  > >  >  >  >  >  Larry.Zappaterrini@fnis.com> wrote:
>>>>>>  > >  >  >  >  >
>>>>>>  > >  >  >  >  >  > When Google asks to not have special treatment
>>> for
>>>>> their
>>>>>>  > bot,
>>>>>>  > >  >  they
>>>>>>  > >  >  >  are
>>>>>>  > >  >  >  >  >  > referring to content more than anything.
>>> Regarding
>>>>> the
>>>>>>  > session
>>>>>>  > >  id
>>>>>>  > >  >  >  being
>>>>>>  > >  >  >  >  >  > coded in the URL, see the Technical guidelines
>>>>> section of
>>>>>>  > >  >  Google's
>>>>>>  > >  >  >  >  >  > Webmaster Guidelines -
>>>>>>  > >  >  >  >  >  >
>>>>>>  > >  >  >
>>>>>>  > >
>>>>>>  >
>>> http://www.google.com/support/webmasters/bin/answer.py?answer=35769#desi
>>>>>>  > >  >  >  >  >  > gn
>>>>>>  > >  >  >  >  >  >
>>>>>>  > >  >  >  >  >  > It specifically recommends "allow(ing) search
>>> bots
>>>>> to
>>>>>>  > crawl
>>>>>>  > >  your
>>>>>>  > >  >  >  sites
>>>>>>  > >  >  >  >  >  > without session IDs or arguments that track
>>> their
>>>>> path
>>>>>>  > through
>>>>>>  > >  >  the
>>>>>>  > >  >  >  >  >  > site."
>>>>>>  > >  >  >  >  >  >
>>>>>>  > >  >  >  >  >  > -----Original Message-----
>>>>>>  > >  >  >  >  >  > From: Johan Compagner
>>> [mailto:jcompagner@gmail.com
>>>> ]
>>>>>>  > >  >  >  >  >  > Sent: Thursday, April 03, 2008 7:35 AM
>>>>>>  > >  >  >  >  >  > To: users@wicket.apache.org
>>>>>>  > >  >  >  >  >  > Subject: Re: Removing the jsessionid for SEO
>>>>>>  > >  >  >  >  >  >
>>>>>>  > >  >  >  >  >  > isnt google always saying that you shouldn't
>>> alter
>>>>>>  > behavior of
>>>>>>  > >  >  your
>>>>>>  > >  >  >  site
>>>>>>  > >  >  >  >  >  > depending of it is there bot or not?
>>>>>>  > >  >  >  >  >  >
>>>>>>  > >  >  >  >  >  > On Thu, Apr 3, 2008 at 1:00 PM, Artur W.
>>>>>>  > <a_...@gazeta.pl>
>>>>>>  > >  >  >  wrote:
>>>>>>  > >  >  >  >  >  >
>>>>>>  > >  >  >  >  >  > >
>>>>>>  > >  >  >  >  >  > > Hi!
>>>>>>  > >  >  >  >  >  > >
>>>>>>  > >  >  >  >  >  > >
>>>>>>  > >  >  >  >  >  > > igor.vaynberg wrote:
>>>>>>  > >  >  >  >  >  > > >
>>>>>>  > >  >  >  >  >  > > > also by doing what you have done users
>> with
>>>>> cookies
>>>>>>  > >  disabled
>>>>>>  > >  >  >  wont be
>>>>>>  > >  >  >  >  >  > > > able to use your site...
>>>>>>  > >  >  >  >  >  > > >
>>>>>>  > >  >  >  >  >  > >
>>>>>>  > >  >  >  >  >  > > In my opinion session id is a problem.
>> Google
>>>>> index the
>>>>>>  > same
>>>>>>  > >  >  page
>>>>>>  > >  >  >  >  >  > again
>>>>>>  > >  >  >  >  >  > > and
>>>>>>  > >  >  >  >  >  > > again.
>>>>>>  > >  >  >  >  >  > >
>>>>>>  > >  >  >  >  >  > > About the users without cookies we can do
>> like
>>>>> this:
>>>>>>  > >  >  >  >  >  > >
>>>>>>  > >  >  >  >  >  > >
>>>>>>  > >  >  >  >  >  > >        static class Unbuffered extends
>>>> WebResponse
>>>>> {
>>>>>>  > >  >  >  >  >  > >
>>>>>>  > >  >  >  >  >  > >                 private static final
>> String[]
>>>>> botAgents
>>>>>>  > = {
>>>>>>  > >  >  >  >  >  > "onetszukaj",
>>>>>>  > >  >  >  >  >  > > "googlebot",
>>>>>>  > >  >  >  >  >  > > "appie", "architext",
>>>>>>  > >  >  >  >  >  > >                        "jeeves", "bjaaland",
>>>>> "ferret",
>>>>>>  > >  >  "gulliver",
>>>>>>  > >  >  >  >  >  > > "harvest", "htdig",
>>>>>>  > >  >  >  >  >  > >                        "linkwalker",
>> "lycos_",
>>>>> "moget",
>>>>>>  > >  >  >  >  >  > "muscatferret",
>>>>>>  > >  >  >  >  >  > > "myweb", "nomad",
>>>>>>  > >  >  >  >  >  > > "scooter",
>>>>>>  > >  >  >  >  >  > >
>>> "yahoo!\\sslurp\\schina",
>>>>>>  > "slurp",
>>>>>>  > >  >  >  "weblayers",
>>>>>>  > >  >  >  >  >  > > "antibot", "bruinbot",
>>>>>>  > >  >  >  >  >  > > "digout4u",
>>>>>>  > >  >  >  >  >  > >                        "echo!",
>> "ia_archiver",
>>>>>>  > "jennybot",
>>>>>>  > >  >  >  "mercator",
>>>>>>  > >  >  >  >  >  > > "netcraft", "msnbot",
>>>>>>  > >  >  >  >  >  > > "petersnews",
>>>>>>  > >  >  >  >  >  > >                        "unlost_web_crawler",
>>>>> "voila",
>>>>>>  > >  >  "webbase",
>>>>>>  > >  >  >  >  >  > > "webcollage", "cfetch",
>>>>>>  > >  >  >  >  >  > > "zyborg",
>>>>>>  > >  >  >  >  >  > >                        "wisenutbot",
>> "robot",
>>>>> "crawl",
>>>>>>  > >  "spider"
>>>>>>  > >  >  };
>>>>>>  > >  >  >  /*
>>>>>>  > >  >  >  >  >  > and
>>>>>>  > >  >  >  >  >  > > so on... */
>>>>>>  > >  >  >  >  >  > >
>>>>>>  > >  >  >  >  >  > >                public Unbuffered(final
>>>>>>  > HttpServletResponse
>>>>>>  > >  res)
>>>>>>  > >  >  {
>>>>>>  > >  >  >  >  >  > >            super(res);
>>>>>>  > >  >  >  >  >  > >         }
>>>>>>  > >  >  >  >  >  > >
>>>>>>  > >  >  >  >  >  > >        @Override
>>>>>>  > >  >  >  >  >  > >        public CharSequence encodeURL(final
>>>>> CharSequence
>>>>>>  > url)
>>>>>>  > >  {
>>>>>>  > >  >  >  >  >  > >             return isAgent() ? url :
>>>>>>  > super.encodeURL(url);
>>>>>>  > >  >  >  >  >  > >        }
>>>>>>  > >  >  >  >  >  > >
>>>>>>  > >  >  >  >  >  > >                private static boolean
>>> isAgent()
>>>> {
>>>>>>  > >  >  >  >  >  > >
>>>>>>  > >  >  >  >  >  > >                        String agent =
>>>>>>  > >  >  >  >  >  > >
>>>>>>  > >  >  >  >  >  > >
>>>>>>  > >  >  >  >  >  >
>>>>>>  > >  >  >
>>>>>>  > >
>>>>>>  >
>>> ((WebRequest)RequestCycle.get().getRequest()).getHttpServletRequest().ge
>>>>>>  > >  >  >  >  >  > tHeader("User-Agent");
>>>>>>  > >  >  >  >  >  > >
>>>>>>  > >  >  >  >  >  > >                        for(String bot :
>>>> botAgents)
>>>>> {
>>>>>>  > >  >  >  >  >  > >                                if
>>>>>>  > >  >  >  (agent.toLowerCase().indexOf(bot) !=
>>>>>>  > >  >  >  >  >  > -1)
>>>>>>  > >  >  >  >  >  > > {
>>>>>>  > >  >  >  >  >  > >
>>  return
>>>>> true;
>>>>>>  > >  >  >  >  >  > >                                }
>>>>>>  > >  >  >  >  >  > >                        }
>>>>>>  > >  >  >  >  >  > >
>>>>>>  > >  >  >  >  >  > >                        return false;
>>>>>>  > >  >  >  >  >  > >                }
>>>>>>  > >  >  >  >  >  > >    }
>>>>>>  > >  >  >  >  >  > >
>>>>>>  > >  >  >  >  >  > >
>>>>>>  > >  >  >  >  >  > > I didn't test this code but I do similar
>> thing
>>>> in
>>>>> my
>>>>>>  > old
>>>>>>  > >  >  >  application
>>>>>>  > >  >  >  >  >  > in
>>>>>>  > >  >  >  >  >  > > Spring and it works.
>>>>>>  > >  >  >  >  >  > >
>>>>>>  > >  >  >  >  >  > > Take care,
>>>>>>  > >  >  >  >  >  > > Artur
>>>>>>  > >  >  >  >  >  > >
>>>>>>  > >  >  >  >  >  > >
>>>>>>  > >  >  >  >  >  > > --
>>>>>>  > >  >  >  >  >  > > View this message in context:
>>>>>>  > >  >  >  >  >  > >
>>>>>>  > >  >  >  >  >  >
>>>>>>  > >  >  >
>>>>>>  > >  http://www.nabble.com/Removing-the-jsessionid-for-SEO-
>>>>>>  > tp16464534p1646739
>>>>>>  > >  >  >  >  >  >
>>>>>>  > >  >  >
>>>>>>  > >  >
>>>>>>  > >  6.html<
>> http://www.nabble.com/Removing-the-jsessionid-for-SEO-
>>>>>>  > tp16464534p1646
>>>>>>  > >  >  >  7396.html>
>>>>>>  > >  >  >  >  >
>>>>>>  > >  >  >  >  >
>>>>>>  > >  >  >  >  > > > Sent from the Wicket - User mailing list
>>> archive
>>>> at
>>>>>>  > >  Nabble.com.
>>>>>>  > >  >  >  >  >  > >
>>>>>>  > >  >  >  >  >  > >
>>>>>>  > >  >  >  >  >  > >
>>>>>>  > >  >  >
>>>>  ------------------------------------------------------------
>>>>> -----
>>>>>>  > ----
>>>>>>  > >  >  >  >  >  > > To unsubscribe, e-mail: users-
>>>>>>  > unsubscribe@wicket.apache.org
>>>>>>  > >  >  >  >  >  > > For additional commands, e-mail:
>>>>>>  > >  users-help@wicket.apache.org
>>>>>>  > >  >  >  >  >  > >
>>>>>>  > >  >  >  >  >  > >
>>>>>>  > >  >  >  >  >  >
>>>>>>  > >  >  >  >  >  > ______________
>>>>>>  > >  >  >  >  >  >
>>>>>>  > >  >  >  >  >  > The information contained in this message is
>>>>> proprietary
>>>>>>  > >  and/or
>>>>>>  > >  >  >  >  >  > confidential. If you are not the
>>>>>>  > >  >  >  >  >  > intended recipient, please: (i) delete the
>>> message
>>>>> and
>>>>>>  > all
>>>>>>  > >  >  copies;
>>>>>>  > >  >  >  (ii) do
>>>>>>  > >  >  >  >  >  > not disclose,
>>>>>>  > >  >  >  >  >  > distribute or use the message in any manner;
>> and
>>>>> (iii)
>>>>>>  > notify
>>>>>>  > >  the
>>>>>>  > >  >  >  sender
>>>>>>  > >  >  >  >  >  > immediately. In addition,
>>>>>>  > >  >  >  >  >  > please be aware that any message addressed to
>>> our
>>>>> domain
>>>>>>  > is
>>>>>>  > >  >  subject
>>>>>>  > >  >  >  to
>>>>>>  > >  >  >  >  >  > archiving and review by
>>>>>>  > >  >  >  >  >  > persons other than the intended recipient.
>> Thank
>>>>> you.
>>>>>>  > >  >  >  >  >  > _____________
>>>>>>  > >  >  >  >  >  >
>>>>>>  > >  >  >  >  >  >
>>>>>>  > >  >  >
>>>>  ------------------------------------------------------------
>>>>> -----
>>>>>>  > ----
>>>>>>  > >  >  >  >  >  > To unsubscribe, e-mail: users-
>>>>>>  > unsubscribe@wicket.apache.org
>>>>>>  > >  >  >  >  >  > For additional commands, e-mail: users-
>>>>>>  > help@wicket.apache.org
>>>>>>  > >  >  >  >  >  >
>>>>>>  > >  >  >  >  >  >
>>>>>>  > >  >  >  >  >
>>>>>>  > >  >  >  >
>>>>>>  > >  >  >  >
>>>>>>  > >
>>>>  ------------------------------------------------------------------
>>>>> ---
>>>>>>  > >  >  >  >  To unsubscribe, e-mail: users-
>>>>> unsubscribe@wicket.apache.org
>>>>>>  > >  >  >  >  For additional commands, e-mail: users-
>>>>> help@wicket.apache.org
>>>>>>  > >  >  >  >
>>>>>>  > >  >  >  >
>>>>>>  > >  >  >
>>>>>>  > >  >  >
>>>>>>  > >  >  >
>>>>>>  > >  >  >  --
>>>>>>  > >  >  >  Resizable and reorderable grid components.
>>>>>>  > >  >  >  http://www.inmethod.com
>>>>>>  > >  >  >
>>>>>>  > >  >  >
>>>>  ------------------------------------------------------------
>>>>> -----
>>>>>>  > ----
>>>>>>  > >  >  >  To unsubscribe, e-mail:
>>>> users-unsubscribe@wicket.apache.org
>>>>>>  > >  >  >  For additional commands, e-mail: users-
>>>>> help@wicket.apache.org
>>>>>>  > >  >  >
>>>>>>  > >  >  >
>>>>>>  > >  >  >
>>>>  ------------------------------------------------------------
>>>>> -----
>>>>>>  > ----
>>>>>>  > >  >  >  To unsubscribe, e-mail:
>>>> users-unsubscribe@wicket.apache.org
>>>>>>  > >  >  >  For additional commands, e-mail: users-
>>>>> help@wicket.apache.org
>>>>>>  > >  >  >
>>>>>>  > >  >  >
>>>>>>  > >  >
>>>>>>  > >  >
>>>>  ---------------------------------------------------------------
>>>>> -----
>>>>>>  > -
>>>>>>  > >  >  To unsubscribe, e-mail:
>> users-unsubscribe@wicket.apache.org
>>>>>>  > >  >  For additional commands, e-mail: users-
>>> help@wicket.apache.org
>>>>>>  > >  >
>>>>>>  > >  >
>>>>>>  > >  >
>>>>  ---------------------------------------------------------------
>>>>> -----
>>>>>>  > -
>>>>>>  > >  >  To unsubscribe, e-mail:
>> users-unsubscribe@wicket.apache.org
>>>>>>  > >  >  For additional commands, e-mail: users-
>>> help@wicket.apache.org
>>>>>>  > >  >
>>>>>>  > >  >
>>>>>>  > >
>>>>>>  > >
>>>>  ------------------------------------------------------------------
>>>>> ---
>>>>>>  > >  To unsubscribe, e-mail: users-unsubscribe@wicket.apache.org
>>>>>>  > >  For additional commands, e-mail:
>> users-help@wicket.apache.org
>>>>>>  > >
>>>>>>  > >
>>>>>>  > >
>>>>  ------------------------------------------------------------------
>>>>> ---
>>>>>>  > >  To unsubscribe, e-mail: users-unsubscribe@wicket.apache.org
>>>>>>  > >  For additional commands, e-mail:
>> users-help@wicket.apache.org
>>>>>>  > >
>>>>>>  > >
>>>>>>  >
>>>>>>  >
>>>> ---------------------------------------------------------------------
>>>>>>  > To unsubscribe, e-mail: users-unsubscribe@wicket.apache.org
>>>>>>  > For additional commands, e-mail: users-help@wicket.apache.org
>>>>>>
>>>>>>
>>>>>>
>>  -------------------------------------------------------------------
>>> --
>>>>>>  To unsubscribe, e-mail: users-unsubscribe@wicket.apache.org
>>>>>>  For additional commands, e-mail: users-help@wicket.apache.org
>>>>>>
>>>>>>
>>>>>
>> ---------------------------------------------------------------------
>>>>> To unsubscribe, e-mail: users-unsubscribe@wicket.apache.org
>>>>> For additional commands, e-mail: users-help@wicket.apache.org
>>>>
>>>> ---------------------------------------------------------------------
>>>> To unsubscribe, e-mail: users-unsubscribe@wicket.apache.org
>>>> For additional commands, e-mail: users-help@wicket.apache.org
>>>>
>>>>
>>
>>
> 




Re: Removing the jsessionid for SEO

Posted by Jeremy Thomerson <je...@thomersonfamily.com>.
If I understood you correctly, the first page is bookmarkable and the second
is a Wicket URL, tied to the session.  That'd be bad for SEO - search engines
either couldn't see page 2, or they could, but the URL would be tied to their
session, so a user who followed that URL from the results wouldn't get that
page.  This means that any content past page one is effectively unreachable
from a search engine.  I had another thread going about a problem I was having
with sessions, which turned up some interesting data.  I have over 31,000
pages indexed by Google.  They are visiting bookmarkable URLs that DO have
jsessionid in them, but only two pages in their index have a jsessionid in
them.  They obviously handle jsessionid fine these days, or at least they do
for me.

If you need all of your content to be indexed, you really need to concern
yourself with making every page bookmarkable.  Take a look at Korbinian's
comments above - it looks like he is doing it well.  Or have a look at my
comments or my site http://www.texashuntfish.com.
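
For paging in particular, "bookmarkable" boils down to putting the page index in the URL itself instead of in the session. A minimal sketch of that idea in plain Java follows. It is not Wicket API; the class name PagedUrl, the "page" parameter and the "/companies" path are hypothetical and only illustrate the round trip.

```java
/** Hypothetical helper: keeps the paging state in the URL, not in the session. */
final class PagedUrl {
    private PagedUrl() {}

    /**
     * Builds a self-contained link for a zero-based page index.
     * Page 0 gets the bare mount path so the first page has only one URL.
     */
    static String urlFor(String mountPath, int pageIndex) {
        if (pageIndex < 0) {
            throw new IllegalArgumentException("pageIndex must be >= 0");
        }
        return pageIndex == 0 ? mountPath : mountPath + "?page=" + pageIndex;
    }

    /** Reads the page index back from the raw "page" parameter; bad input falls back to 0. */
    static int pageIndexFrom(String rawPageParam) {
        if (rawPageParam == null) {
            return 0;
        }
        try {
            return Math.max(Integer.parseInt(rawPageParam.trim()), 0);
        } catch (NumberFormatException e) {
            return 0; // a crawler with a mangled URL still lands on a valid page
        }
    }
}
```

A page constructed from such a URL would set its navigator to pageIndexFrom(...) on every request, so the link keeps working after the session that generated it is gone.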

You should specifically look at http://www.texashuntfish.com/thf/app/forum -
I am using DataTables there, but every link (including sort, etc.) is
bookmarkable.  So, you may go into a category and get a URL like
http://www.texashuntfish.com/thf/app/forum/cat-53/Let-s-Talk-Texas-Outdoors-Classifieds-Buy-Sell-Trade
or
http://www.texashuntfish.com/thf/app/forum/18395/Winchester-22-model-61-for-sell.
The "cat-53" or the "/18395/" is the only part that matters.  I have a
strategy mounted on "/forum" that takes the first parameter and uses it
to decode what kind of page is being requested - a category page, a
specific post, etc.  Everything after that first parameter is there purely
for SEO.
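
A minimal sketch of that decoding step, written as plain Java rather than as the real Wicket strategy class. The names ForumPath, Kind and decode are hypothetical, and the "cat-" prefix rule is inferred only from the two example URLs above.

```java
import java.util.Optional;

/** Hypothetical decoder for the path that follows the "/forum" mount point. */
final class ForumPath {
    enum Kind { CATEGORY, POST }

    final Kind kind;
    final long id;

    private ForumPath(Kind kind, long id) {
        this.kind = kind;
        this.id = id;
    }

    /**
     * Only the first path segment is significant: "cat-53" means category 53,
     * a bare number means a post. Everything after it is keyword text for
     * search engines and is ignored here.
     */
    static Optional<ForumPath> decode(String pathAfterMount) {
        if (pathAfterMount == null) {
            return Optional.empty();
        }
        String trimmed = pathAfterMount.startsWith("/")
                ? pathAfterMount.substring(1)
                : pathAfterMount;
        String first = trimmed.split("/", 2)[0];
        try {
            if (first.startsWith("cat-")) {
                long id = Long.parseLong(first.substring("cat-".length()));
                return Optional.of(new ForumPath(Kind.CATEGORY, id));
            }
            return Optional.of(new ForumPath(Kind.POST, Long.parseLong(first)));
        } catch (NumberFormatException e) {
            return Optional.empty(); // unknown first segment: let the caller show a 404
        }
    }
}
```

Because the keyword tail is never read, a title can be edited later without breaking links that search engines have already indexed.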

Putting good keywords in the URL like that, including the subject of every
article / calendar event / news item or forum thread, is what shot us up in
the rankings of multiple search engines.  Migrating the app from what it was
before (somerandomscript.cfm?foo=123123&bar=12321) to this made a HUGE
difference.  It wasn't without work - Wicket is super easy if you don't have
to worry about URLs - but it also makes it easy to totally customize all of
your URLs, too.
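
As a rough illustration of how such keyword URLs can be produced, here is a small plain-Java sketch. SeoSlug, slugify and postPath are hypothetical names, the "/forum" prefix is taken from the mount path mentioned above, and the replacement rule is only a guess that happens to reproduce the style of the example URLs.

```java
/** Hypothetical helper that turns a title into the keyword tail of a URL. */
final class SeoSlug {
    private SeoSlug() {}

    /** Collapses every run of non-alphanumeric characters into a single hyphen. */
    static String slugify(String title) {
        if (title == null) {
            return "";
        }
        return title.trim()
                .replaceAll("[^A-Za-z0-9]+", "-")
                .replaceAll("^-+|-+$", ""); // no leading or trailing hyphen
    }

    /** The id locates the post; the slug is appended only for readers and crawlers. */
    static String postPath(long id, String title) {
        String slug = slugify(title);
        return slug.isEmpty() ? "/forum/" + id : "/forum/" + id + "/" + slug;
    }
}
```

For example, postPath(18395, "Winchester 22 model 61 for sell") gives "/forum/18395/Winchester-22-model-61-for-sell".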

Shoot back any questions you have.  Hopefully I can share more information,
or even some code later.  Maybe Korbinian and I should put some information
on the Wiki about pretty URLs and SEO.

Jeremy

On Fri, Apr 4, 2008 at 1:09 PM, Dan Kaplan <dk...@citizenhawk.com> wrote:

> Thanks,
>
> That's kinda the route I've already taken.  On my site, www.startfound.com,
> if you click on any company to see more details it goes to a bookmarkable
> page.  Same with any tag.  Maybe if I've already got that much, I shouldn't
> concern myself with the fact that page 2 of my list is not bookmarkable but
> reachable by google bot.  Or maybe I should just add a noindex meta tag on
> every page that's not page 1.
>
> It'd be kinda ridiculous to require login to see past page 1.  That may be
> good for SEO but it'll drive people away.
>
>
> > -----Original Message-----
> > From: jeremythomerson@gmail.com [mailto:jeremythomerson@gmail.com] On
> > Behalf Of Jeremy Thomerson
> > Sent: Thursday, April 03, 2008 10:00 PM
> > To: users@wicket.apache.org
> > Subject: Re: Removing the jsessionid for SEO
> >
> > I've been building a community-driven hunting and fishing site in Texas
> > for the past year and a half.  Since I have converted it to Wicket from
> > ColdFusion, our search engine rankings have gone WAY UP.  That's right,
> > we're on the first page for tons of searches.  Search for "texas hunting"
> > - we're second under only the Texas Parks and Wildlife Association.
> >
> > How?  With Wicket?  Yes - it requires a little more work.  What I do is
> > that for any link that I want Google to be able to follow, I have a
> > subclass of Link specific to that.  For instance, ViewThreadLink, which
> > takes the ID for the link and a model (detachable) of the thread.  Then I
> > mount an IRequestTargetUrlCodingStrategy for each big category of things
> > in my webapp.  I've made several strategies that I use over and over, just
> > giving them a different mount path and a different parameter to tell it
> > what kind of article, etc, that it will match to.  This is made easier
> > because over 75% of the objects in our site are all similar enough that
> > they extend from a base class that provides the basic functionality for an
> > article / thread / etc that has a title, text, pictures, comments, the
> > standard stuff.
> >
> > So, yes, it takes work.  But that's okay - SEO always takes work.  I also
> > have given a lot of care to use good page titles, good semantic HTML and
> > stuff things into the URL that don't have anything to do with locating the
> > resource, but give the search engines a clue as to what the content is.
> >
> > Yes, some pages end up with a jsessionid - and I don't like it (example:
> > http://www.google.com/search?hl=en&client=firefox-a&rls=com.ubuntu%3Aen-US%3Aofficial&q=%22south+texas+management+buck%22&btnG=Search).
> > But, most don't because almost all of my links are bookmarkable.  When the
> > user clicks something that they can only do as a signed-in user, then it
> > redirects them to the sign in page, they sign in, and are taken back to
> > the page they were on.  Then they can pick up, and I don't worry about
> > bookmarkable URLs for anything that requires user-authentication (wizards
> > to post a new listing, story, admin links, etc).
> >
> > Jeremy Thomerson
> > TexasHuntFish.com
> >
> > On Thu, Apr 3, 2008 at 8:09 PM, Dan Kaplan <dk...@citizenhawk.com>
> > wrote:
> >
> > > Ok, at least I'm not missing anything.  I understand the benefits it's
> > > providing with its stateful framework.  Developing a site with Wicket
> > > is easier than with any other framework I've used.  But this
> > > statefulness, which makes websites so easy to develop, seems to be
> > > counterproductive to SEO:
> > >
> > > GoogleBot will follow and index stateful links.  Worst case scenario,
> > > these actually become visible to Google users and when they click the
> > > link it takes them to an "invalid session" page.  They think, "This
> > > site is broken" and move on to the next link of their search result.
> > >
> > > Another approach to solving this is to block all the stateful pages in
> > > my robots.txt file.  But how can I block these links in robots.txt
> > > since they change per session?  Is there any way to know what the URL
> > > will resolve to when googlebot tries to visit my site so I can tell it
> > > to disallow: /?wicket:interface=:10:1::: and ?wicket:interface=:0:1:::
> > > and ...?
> > >
> > >
> > > > -----Original Message-----
> > > > From: Igor Vaynberg [mailto:igor.vaynberg@gmail.com]
> > > > Sent: Thursday, April 03, 2008 5:45 PM
> > > > To: users@wicket.apache.org
> > > > Subject: Re: Removing the jsessionid for SEO
> > > >
> > > > On Thu, Apr 3, 2008 at 5:31 PM, Dan Kaplan <dk...@citizenhawk.com>
> > > > wrote:
> > > > > Ok I did a little preliminary research on this.  Right now
> > > > > PagingNavigator uses PagingNavigationLink's to represent its page.
> > > > > This extends Link.  I'm supposed to override PagingNavigator's
> > > > > newPagingNavigationLink() method to accomplish this (I think) but
> > > > > past that, this isn't very straightforward to me.
> > > > >
> > > > > Do I need to create my own BookmarkablePagingNavigationLink?  When
> > > > > I do... what next?  I really don't know enough about
> > > > > bookmarkablePageLinks to do this.  Right now, all the magic happens
> > > > > inside PagingNavigationLink.  Won't I have to move all that logic
> > > > > into the WebPage that I'm passing into
> > > > > BookmarkablePagingNavigationLink?  This seems like a lot of work.
> > > > > Am I missing something critical?
> > > >
> > > > no, you are not missing anything. you see, when you go stateless,
> > > > like what you want, then you have to recreate all the magic stuff
> > > > that makes stateful links Just Work. Without state you are back to
> > > > the servlet/mvc programming model: you have to encode the state that
> > > > you want into the link, then on the trip back decode it, recreate
> > > > something from it, and then apply that something onto the
> > > > components. This is the crapwork that wicket does for you usually.
> > > >
> > > > -igor
> > > >
> > > >
> > > > >
> > > > >
> > > > >  > -----Original Message-----
> > > > >  > From: Igor Vaynberg [mailto:igor.vaynberg@gmail.com]
> > > > >
> > > > >
> > > > > > Sent: Thursday, April 03, 2008 3:40 PM
> > > > >  > To: users@wicket.apache.org
> > > > >  > Subject: Re: Removing the jsessionid for SEO
> > > > >  >
> > > > >  > you subclass the pagenavigator and make it use bookmarkable
> links
> > > > >  > also. it has factory methods for all the links it uses.
> > > > >  >
> > > > >  > -igor
> > > > >  >
> > > > >  >
> > > > >  > On Thu, Apr 3, 2008 at 3:36 PM, Dan Kaplan
> > <dkaplan@citizenhawk.com
> > > >
> > > > >  > wrote:
> > > > >  > > I wasn't talking about the links that are on the list (I
> > already
> > > > make
> > > > >  > those
> > > > >  > >  bookmarkable).  I'm talking about the links that the
> Navigator
> > > > >  > generates.
> > > > >  > >  How do I make it so page 2 is bookmarkable?
> > > > >  > >
> > > > >  > >
> > > > >  > >  -----Original Message-----
> > > > >  > >  From: Igor Vaynberg [mailto:igor.vaynberg@gmail.com]
> > > > >  > >
> > > > >  > >
> > > > >  > > Sent: Thursday, April 03, 2008 3:30 PM
> > > > >  > >  To: users@wicket.apache.org
> > > > >  > >  Subject: Re: Removing the jsessionid for SEO
> > > > >  > >
> > > > >  > >  instead of
> > > > >  > >
> > > > >  > >  item.add(new link("foo") { onclick() });
> > > > >  > >
> > > > >  > >  do
> > > > >  > >
> > > > >  > >  item.add(new bookmarkablepagelink("foo", page.class));
> > > > >  > >
> > > > >  > >  -igor
> > > > >  > >
> > > > >  > >
> > > > >  > >  On Thu, Apr 3, 2008 at 3:28 PM, Dan Kaplan
> > > > <dk...@citizenhawk.com>
> > > > >  > wrote:
> > > > >  > >  > How?  I asked how to do it before and nobody suggested
> this
> > as
> > > a
> > > > >  > >  >  possibility.
> > > > >  > >  >
> > > > >  > >  >
> > > > >  > >  >
> > > > >  > >  >  -----Original Message-----
> > > > >  > >  >  From: Igor Vaynberg [mailto:igor.vaynberg@gmail.com]
> > > > >  > >  >  Sent: Thursday, April 03, 2008 3:26 PM
> > > > >  > >  >  To: users@wicket.apache.org
> > > > >  > >  >  Subject: Re: Removing the jsessionid for SEO
> > > > >  > >  >
> > > > >  > >  >  dataview can work in a stateless mode, just use
> > bookmarkable
> > > > links
> > > > >  > inside
> > > > >  > >  it
> > > > >  > >  >
> > > > >  > >  >  -igor
> > > > >  > >  >
> > > > >  > >  >
> > > > >  > >  >  On Thu, Apr 3, 2008 at 3:22 PM, Dan Kaplan
> > > > <dk...@citizenhawk.com>
> > > > >  > >  wrote:
> > > > >  > >  >  >  Regardless, at the very least this makes your site look
> > > > >  > >  >  >  "weird" and unprofessional when google puts a jsessionid
> > > > >  > >  >  >  on your url.  There has got to be some negative effect
> > > > >  > >  >  >  when google visits it the second time and the jsessionid
> > > > >  > >  >  >  has changed but it sees the same exact content.  Worst
> > > > >  > >  >  >  case, it'll think you're trying to trick it.
> > > > >  > >  >  >
> > > > >  > >  >  >  About those 404s, I'm finding that with the fix I provided
> > > > >  > >  >  >  I don't get a 404, but the links refresh the page I'm
> > > > >  > >  >  >  already on.  IE: If I'm on A, and a link to B is
> > > > >  > >  >  >  non-bookmarkable, clicking B refreshes A.
> > > > >  > >  >  >
> > > > >  > >  >  >  This issue is very disconcerting to me.  It's one of the
> > > > >  > >  >  >  reasons I wish that DataView had an option to work in
> > > > >  > >  >  >  stateless mode.  Cause if I ban cookies and Googlebot
> > > > >  > >  >  >  visits my home page (with a navigator on it), it'll try to
> > > > >  > >  >  >  follow all these page links and from its perspective, they
> > > > >  > >  >  >  all lead back to the first page.  So it's kinda a
> > > > >  > >  >  >  catch-22: Include the jsessionid in the urls and get bad
> > > > >  > >  >  >  SEO or remove the jsessionid and get bad SEO :(
> > > > >  > >  >  >
> > > > >  > >  >  >  Perhaps the answer to my prayers is a combination of the
> > > > >  > >  >  >  noindex/nofollow meta tag with a sitemap.xml.  I'm
> > > > >  > >  >  >  thinking I can put a nofollow on the home page (so
> > > > >  > >  >  >  googlebot doesn't try to follow the navigator links) and
> > > > >  > >  >  >  use the sitemap.xml to point out the individual pages I
> > > > >  > >  >  >  want it to index.
> > > > >  > >  >  >
> > > > >  > >  >  >  Matej: can you go into more detail about your hybrid URL
> > > > >  > >  >  >  statement?  Won't google index, for example, /home and
> > > > >  > >  >  >  /home.1 if I use it?  When it follows the next page, won't
> > > > >  > >  >  >  the url become /home.1.2 or something?  That .2 is a page
> > > > >  > >  >  >  version: If google indexes that and tries to visit it
> > > > >  > >  >  >  again, won't it report about an invalid session?
> > > > >  > >  >  >
> > > > >  > >  >  >
> > > > >  > >  >  >
> > > > >  > >  >  >  -----Original Message-----
> > > > >  > >  >  >  From: Matej Knopp [mailto:matej.knopp@gmail.com]
> > > > >  > >  >  >  Sent: Thursday, April 03, 2008 11:10 AM
> > > > >  > >  >  >  To: users@wicket.apache.org
> > > > >  > >  >  >  Subject: Re: Removing the jsessionid for SEO
> > > > >  > >  >  >
> > > > >  > >  >  >  On the other hand, crawling non-bookmarkable pages is not
> > > > >  > >  >  >  very useful anyway, since a ?wicket:interface url will
> > > > >  > >  >  >  always get page expired when you click on the result.
> > > > >  > >  >  >
> > > > >  > >  >  >  However, preserving the session makes a lot of sense with
> > > > >  > >  >  >  hybrid urls. Google remembers the original url (without
> > > > >  > >  >  >  page instance) while indexing the real page (after
> > > > >  > >  >  >  redirect).
> > > > >  > >  >  >
> > > > >  > >  >  >  I think though that the crawler is quite advanced. I would
> > > > >  > >  >  >  think it supports cookies (at least JSESSIONID) and also
> > > > >  > >  >  >  evaluates some of the javascript on the page.
> > > > >  > >  >  >
> > > > >  > >  >  >  -Matej
> > > > >  > >  >  >
> > > > >  > >  >  >  On Thu, Apr 3, 2008 at 6:56 PM, Igor Vaynberg
> > > > >  > >  <ig...@gmail.com>
> > > > >  > >  >  >  wrote:
> > > > >  > >  >  >  > right. if you strip sessionid then all your
> > > > >  > >  >  >  >  nonbookmarkable urls will resolve to a 404. that will
> > > > >  > >  >  >  >  probably drop your rank a lot faster....
> > > > >  > >  >  >  >
> > > > >  > >  >  >  >  -igor
> > > > >  > >  >  >  >
> > > > >  > >  >  >  >
> > > > >  > >  >  >  >
> > > > >  > >  >  >  >
> > > > >  > >  >  >  >  On Thu, Apr 3, 2008 at 9:16 AM, Johan Compagner
> > > > >  > >  <jc...@gmail.com>
> > > > >  > >  >  >  wrote:
> > > > >  > >  >  >  >  > the problem is that then you have to have all stateless
> > > > >  > >  >  >  >  >  pages. Else google can't crawl your website.
> > > > >  > >  >  >  >  >  And if that is the case then you could be completely
> > > > >  > >  >  >  >  >  stateless, so you don't have a session (id) to worry
> > > > >  > >  >  >  >  >  about at all.
> > > > >  > >  >  >  >  >
> > > > >  > >  >  >  >  >  johan
> > > > >  > >  >  >  >  >
> > > > >  > >  >  >  >  >
> > > > >  > >  >  >  >  >
> > > > >  > >  >  >  >  >
> > > > >  > >  >  >  >  >
> > > > >  > >  >  >  >  >
> > > > >  > >  >  >  >  >  On Thu, Apr 3, 2008 at 4:54 PM, Zappaterrini,
> > Larry
> > > <
> > > > >  > >  >  >  >  >  Larry.Zappaterrini@fnis.com> wrote:
> > > > >  > >  >  >  >  >
> > > > >  > >  >  >  >  >  > When Google asks to not have special treatment for
> > > > >  > >  >  >  >  >  > their bot, they are referring to content more than
> > > > >  > >  >  >  >  >  > anything. Regarding the session id being coded in the
> > > > >  > >  >  >  >  >  > URL, see the Technical guidelines section of Google's
> > > > >  > >  >  >  >  >  > Webmaster Guidelines -
> > > > >  > >  >  >  >  >  > http://www.google.com/support/webmasters/bin/answer.py?answer=35769#design
> > > > >  > >  >  >  >  >  >
> > > > >  > >  >  >  >  >  > It specifically recommends "allow(ing) search bots to
> > > > >  > >  >  >  >  >  > crawl your sites without session IDs or arguments that
> > > > >  > >  >  >  >  >  > track their path through the site."
> > > > >  > >  >  >  >  >  >
> > > > >  > >  >  >  >  >  > -----Original Message-----
> > > > >  > >  >  >  >  >  > From: Johan Compagner
> > [mailto:jcompagner@gmail.com
> > > ]
> > > > >  > >  >  >  >  >  > Sent: Thursday, April 03, 2008 7:35 AM
> > > > >  > >  >  >  >  >  > To: users@wicket.apache.org
> > > > >  > >  >  >  >  >  > Subject: Re: Removing the jsessionid for SEO
> > > > >  > >  >  >  >  >  >
> > > > >  > >  >  >  >  >  > isn't google always saying that you shouldn't alter
> > > > >  > >  >  >  >  >  > the behavior of your site depending on whether it is
> > > > >  > >  >  >  >  >  > their bot or not?
> > > > >  > >  >  >  >  >  >
> > > > >  > >  >  >  >  >  > On Thu, Apr 3, 2008 at 1:00 PM, Artur W.
> > > > >  > <a_...@gazeta.pl>
> > > > >  > >  >  >  wrote:
> > > > >  > >  >  >  >  >  >
> > > > >  > >  >  >  >  >  > >
> > > > >  > >  >  >  >  >  > > Hi!
> > > > >  > >  >  >  >  >  > >
> > > > >  > >  >  >  >  >  > >
> > > > >  > >  >  >  >  >  > > igor.vaynberg wrote:
> > > > >  > >  >  >  >  >  > > >
> > > > >  > >  >  >  >  >  > > > also by doing what you have done users with cookies
> > > > >  > >  >  >  >  >  > > > disabled wont be able to use your site...
> > > > >  > >  >  >  >  >  > > >
> > > > >  > >  >  >  >  >  > >
> > > > >  > >  >  >  >  >  > > In my opinion the session id is a problem. Google
> > > > >  > >  >  >  >  >  > > indexes the same page again and again.
> > > > >  > >  >  >  >  >  > >
> > > > >  > >  >  >  >  >  > > For users without cookies we can do it like this:
> > > > >  > >  >  >  >  >  > >
> > > > >  > >  >  >  >  >  > >
> > > > >  > >  >  >  >  >  > > static class Unbuffered extends WebResponse {
> > > > >  > >  >  >  >  >  > >
> > > > >  > >  >  >  >  >  > >     // Lowercase substrings that identify crawler User-Agent strings.
> > > > >  > >  >  >  >  >  > >     private static final String[] botAgents = {
> > > > >  > >  >  >  >  >  > >         "onetszukaj", "googlebot", "appie", "architext", "jeeves",
> > > > >  > >  >  >  >  >  > >         "bjaaland", "ferret", "gulliver", "harvest", "htdig",
> > > > >  > >  >  >  >  >  > >         "linkwalker", "lycos_", "moget", "muscatferret", "myweb",
> > > > >  > >  >  >  >  >  > >         "nomad", "scooter", "yahoo!\\sslurp\\schina", "slurp",
> > > > >  > >  >  >  >  >  > >         "weblayers", "antibot", "bruinbot", "digout4u", "echo!",
> > > > >  > >  >  >  >  >  > >         "ia_archiver", "jennybot", "mercator", "netcraft", "msnbot",
> > > > >  > >  >  >  >  >  > >         "petersnews", "unlost_web_crawler", "voila", "webbase",
> > > > >  > >  >  >  >  >  > >         "webcollage", "cfetch", "zyborg", "wisenutbot", "robot",
> > > > >  > >  >  >  >  >  > >         "crawl", "spider" }; /* and so on... */
> > > > >  > >  >  >  >  >  > >
> > > > >  > >  >  >  >  >  > >     public Unbuffered(final HttpServletResponse res) {
> > > > >  > >  >  >  >  >  > >         super(res);
> > > > >  > >  >  >  >  >  > >     }
> > > > >  > >  >  >  >  >  > >
> > > > >  > >  >  >  >  >  > >     @Override
> > > > >  > >  >  >  >  >  > >     public CharSequence encodeURL(final CharSequence url) {
> > > > >  > >  >  >  >  >  > >         // Crawlers get the plain URL; everyone else keeps the
> > > > >  > >  >  >  >  >  > >         // jsessionid fallback for cookie-less sessions.
> > > > >  > >  >  >  >  >  > >         return isAgent() ? url : super.encodeURL(url);
> > > > >  > >  >  >  >  >  > >     }
> > > > >  > >  >  >  >  >  > >
> > > > >  > >  >  >  >  >  > >     private static boolean isAgent() {
> > > > >  > >  >  >  >  >  > >         String agent = ((WebRequest) RequestCycle.get().getRequest())
> > > > >  > >  >  >  >  >  > >                 .getHttpServletRequest().getHeader("User-Agent");
> > > > >  > >  >  >  >  >  > >         if (agent == null) {
> > > > >  > >  >  >  >  >  > >             return false; // no User-Agent header was sent
> > > > >  > >  >  >  >  >  > >         }
> > > > >  > >  >  >  >  >  > >         agent = agent.toLowerCase();
> > > > >  > >  >  >  >  >  > >         for (String bot : botAgents) {
> > > > >  > >  >  >  >  >  > >             if (agent.indexOf(bot) != -1) {
> > > > >  > >  >  >  >  >  > >                 return true;
> > > > >  > >  >  >  >  >  > >             }
> > > > >  > >  >  >  >  >  > >         }
> > > > >  > >  >  >  >  >  > >         return false;
> > > > >  > >  >  >  >  >  > >     }
> > > > >  > >  >  >  >  >  > > }
> > > > >  > >  >  >  >  >  > >
> > > > >  > >  >  >  >  >  > >
> > > > >  > >  >  >  >  >  > > I didn't test this code, but I do a similar thing in
> > > > >  > >  >  >  >  >  > > my old Spring application and it works.
> > > > >  > >  >  >  >  >  > >
> > > > >  > >  >  >  >  >  > > Take care,
> > > > >  > >  >  >  >  >  > > Artur
> > > > >  > >  >  >  >  >  > >
> > > > >  > >  >  >  >  >  > >
> > > > >  > >  >  >  >  >  > > --
> > > > >  > >  >  >  >  >  > > View this message in context:
> > > > >  > >  >  >  >  >  > > http://www.nabble.com/Removing-the-jsessionid-for-SEO-tp16464534p16467396.html
> > > > >  > >  >  >  >  >  > > Sent from the Wicket - User mailing list archive at Nabble.com.
> > > > >  > >  >  >  >  >  > >
> > > > >  > >  >  >  >  >  > >
> > > > >  > >  >  >  >  >  > >
> > > > >  > >  >  >  --
> > > > >  > >  >  >  Resizable and reorderable grid components.
> > > > >  > >  >  >  http://www.inmethod.com
>
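
A minimal sketch of the URL idea Jeremy describes in the quoted message
above: the numeric ID locates the thread, while the trailing slug exists
only to give search engines a hint about the content.  ThreadUrls, the
/thread/ path and the method names are hypothetical and are not part of
Wicket; in his setup this kind of logic would presumably live behind a
mounted IRequestTargetUrlCodingStrategy.

```java
// Hypothetical helper, not part of Wicket: builds and parses URLs of the
// form /thread/<id>/<slug>, where only <id> locates the resource.
final class ThreadUrls {
    private ThreadUrls() {}

    /** Lowercases the title and collapses non-alphanumeric runs into '-'. */
    static String slug(String title) {
        String s = title.toLowerCase(java.util.Locale.ROOT)
                .replaceAll("[^a-z0-9]+", "-");
        // Drop dashes left over at either end, e.g. from trailing punctuation.
        return s.replaceAll("^-+|-+$", "");
    }

    /** Builds the bookmarkable URL for a thread. */
    static String build(long id, String title) {
        return "/thread/" + id + "/" + slug(title);
    }

    /** Returns the thread ID, ignoring the slug; -1 if the path does not match. */
    static long parseId(String path) {
        // "/thread/42/some-title" splits into ["", "thread", "42", "some-title"]
        String[] parts = path.split("/");
        if (parts.length < 3 || !parts[1].equals("thread")) {
            return -1;
        }
        try {
            return Long.parseLong(parts[2]);
        } catch (NumberFormatException e) {
            return -1;
        }
    }
}
```

Because parseId ignores the slug, a title can be edited later without
breaking links that are already indexed.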

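Igor's description earlier in the thread of going stateless (encode the
state into the link, decode it on the trip back, then apply it to the
components) can be illustrated for the paging case.  The sketch below is
plain Java with hypothetical names (PageLinks, the /page/N path segment);
it is not Wicket API and only shows the encode/decode round trip that a
bookmarkable paging link would need.

```java
// Hypothetical helper, not part of Wicket: carries the navigator's page
// index in the URL itself instead of in session state.
final class PageLinks {
    private PageLinks() {}

    /** Encodes a zero-based page index, e.g. ("/listings", 2) gives "/listings/page/2". */
    static String encode(String mountPath, int pageIndex) {
        // The first page keeps the bare mount path so it has a single URL.
        return pageIndex <= 0 ? mountPath : mountPath + "/page/" + pageIndex;
    }

    /** Decodes the page index from a request path, clamped to the valid range. */
    static int decode(String path, String mountPath, int pageCount) {
        int page = 0;
        String prefix = mountPath + "/page/";
        if (path != null && path.startsWith(prefix)) {
            try {
                page = Integer.parseInt(path.substring(prefix.length()));
            } catch (NumberFormatException e) {
                page = 0; // malformed index: fall back to the first page
            }
        }
        // Clamp so a stale or hand-edited link never points past the data.
        int lastPage = Math.max(0, pageCount - 1);
        return Math.max(0, Math.min(page, lastPage));
    }
}
```

The decoded index is what the page would then apply to its data view
before rendering; that last step is the part Wicket normally does for
stateful links.
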
RE: Removing the jsessionid for SEO

Posted by Dan Kaplan <dk...@citizenhawk.com>.
That is helpful, but: "This is an extension of the standard, so not all bots
may follow it."  I wonder if the major ones do...
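
To make the "Disallow: /*?" rule quoted below concrete, here is a rough
sketch of the wildcard matching that Google documents for its crawler:
'*' matches any run of characters, a trailing '$' anchors the end of the
URL, and everything else is a prefix match.  RobotsPattern is a
hypothetical helper for experimenting with rules, not something from
Wicket or from any robots.txt library, and crawlers that ignore the
extension will not behave this way.

```java
// Hypothetical helper for trying out robots.txt wildcard rules.
final class RobotsPattern {
    private RobotsPattern() {}

    /** True if the rule pattern matches the URL path (including any query string). */
    static boolean matches(String pattern, String path) {
        boolean anchored = pattern.endsWith("$");
        String body = anchored
                ? pattern.substring(0, pattern.length() - 1)
                : pattern;

        // Translate the rule into a regex: literal pieces joined by ".*".
        String[] parts = body.split("\\*", -1);
        StringBuilder regex = new StringBuilder();
        for (int i = 0; i < parts.length; i++) {
            if (i > 0) {
                regex.append(".*");
            }
            regex.append(java.util.regex.Pattern.quote(parts[i]));
        }
        if (!anchored) {
            regex.append(".*"); // unanchored rules are prefix matches
        }
        return path.matches(regex.toString());
    }
}
```

Under these rules "/*?" matches /?wicket:interface=:10:1::: regardless of
the numbers in it, which is why the per-session values do not need to be
known in advance, while a mounted path without a query string is left
alone.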

> -----Original Message-----
> From: jelevy@gmail.com [mailto:jelevy@gmail.com] On Behalf Of Jeremy Levy
> Sent: Thursday, April 03, 2008 6:16 PM
> To: users@wicket.apache.org
> Subject: Re: Removing the jsessionid for SEO
> 
> We have a similar issue, and are trying the following out right now..
> 
> http://www.google.com/support/webmasters/bin/answer.py?hl=en&answer=40367
> 
> User-agent: *
> Disallow: /*?
> 
> 
> 
> 
> On Thu, Apr 3, 2008 at 9:09 PM, Dan Kaplan <dk...@citizenhawk.com>
> wrote:
> 
> > Ok, at least I'm not missing anything.  I understand the benefits it's
> > providing with its stateful framework.  Developing a site with Wicket is
> > easier than with any other framework I've used.  But this statefulness,
> > which makes websites so easy to develop, seems to be counterproductive
> > to SEO:
> >
> > GoogleBot will follow and index stateful links.  Worst case scenario,
> > these actually become visible to Google users and when they click the
> > link it takes them to an "invalid session" page.  They think, "This site
> > is broken" and move on to the next link of their search result.
> >
> > Another approach to solving this is to block all the stateful pages in
> > my robots.txt file.  But how can I block these links in robots.txt since
> > they change per session?  Is there any way to know what the URL will
> > resolve to when googlebot tries to visit my site so I can tell it to
> > disallow: /?wicket:interface=:10:1::: and ?wicket:interface=:0:1::: and
> > ...?
> >
> >
> > > -----Original Message-----
> > > From: Igor Vaynberg [mailto:igor.vaynberg@gmail.com]
> > > Sent: Thursday, April 03, 2008 5:45 PM
> > > To: users@wicket.apache.org
> > > Subject: Re: Removing the jsessionid for SEO
> > >
> > > On Thu, Apr 3, 2008 at 5:31 PM, Dan Kaplan <dk...@citizenhawk.com>
> > > wrote:
> > > > Ok I did a little preliminary research on this.  Right now
> > > > PagingNavigator uses PagingNavigationLink's to represent its page.
> > > > This extends Link.  I'm supposed to override PagingNavigator's
> > > > newPagingNavigationLink() method to accomplish this (I think) but
> > > > past that, this isn't very straightforward to me.
> > > >
> > > > Do I need to create my own BookmarkablePagingNavigationLink?  When I
> > > > do... what next?  I really don't know enough about
> > > > bookmarkablePageLinks to do this.  Right now, all the magic happens
> > > > inside PagingNavigationLink.  Won't I have to move all that logic
> > > > into the WebPage that I'm passing into
> > > > BookmarkablePagingNavigationLink?  This seems like a lot of work.
> > > > Am I missing something critical?
> > >
> > > no, you are not missing anything. you see, when you go stateless, like
> > > what you want, then you have to recreate all the magic stuff that
> > > makes stateful links Just Work. Without state you are back to the
> > > servlet/mvc programming model: you have to encode the state that you
> > > want into the link, then on the trip back decode it, recreate
> > > something from it, and then apply that something onto the components.
> > > This is the crapwork that wicket does for you usually.
> > >
> > > -igor
> > >
> > >
> > > >
> > > >
> > > >  > -----Original Message-----
> > > >  > From: Igor Vaynberg [mailto:igor.vaynberg@gmail.com]
> > > >
> > > >
> > > > > Sent: Thursday, April 03, 2008 3:40 PM
> > > >  > To: users@wicket.apache.org
> > > >  > Subject: Re: Removing the jsessionid for SEO
> > > >  >
> > > >  > you subclass the pagenavigator and make it use bookmarkable links
> > > >  > also. it has factory methods for all the links it uses.
> > > >  >
> > > >  > -igor
> > > >  >
> > > >  >
> > > >  > On Thu, Apr 3, 2008 at 3:36 PM, Dan Kaplan
> <dkaplan@citizenhawk.com
> > >
> > > >  > wrote:
> > > >  > > I wasn't talking about the links that are on the list (I
> already
> > > make
> > > >  > those
> > > >  > >  bookmarkable).  I'm talking about the links that the Navigator
> > > >  > generates.
> > > >  > >  How do I make it so page 2 is bookmarkable?
> > > >  > >
> > > >  > >
> > > >  > >  -----Original Message-----
> > > >  > >  From: Igor Vaynberg [mailto:igor.vaynberg@gmail.com]
> > > >  > >
> > > >  > >
> > > >  > > Sent: Thursday, April 03, 2008 3:30 PM
> > > >  > >  To: users@wicket.apache.org
> > > >  > >  Subject: Re: Removing the jsessionid for SEO
> > > >  > >
> > > >  > >  instead of
> > > >  > >
> > > >  > >  item.add(new link("foo") { onclick() });
> > > >  > >
> > > >  > >  do
> > > >  > >
> > > >  > >  item.add(new bookmarkablepagelink("foo", page.class));
> > > >  > >
> > > >  > >  -igor
> > > >  > >
> > > >  > >
> > > >  > >  On Thu, Apr 3, 2008 at 3:28 PM, Dan Kaplan
> > > <dk...@citizenhawk.com>
> > > >  > wrote:
> > > >  > >  > How?  I asked how to do it before and nobody suggested this
> as
> > a
> > > >  > >  >  possibility.
> > > >  > >  >
> > > >  > >  >
> > > >  > >  >
> > > >  > >  >  -----Original Message-----
> > > >  > >  >  From: Igor Vaynberg [mailto:igor.vaynberg@gmail.com]
> > > >  > >  >  Sent: Thursday, April 03, 2008 3:26 PM
> > > >  > >  >  To: users@wicket.apache.org
> > > >  > >  >  Subject: Re: Removing the jsessionid for SEO
> > > >  > >  >
> > > >  > >  >  dataview can work in a stateless mode, just use
> bookmarkable
> > > links
> > > >  > inside
> > > >  > >  it
> > > >  > >  >
> > > >  > >  >  -igor
> > > >  > >  >
> > > >  > >  >
> > > >  > >  >  On Thu, Apr 3, 2008 at 3:22 PM, Dan Kaplan
> > > <dk...@citizenhawk.com>
> > > >  > >  wrote:
> > > >  > >  >  >  Regardless, at the very least this makes your site look
> > > >  > >  >  >  "weird" and unprofessional when google puts a jsessionid on
> > > >  > >  >  >  your url.  There has got to be some negative effect when
> > > >  > >  >  >  google visits it the second time and the jsessionid has
> > > >  > >  >  >  changed but it sees the same exact content.  Worst case,
> > > >  > >  >  >  it'll think you're trying to trick it.
> > > >  > >  >  >
> > > >  > >  >  >  About those 404s, I'm finding that with the fix I provided I
> > > >  > >  >  >  don't get a 404, but the links refresh the page I'm already
> > > >  > >  >  >  on.  IE: If I'm on A, and a link to B is non-bookmarkable,
> > > >  > >  >  >  clicking B refreshes A.
> > > >  > >  >  >
> > > >  > >  >  >  This issue is very disconcerting to me.  It's one of the
> > > >  > >  >  >  reasons I wish that DataView had an option to work in
> > > >  > >  >  >  stateless mode.  Cause if I ban cookies and Googlebot visits
> > > >  > >  >  >  my home page (with a navigator on it), it'll try to follow
> > > >  > >  >  >  all these page links and from its perspective, they all lead
> > > >  > >  >  >  back to the first page.  So it's kinda a catch-22: Include
> > > >  > >  >  >  the jsessionid in the urls and get bad SEO or remove the
> > > >  > >  >  >  jsessionid and get bad SEO :(
> > > >  > >  >  >
> > > >  > >  >  >  Perhaps the answer to my prayers is a combination of the
> > > >  > >  >  >  noindex/nofollow meta tag with a sitemap.xml.  I'm thinking
> > > >  > >  >  >  I can put a nofollow on the home page (so googlebot doesn't
> > > >  > >  >  >  try to follow the navigator links) and use the sitemap.xml
> > > >  > >  >  >  to point out the individual pages I want it to index.
> > > >  > >  >  >
> > > >  > >  >  >  Matej: can you go into more detail about your hybrid URL
> > > >  > >  >  >  statement?  Won't google index, for example, /home and
> > > >  > >  >  >  /home.1 if I use it?  When it follows the next page, won't
> > > >  > >  >  >  the url become /home.1.2 or something?  That .2 is a page
> > > >  > >  >  >  version: If google indexes that and tries to visit it again,
> > > >  > >  >  >  won't it report about an invalid session?
> > > >  > >  >  >
> > > >  > >  >  >
> > > >  > >  >  >
> > > >  > >  >  >  -----Original Message-----
> > > >  > >  >  >  From: Matej Knopp [mailto:matej.knopp@gmail.com]
> > > >  > >  >  >  Sent: Thursday, April 03, 2008 11:10 AM
> > > >  > >  >  >  To: users@wicket.apache.org
> > > >  > >  >  >  Subject: Re: Removing the jsessionid for SEO
> > > >  > >  >  >
> > > >  > >  >  >  On the other hand, crawling non-bookmarkable pages is not very useful
> > > >  > >  >  >  anyway, since a ?wicket:interface URL will always get "page expired"
> > > >  > >  >  >  when you click on the result.
> > > >  > >  >  >
> > > >  > >  >  >  However, preserving the session makes a lot of sense with hybrid
> > > >  > >  >  >  URLs.  Google remembers the original URL (without the page instance)
> > > >  > >  >  >  while indexing the real page (after the redirect).
> > > >  > >  >  >
> > > >  > >  >  >  I think, though, that the crawler is quite advanced.  I would think it
> > > >  > >  >  >  supports cookies (at least JSESSIONID) and evaluates some of the
> > > >  > >  >  >  JavaScript on the page.
> > > >  > >  >  >
> > > >  > >  >  >  -Matej
> > > >  > >  >  >
> > > >  > >  >  >  On Thu, Apr 3, 2008 at 6:56 PM, Igor Vaynberg <ig...@gmail.com> wrote:
> > > >  > >  >  >  >  right. if you strip sessionid then all your nonbookmarkable urls
> > > >  > >  >  >  >  will resolve to a 404. that will probably drop your rank a lot
> > > >  > >  >  >  >  faster....
> > > >  > >  >  >  >
> > > >  > >  >  >  >  -igor
> > > >  > >  >  >  >
> > > >  > >  >  >  >
> > > >  > >  >  >  >
> > > >  > >  >  >  >
> > > >  > >  >  >  >  On Thu, Apr 3, 2008 at 9:16 AM, Johan Compagner <jc...@gmail.com> wrote:
> > > >  > >  >  >  >  >  the problem is that then you have to have all stateless pages.
> > > >  > >  >  >  >  >  Else google can't crawl your website.
> > > >  > >  >  >  >  >  And if that is the case then you could be completely stateless,
> > > >  > >  >  >  >  >  so you don't have a session (id) to worry about at all.
> > > >  > >  >  >  >  >
> > > >  > >  >  >  >  >  johan
> > > >  > >  >  >  >  >
> > > >  > >  >  >  >  >
> > > >  > >  >  >  >  >
> > > >  > >  >  >  >  >
> > > >  > >  >  >  >  >
> > > >  > >  >  >  >  >
> > > >  > >  >  >  >  >  On Thu, Apr 3, 2008 at 4:54 PM, Zappaterrini, Larry
> > > >  > >  >  >  >  >  <Larry.Zappaterrini@fnis.com> wrote:
> > > >  > >  >  >  >  >
> > > >  > >  >  >  >  >  > When Google asks to not have special treatment for their bot, they
> > > >  > >  >  >  >  >  > are referring to content more than anything. Regarding the session
> > > >  > >  >  >  >  >  > id being coded in the URL, see the Technical guidelines section of
> > > >  > >  >  >  >  >  > Google's Webmaster Guidelines -
> > > >  > >  >  >  >  >  >
> > > >  > >  >  >  >  >  > http://www.google.com/support/webmasters/bin/answer.py?answer=35769#design
> > > >  > >  >  >  >  >  >
> > > >  > >  >  >  >  >  > It specifically recommends "allow(ing) search bots to crawl your
> > > >  > >  >  >  >  >  > sites without session IDs or arguments that track their path
> > > >  > >  >  >  >  >  > through the site."
> > > >  > >  >  >  >  >  >
> > > >  > >  >  >  >  >  > -----Original Message-----
> > > >  > >  >  >  >  >  > From: Johan Compagner [mailto:jcompagner@gmail.com]
> > > >  > >  >  >  >  >  > Sent: Thursday, April 03, 2008 7:35 AM
> > > >  > >  >  >  >  >  > To: users@wicket.apache.org
> > > >  > >  >  >  >  >  > Subject: Re: Removing the jsessionid for SEO
> > > >  > >  >  >  >  >  >
> > > >  > >  >  >  >  >  > isn't Google always saying that you shouldn't alter the behavior of
> > > >  > >  >  >  >  >  > your site depending on whether it is their bot or not?
> > > >  > >  >  >  >  >  >
> > > >  > >  >  >  >  >  > On Thu, Apr 3, 2008 at 1:00 PM, Artur W. <a_...@gazeta.pl> wrote:
> > > >  > >  >  >  >  >  >
> > > >  > >  >  >  >  >  > >
> > > >  > >  >  >  >  >  > > Hi!
> > > >  > >  >  >  >  >  > >
> > > >  > >  >  >  >  >  > >
> > > >  > >  >  >  >  >  > > igor.vaynberg wrote:
> > > >  > >  >  >  >  >  > > >
> > > >  > >  >  >  >  >  > > > also by doing what you have done users with cookies disabled
> > > >  > >  >  >  >  >  > > > won't be able to use your site...
> > > >  > >  >  >  >  >  > > >
> > > >  > >  >  >  >  >  > >
> > > >  > >  >  >  >  >  > > In my opinion the session id is a problem. Google indexes the same
> > > >  > >  >  >  >  >  > > page again and again.
> > > >  > >  >  >  >  >  > >
> > > >  > >  >  >  >  >  > > For users without cookies we can do something like this:
> > > >  > >  >  >  >  >  > >
> > > >  > >  >  >  >  >  > >
> > > >  > >  >  >  >  >  > > static class Unbuffered extends WebResponse {
> > > >  > >  >  >  >  >  > >
> > > >  > >  >  >  >  >  > >     // Lower-case fragments looked for in the User-Agent header.
> > > >  > >  >  >  >  >  > >     private static final String[] botAgents = {
> > > >  > >  >  >  >  >  > >         "onetszukaj", "googlebot", "appie", "architext",
> > > >  > >  >  >  >  >  > >         "jeeves", "bjaaland", "ferret", "gulliver",
> > > >  > >  >  >  >  >  > >         "harvest", "htdig", "linkwalker", "lycos_", "moget",
> > > >  > >  >  >  >  >  > >         "muscatferret", "myweb", "nomad", "scooter",
> > > >  > >  >  >  >  >  > >         "yahoo!\\sslurp\\schina", "slurp", "weblayers",
> > > >  > >  >  >  >  >  > >         "antibot", "bruinbot", "digout4u", "echo!",
> > > >  > >  >  >  >  >  > >         "ia_archiver", "jennybot", "mercator", "netcraft",
> > > >  > >  >  >  >  >  > >         "msnbot", "petersnews", "unlost_web_crawler", "voila",
> > > >  > >  >  >  >  >  > >         "webbase", "webcollage", "cfetch", "zyborg",
> > > >  > >  >  >  >  >  > >         "wisenutbot", "robot", "crawl", "spider"
> > > >  > >  >  >  >  >  > >     }; /* and so on... */
> > > >  > >  >  >  >  >  > >
> > > >  > >  >  >  >  >  > >     public Unbuffered(final HttpServletResponse res) {
> > > >  > >  >  >  >  >  > >         super(res);
> > > >  > >  >  >  >  >  > >     }
> > > >  > >  >  >  >  >  > >
> > > >  > >  >  >  >  >  > >     @Override
> > > >  > >  >  >  >  >  > >     public CharSequence encodeURL(final CharSequence url) {
> > > >  > >  >  >  >  >  > >         // Bots get the plain URL; everyone else keeps the jsessionid.
> > > >  > >  >  >  >  >  > >         return isAgent() ? url : super.encodeURL(url);
> > > >  > >  >  >  >  >  > >     }
> > > >  > >  >  >  >  >  > >
> > > >  > >  >  >  >  >  > >     private static boolean isAgent() {
> > > >  > >  >  >  >  >  > >         String agent = ((WebRequest) RequestCycle.get().getRequest())
> > > >  > >  >  >  >  >  > >                 .getHttpServletRequest().getHeader("User-Agent");
> > > >  > >  >  >  >  >  > >         if (agent == null) {
> > > >  > >  >  >  >  >  > >             return false; // no User-Agent header: treat as a normal user
> > > >  > >  >  >  >  >  > >         }
> > > >  > >  >  >  >  >  > >         final String lowerAgent = agent.toLowerCase();
> > > >  > >  >  >  >  >  > >         for (String bot : botAgents) {
> > > >  > >  >  >  >  >  > >             if (lowerAgent.indexOf(bot) != -1) {
> > > >  > >  >  >  >  >  > >                 return true;
> > > >  > >  >  >  >  >  > >             }
> > > >  > >  >  >  >  >  > >         }
> > > >  > >  >  >  >  >  > >         return false;
> > > >  > >  >  >  >  >  > >     }
> > > >  > >  >  >  >  >  > > }
> > > >  > >  >  >  >  >  > >
> > > >  > >  >  >  >  >  > >
> > > >  > >  >  >  >  >  > > I didn't test this code, but I do a similar thing in my old Spring
> > > >  > >  >  >  >  >  > > application and it works.
> > > >  > >  >  >  >  >  > >
> > > >  > >  >  >  >  >  > > Take care,
> > > >  > >  >  >  >  >  > > Artur
> > > >  > >  >  >  >  >  > >
> > > >  > >  >  >  >  >  > >
> > > >  > >  >  >  >  >  > > --
> > > >  > >  >  >  >  >  > > View this message in context:
> > > >  > >  >  >  >  >  > >
> > > >  > >  >  >  >  >  > > http://www.nabble.com/Removing-the-jsessionid-for-SEO-tp16464534p16467396.html
> > > >  > >  >  >  >  >  > > Sent from the Wicket - User mailing list archive at Nabble.com.
> > > >  > >  >  >  >  >  > >
> > > >  > >  >  >  >  >  > >
> > > >  > >  >  >  --
> > > >  > >  >  >  Resizable and reorderable grid components.
> > > >  > >  >  >  http://www.inmethod.com


---------------------------------------------------------------------
To unsubscribe, e-mail: users-unsubscribe@wicket.apache.org
For additional commands, e-mail: users-help@wicket.apache.org


RE: Removing the jsessionid for SEO

Posted by Dan Kaplan <dk...@citizenhawk.com>.
Thanks,

That's kinda the route I've already taken.  On my site, www.startfound.com,
if you click on any company to see more details it goes to a bookmarkable
page.  Same with any tag.  Maybe if I've already got that much, I shouldn't
concern myself with the fact that page 2 of my list is not bookmarkable but
is still reachable by Googlebot.  Or maybe I should just add a noindex meta
tag on every page that's not page 1.

It'd be kinda ridiculous to require login to see past page 1.  That may be
good for SEO but it'll drive people away.  
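
One way to get a bookmarkable page 2, along the lines Igor describes in the
quoted thread (encode the state you want into the link, then decode it on the
trip back), is to carry the page number in the URL itself.  The following is
only a minimal sketch in plain Java with no Wicket dependency.  The class name
PagingUrls, the "page" parameter name and the robots meta values are all
assumptions made for illustration; none of them come from Wicket or from this
thread.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

/** Hypothetical helper (not part of Wicket): keeps the pager state in the URL. */
final class PagingUrls {

    // Matches "page=<digits>" only as a whole query parameter.
    private static final Pattern PAGE_PARAM =
            Pattern.compile("(?:^|[?&])page=(\\d{1,6})(?:&|$)");

    private PagingUrls() {
    }

    /** Builds a bookmarkable URL for a 1-based page; page 1 keeps the bare path. */
    static String encode(String basePath, int page) {
        if (page < 1) {
            throw new IllegalArgumentException("page must be >= 1");
        }
        return page == 1 ? basePath : basePath + "?page=" + page;
    }

    /** Recovers the page from a query string; falls back to 1 if missing or malformed. */
    static int decode(String query) {
        if (query == null) {
            return 1;
        }
        Matcher m = PAGE_PARAM.matcher(query);
        if (!m.find()) {
            return 1;
        }
        return Math.max(Integer.parseInt(m.group(1)), 1);
    }

    /** Robots meta content: index only page 1, but let crawlers follow links everywhere. */
    static String robotsMeta(int page) {
        return page == 1 ? "index, follow" : "noindex, follow";
    }
}
```

In a Wicket page the decoded number would presumably be read from the page
parameters and used to set the DataView's current page, and the navigator's
link factory methods would emit links built the same way as encode().  That
wiring is not shown here and would need checking against the Wicket version in
use.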

> -----Original Message-----
> From: jeremythomerson@gmail.com [mailto:jeremythomerson@gmail.com] On
> Behalf Of Jeremy Thomerson
> Sent: Thursday, April 03, 2008 10:00 PM
> To: users@wicket.apache.org
> Subject: Re: Removing the jsessionid for SEO
> 
> I've been building a community-driven hunting and fishing site in Texas for
> the past year and a half.  Since I have converted it to Wicket from
> ColdFusion, our search engine rankings have gone WAY UP.  That's right,
> we're on the first page for tons of searches.  Search for "texas hunting" -
> we're second under only the Texas Parks and Wildlife Association.
> 
> How?  With Wicket?  Yes - it requires a little more work.  What I do is that
> for any link that I want Google to be able to follow, I have a subclass of
> Link specific to that.  For instance, ViewThreadLink, which takes the ID for
> the link and a (detachable) model of the thread.  Then I mount an
> IRequestTargetUrlCodingStrategy for each big category of things in my
> webapp.  I've made several strategies that I use over and over, just giving
> them a different mount path and a different parameter to tell them what kind
> of article, etc., they will match.  This is made easier because over 75% of
> the objects in our site are similar enough that they extend from a base
> class that provides the basic functionality for an article / thread / etc.
> that has a title, text, pictures, comments, the standard stuff.
> 
> So, yes, it takes work.  But that's okay - SEO always takes work.  I also
> have given a lot of care to use good page titles, good semantic HTML and
> stuff things into the URL that don't have anything to do with locating the
> resource, but give the search engines a clue as to what the content is.
> 
> Yes, some pages end up with a jsessionid - and I don't like it (example:
> http://www.google.com/search?hl=en&client=firefox-a&rls=com.ubuntu%3Aen-US%3Aofficial&q=%22south+texas+management+buck%22&btnG=Search).
> But, most don't because almost all of my links are bookmarkable.  When the
> user clicks something that they can only do as a signed-in user, then it
> redirects them to the sign in page, they sign in, and are taken back to the
> page they were on.  Then they can pick up, and I don't worry about
> bookmarkable URLs for anything that requires user-authentication (wizards to
> post a new listing, story, admin links, etc).
> 
> Jeremy Thomerson
> TexasHuntFish.com
> 
> On Thu, Apr 3, 2008 at 8:09 PM, Dan Kaplan <dk...@citizenhawk.com>
> wrote:
> 
> > Ok, at least I'm not missing anything.  I understand the benefits it's
> > providing with its stateful framework.  Developing a site with Wicket is
> > easier than with any other framework I've used.  But this statefulness,
> > which makes websites so easy to develop, seems to be counterproductive to
> > SEO:
> >
> > Googlebot will follow and index stateful links.  Worst case scenario, these
> > actually become visible to Google users, and when they click the link it
> > takes them to an "invalid session" page.  They think, "This site is broken"
> > and move on to the next link of their search result.
> >
> > Another approach to solving this is to block all the stateful pages in my
> > robots.txt file.  But how can I block these links in robots.txt since they
> > change per session?  Is there any way to know what the url will resolve to
> > when googlebot tries to visit my site so I can tell it to disallow:
> > /?wicket:interface=:10:1::: and ?wicket:interface=:0:1::: and ...?
> >
> >
> > > -----Original Message-----
> > > From: Igor Vaynberg [mailto:igor.vaynberg@gmail.com]
> > > Sent: Thursday, April 03, 2008 5:45 PM
> > > To: users@wicket.apache.org
> > > Subject: Re: Removing the jsessionid for SEO
> > >
> > > On Thu, Apr 3, 2008 at 5:31 PM, Dan Kaplan <dk...@citizenhawk.com>
> > > wrote:
> > > > Ok I did a little preliminary research on this.  Right now PagingNavigator
> > > >  uses PagingNavigationLinks to represent its pages.  This extends Link.  I'm
> > > >  supposed to override PagingNavigator's newPagingNavigationLink() method to
> > > >  accomplish this (I think) but past that, this isn't very straightforward to
> > > >  me.
> > > >
> > > >  Do I need to create my own BookmarkablePagingNavigationLink?  When I do...
> > > >  what next?  I really don't know enough about BookmarkablePageLinks to do
> > > >  this.  Right now, all the magic happens inside PagingNavigationLink.  Won't
> > > >  I have to move all that logic into the WebPage that I'm passing into
> > > >  BookmarkablePagingNavigationLink?  This seems like a lot of work.  Am I
> > > >  missing something critical?
> > >
> > > no, you are not missing anything. you see, when you go stateless, like
> > > what you want, then you have to recreate all the magic stuff that
> > > makes stateful links Just Work. Without state you are back to the
> > > servlet/mvc programming model: you have to encode the state that you
> > > want into the link, then on the trip back decode it, recreate
> > > something from it, and then apply that something onto the components.
> > > This is the crapwork that wicket does for you usually.
> > >
> > > -igor
> > >
> > >
> > > >
> > > >
> > > >  > -----Original Message-----
> > > >  > From: Igor Vaynberg [mailto:igor.vaynberg@gmail.com]
> > > >  > Sent: Thursday, April 03, 2008 3:40 PM
> > > >  > To: users@wicket.apache.org
> > > >  > Subject: Re: Removing the jsessionid for SEO
> > > >  >
> > > >  > you subclass the pagenavigator and make it use bookmarkable links
> > > >  > also. it has factory methods for all the links it uses.
> > > >  >
> > > >  > -igor
> > > >  >
> > > >  >
> > > >  > On Thu, Apr 3, 2008 at 3:36 PM, Dan Kaplan <dkaplan@citizenhawk.com> wrote:
> > > >  > > I wasn't talking about the links that are on the list (I already make
> > > >  > >  those bookmarkable).  I'm talking about the links that the Navigator
> > > >  > >  generates.  How do I make it so page 2 is bookmarkable?
> > > >  > >
> > > >  > >
> > > >  > >  -----Original Message-----
> > > >  > >  From: Igor Vaynberg [mailto:igor.vaynberg@gmail.com]
> > > >  > >  Sent: Thursday, April 03, 2008 3:30 PM
> > > >  > >  To: users@wicket.apache.org
> > > >  > >  Subject: Re: Removing the jsessionid for SEO
> > > >  > >
> > > >  > >  instead of
> > > >  > >
> > > >  > >  item.add(new Link("foo") { public void onClick() { /* ... */ } });
> > > >  > >
> > > >  > >  do
> > > >  > >
> > > >  > >  item.add(new BookmarkablePageLink("foo", MyPage.class)); // MyPage: your target page class
> > > >  > >
> > > >  > >  -igor
> > > >  > >
> > > >  > >
> > > >  > >  On Thu, Apr 3, 2008 at 3:28 PM, Dan Kaplan <dk...@citizenhawk.com> wrote:
> > > >  > >  > How?  I asked how to do it before and nobody suggested this as a
> > > >  > >  >  possibility.
> > > >  > >  >
> > > >  > >  >
> > > >  > >  >
> > > >  > >  >  -----Original Message-----
> > > >  > >  >  From: Igor Vaynberg [mailto:igor.vaynberg@gmail.com]
> > > >  > >  >  Sent: Thursday, April 03, 2008 3:26 PM
> > > >  > >  >  To: users@wicket.apache.org
> > > >  > >  >  Subject: Re: Removing the jsessionid for SEO
> > > >  > >  >
> > > >  > >  >  dataview can work in a stateless mode, just use bookmarkable
> > > >  > >  >  links inside it
> > > >  > >  >
> > > >  > >  >  -igor
> > > >  > >  >
> > > >  > >  >
> > > >  > >  >  On Thu, Apr 3, 2008 at 3:22 PM, Dan Kaplan <dk...@citizenhawk.com> wrote:
> > > >  > >  >  >  Regardless, at the very least this makes your site look "weird" and
> > > >  > >  >  >  unprofessional when Google puts a jsessionid on your URL.  There has
> > > >  > >  >  >  got to be some negative effect when Google visits it the second time
> > > >  > >  >  >  and the jsessionid has changed but it sees the exact same content.
> > > >  > >  >  >  Worst case, it'll think you're trying to trick it.
> > > >  > >  >  >
> > > >  > >  >  >  About those 404s, I'm finding that with the fix I provided I don't
> > > >  > >  >  >  get a 404, but the links refresh the page I'm already on.  I.e., if
> > > >  > >  >  >  I'm on A and a link to B is non-bookmarkable, clicking B refreshes A.
> > > >  > >  >  >
> > > >  > >  >  >  This issue is very disconcerting to me.  It's one of the reasons I
> > > >  > >  >  >  wish that DataView had an option to work in stateless mode.  Cause if
> > > >  > >  >  >  I ban cookies and Googlebot visits my home page (with a navigator on
> > > >  > >  >  >  it), it'll try to follow all these page links and, from its
> > > >  > >  >  >  perspective, they all lead back to the first page.  So it's kinda a
> > > >  > >  >  >  catch-22: include the jsessionid in the URLs and get bad SEO, or
> > > >  > >  >  >  remove the jsessionid and get bad SEO :(
> > > >  > >  >  >
> > > >  > >  >  >  Perhaps the answer to my prayers is a combination of the
> > > >  > >  >  >  noindex/nofollow meta tag with a sitemap.xml.  I'm thinking I can put
> > > >  > >  >  >  a nofollow on the home page (so Googlebot doesn't try to follow the
> > > >  > >  >  >  navigator links) and use the sitemap.xml to point out the individual
> > > >  > >  >  >  pages I want it to index.
> > > >  > >  >  >
> > > >  > >  >  >  Matej: can you go into more detail about your hybrid URL statement?
> > > >  > >  >  >  Won't Google index, for example, /home and /home.1 if I use it?  When
> > > >  > >  >  >  it follows the next page, won't the URL become /home.1.2 or
> > > >  > >  >  >  something?  That .2 is a page version: if Google indexes that and
> > > >  > >  >  >  tries to visit it again, won't it report an invalid session?
> > > >  > >  >  >
> > > >  > >  >  >
> > > >  > >  >  >
> > > >  > >  >  >  -----Original Message-----
> > > >  > >  >  >  From: Matej Knopp [mailto:matej.knopp@gmail.com]
> > > >  > >  >  >  Sent: Thursday, April 03, 2008 11:10 AM
> > > >  > >  >  >  To: users@wicket.apache.org
> > > >  > >  >  >  Subject: Re: Removing the jsessionid for SEO
> > > >  > >  >  >
> > > >  > >  >  >  On the other hand, crawling non-bookmarkable pages is not very useful
> > > >  > >  >  >  anyway, since a ?wicket:interface URL will always get "page expired"
> > > >  > >  >  >  when you click on the result.
> > > >  > >  >  >
> > > >  > >  >  >  However, preserving the session makes a lot of sense with hybrid
> > > >  > >  >  >  URLs.  Google remembers the original URL (without the page instance)
> > > >  > >  >  >  while indexing the real page (after the redirect).
> > > >  > >  >  >
> > > >  > >  >  >  I think, though, that the crawler is quite advanced.  I would think it
> > > >  > >  >  >  supports cookies (at least JSESSIONID) and evaluates some of the
> > > >  > >  >  >  JavaScript on the page.
> > > >  > >  >  >
> > > >  > >  >  >  -Matej
> > > >  > >  >  >
> > > >  > >  >  >  On Thu, Apr 3, 2008 at 6:56 PM, Igor Vaynberg <ig...@gmail.com> wrote:
> > > >  > >  >  >  >  right. if you strip sessionid then all your nonbookmarkable urls
> > > >  > >  >  >  >  will resolve to a 404. that will probably drop your rank a lot
> > > >  > >  >  >  >  faster....
> > > >  > >  >  >  >
> > > >  > >  >  >  >  -igor
> > > >  > >  >  >  >
> > > >  > >  >  >  >
> > > >  > >  >  >  >
> > > >  > >  >  >  >
> > > >  > >  >  >  >  On Thu, Apr 3, 2008 at 9:16 AM, Johan Compagner <jc...@gmail.com> wrote:
> > > >  > >  >  >  >  >  the problem is that then you have to have all stateless pages.
> > > >  > >  >  >  >  >  Else google can't crawl your website.
> > > >  > >  >  >  >  >  And if that is the case then you could be completely stateless,
> > > >  > >  >  >  >  >  so you don't have a session (id) to worry about at all.
> > > >  > >  >  >  >  >
> > > >  > >  >  >  >  >  johan
> > > >  > >  >  >  >  >
> > > >  > >  >  >  >  >
> > > >  > >  >  >  >  >
> > > >  > >  >  >  >  >
> > > >  > >  >  >  >  >
> > > >  > >  >  >  >  >
> > > >  > >  >  >  >  >  On Thu, Apr 3, 2008 at 4:54 PM, Zappaterrini, Larry
> > > >  > >  >  >  >  >  <Larry.Zappaterrini@fnis.com> wrote:
> > > >  > >  >  >  >  >
> > > >  > >  >  >  >  >  > When Google asks to not have special treatment for their bot, they
> > > >  > >  >  >  >  >  > are referring to content more than anything. Regarding the session
> > > >  > >  >  >  >  >  > id being coded in the URL, see the Technical guidelines section of
> > > >  > >  >  >  >  >  > Google's Webmaster Guidelines -
> > > >  > >  >  >  >  >  >
> > > >  > >  >  >  >  >  > http://www.google.com/support/webmasters/bin/answer.py?answer=35769#design
> > > >  > >  >  >  >  >  >
> > > >  > >  >  >  >  >  > It specifically recommends "allow(ing) search bots to crawl your
> > > >  > >  >  >  >  >  > sites without session IDs or arguments that track their path
> > > >  > >  >  >  >  >  > through the site."
> > > >  > >  >  >  >  >  >
> > > >  > >  >  >  >  >  > -----Original Message-----
> > > >  > >  >  >  >  >  > From: Johan Compagner [mailto:jcompagner@gmail.com]
> > > >  > >  >  >  >  >  > Sent: Thursday, April 03, 2008 7:35 AM
> > > >  > >  >  >  >  >  > To: users@wicket.apache.org
> > > >  > >  >  >  >  >  > Subject: Re: Removing the jsessionid for SEO
> > > >  > >  >  >  >  >  >
> > > >  > >  >  >  >  >  > isn't Google always saying that you shouldn't alter the behavior of
> > > >  > >  >  >  >  >  > your site depending on whether it is their bot or not?
> > > >  > >  >  >  >  >  >
> > > >  > >  >  >  >  >  > On Thu, Apr 3, 2008 at 1:00 PM, Artur W. <a_...@gazeta.pl> wrote:
> > > >  > >  >  >  >  >  >
> > > >  > >  >  >  >  >  > >
> > > >  > >  >  >  >  >  > > Hi!
> > > >  > >  >  >  >  >  > >
> > > >  > >  >  >  >  >  > >
> > > >  > >  >  >  >  >  > > igor.vaynberg wrote:
> > > >  > >  >  >  >  >  > > >
> > > >  > >  >  >  >  >  > > > also by doing what you have done users with cookies disabled
> > > >  > >  >  >  >  >  > > > won't be able to use your site...
> > > >  > >  >  >  >  >  > > >
> > > >  > >  >  >  >  >  > >
> > > >  > >  >  >  >  >  > > In my opinion the session id is a problem. Google indexes the same
> > > >  > >  >  >  >  >  > > page again and again.
> > > >  > >  >  >  >  >  > >
> > > >  > >  >  >  >  >  > > For users without cookies we can do something like this:
> > > >  > >  >  >  >  >  > >
> > > >  > >  >  >  >  >  > >
> > > >  > >  >  >  >  >  > > static class Unbuffered extends WebResponse {
> > > >  > >  >  >  >  >  > >
> > > >  > >  >  >  >  >  > >     private static final String[] botAgents = {
> > > >  > >  >  >  >  >  > >         "onetszukaj", "googlebot", "appie", "architext",
> > > >  > >  >  >  >  >  > >         "jeeves", "bjaaland", "ferret", "gulliver", "harvest", "htdig",
> > > >  > >  >  >  >  >  > >         "linkwalker", "lycos_", "moget", "muscatferret", "myweb", "nomad",
> > > >  > >  >  >  >  >  > >         "scooter", "yahoo!\\sslurp\\schina", "slurp", "weblayers",
> > > >  > >  >  >  >  >  > >         "antibot", "bruinbot", "digout4u",
> > > >  > >  >  >  >  >  > >         "echo!", "ia_archiver", "jennybot", "mercator", "netcraft",
> > > >  > >  >  >  >  >  > >         "msnbot", "petersnews",
> > > >  > >  >  >  >  >  > >         "unlost_web_crawler", "voila", "webbase", "webcollage", "cfetch",
> > > >  > >  >  >  >  >  > >         "zyborg",
> > > >  > >  >  >  >  >  > >         "wisenutbot", "robot", "crawl", "spider" }; /* and so on... */
> > > >  > >  >  >  >  >  > >
> > > >  > >  >  >  >  >  > >     public Unbuffered(final HttpServletResponse res) {
> > > >  > >  >  >  >  >  > >         super(res);
> > > >  > >  >  >  >  >  > >     }
> > > >  > >  >  >  >  >  > >
> > > >  > >  >  >  >  >  > >     @Override
> > > >  > >  >  >  >  >  > >     public CharSequence encodeURL(final CharSequence url) {
> > > >  > >  >  >  >  >  > >         return isAgent() ? url : super.encodeURL(url);
> > > >  > >  >  >  >  >  > >     }
> > > >  > >  >  >  >  >  > >
> > > >  > >  >  >  >  >  > >     private static boolean isAgent() {
> > > >  > >  >  >  >  >  > >
> > > >  > >  >  >  >  >  > >         String agent = ((WebRequest)RequestCycle.get().getRequest())
> > > >  > >  >  >  >  >  > >                 .getHttpServletRequest().getHeader("User-Agent");
> > > >  > >  >  >  >  >  > >
> > > >  > >  >  >  >  >  > >         for(String bot : botAgents) {
> > > >  > >  >  >  >  >  > >             if (agent.toLowerCase().indexOf(bot) != -1) {
> > > >  > >  >  >  >  >  > >                 return true;
> > > >  > >  >  >  >  >  > >             }
> > > >  > >  >  >  >  >  > >         }
> > > >  > >  >  >  >  >  > >
> > > >  > >  >  >  >  >  > >         return false;
> > > >  > >  >  >  >  >  > >     }
> > > >  > >  >  >  >  >  > > }
> > > >  > >  >  >  >  >  > >
> > > >  > >  >  >  >  >  > >
> > > >  > >  >  >  >  >  > > I didn't test this code but I do similar thing in my old application in
> > > >  > >  >  >  >  >  > > Spring and it works.
> > > >  > >  >  >  >  >  > >
> > > >  > >  >  >  >  >  > > Take care,
> > > >  > >  >  >  >  >  > > Artur
> > > >  > >  >  >  >  >  > >
> > > >  > >  >  >  >  >  > >
> > > >  > >  >  >  >  >  > > --
> > > >  > >  >  >  >  >  > > View this message in context:
> > > >  > >  >  >  >  >  > > http://www.nabble.com/Removing-the-jsessionid-for-SEO-tp16464534p16467396.html
> > > >  > >  >  >  >  >  > > Sent from the Wicket - User mailing list archive at Nabble.com.


---------------------------------------------------------------------
To unsubscribe, e-mail: users-unsubscribe@wicket.apache.org
For additional commands, e-mail: users-help@wicket.apache.org
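
The user-agent check quoted above calls agent.toLowerCase() directly on the header value. HttpServletRequest.getHeader returns null when the header is absent, so a request without a User-Agent would throw a NullPointerException there. The list entry "yahoo!\\sslurp\\schina" is also written like a regular expression, but indexOf treats it as literal text, so only the plain "slurp" entry ever matches. Below is a minimal, null-safe sketch of the same substring test as a pure function. The class name BotUserAgents and the shortened agent list are assumptions for illustration and are not part of Wicket or of the original post.

```java
import java.util.Locale;

// Hypothetical helper, not part of Wicket: null-safe variant of the quoted isAgent() check.
final class BotUserAgents {

    // Illustrative subset of the substrings from the quoted list; extend as needed.
    private static final String[] BOT_AGENTS = {
        "googlebot", "slurp", "msnbot", "ia_archiver", "robot", "crawl", "spider"
    };

    private BotUserAgents() {
    }

    /** Returns true if the User-Agent value contains one of the known crawler substrings. */
    static boolean isBot(String userAgent) {
        if (userAgent == null) {
            return false; // header absent: treat the client as a regular browser
        }
        String agent = userAgent.toLowerCase(Locale.ROOT);
        for (String bot : BOT_AGENTS) {
            if (agent.contains(bot)) {
                return true;
            }
        }
        return false;
    }

    public static void main(String[] args) {
        System.out.println(isBot("Mozilla/5.0 (compatible; Googlebot/2.1; +http://www.google.com/bot.html)"));
        System.out.println(isBot(null));
    }
}
```

In the quoted encodeURL override, the result of such a function would decide whether to return the URL unchanged or to call super.encodeURL(url).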


Re: Removing the jsessionid for SEO

Posted by Korbinian Bachl - privat <ko...@whiskyworld.de>.
Hi Jeremy,
Hi Dan,

for a project long ago I had the task of making a product browser SEO 
friendly; I used a plain PagingNavigator at first, and then extended it 
to use the IndexedUrlPageParameters; this allowed me to put 
anything into the path to get a nice URL;

the key here is to look at the URL and treat it as a unique resource 
line; so I did it something like this:

mountName{(/anyparams)}*{/pageNumber}

this gave me the possibility to have a browsing URL where I could put 
anything in while the rest still works; remember also that the URL for 
SEO may (!) change in the future, so go for maximally flexible designs: 
first comes the resource, then any params to feed the spider (there may 
be 0 to over 10), and a hook at the end that has to be a number (0 is 
assumed if the last part is not a number);

so I was able to finally let the spider see things like:

e.g:
product/brand_New/BestItemOfTheWorld
product/specialCategory/moreSpecial/moreInfo/2
product/specialCategory/moreSpecial/brandName/moreDetails/1

etc.
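
A minimal sketch of how a path following the mountName{(/anyparams)}*{/pageNumber} pattern could be split, assuming plain string handling outside of Wicket. The class BrowseUrl and its parse method are hypothetical names for illustration. The only rule taken from the description above is that a trailing numeric segment is the page number and that 0 is assumed otherwise.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.List;

// Hypothetical helper: splits "mountName/param/.../pageNumber" into its parts.
final class BrowseUrl {

    final List<String> params; // free-form segments between the mount name and the page number
    final int pageNumber;      // 0 when the last segment is not a number

    private BrowseUrl(List<String> params, int pageNumber) {
        this.params = Collections.unmodifiableList(params);
        this.pageNumber = pageNumber;
    }

    static BrowseUrl parse(String mountName, String path) {
        String p = path.startsWith("/") ? path.substring(1) : path;
        if (!p.equals(mountName) && !p.startsWith(mountName + "/")) {
            throw new IllegalArgumentException("Path is not under mount '" + mountName + "': " + path);
        }
        List<String> segments = new ArrayList<>();
        for (String s : p.substring(mountName.length()).split("/")) {
            if (!s.isEmpty()) {
                segments.add(s);
            }
        }
        int page = 0;
        if (!segments.isEmpty()) {
            String last = segments.get(segments.size() - 1);
            // Up to nine digits, so Integer.parseInt cannot overflow.
            if (last.matches("\\d{1,9}")) {
                page = Integer.parseInt(last);
                segments.remove(segments.size() - 1);
            }
        }
        return new BrowseUrl(segments, page);
    }

    public static void main(String[] args) {
        BrowseUrl u = parse("product", "product/specialCategory/moreSpecial/moreInfo/2");
        System.out.println(u.params + " page=" + u.pageNumber);
    }
}
```

The free-form segments are kept as a list so that the application can merge them and map them to one canonical view, as the following paragraphs describe.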

now, you may wonder: if I feed the spider with this, how do I know where 
to end?  the key was that the part in between got merged internally and 
was specified by the application, so we overcame the problems of:

a, recreating the view that should be the right one (here: we had a 
tree-like behaviour for our products where we could compare to the tree 
in database)

b, duplicate content (very bad! - never, ever have a spider find the 
same content (or very very similar!) under more than one URL !)

this strategy did very well; today with Wicket 1.3 I would do nearly the 
same but stick to the HybridURL scheme, and maybe try to be even more 
flexible with the URL scheme by having the basic schemes and resources 
specified in persistence (URL-hook, initialState); remember it is 
important to serve the same resources under the same URLs, else the 
spider will think you might be trying to fake content for it;

The jsessionid is something I don't care about anymore - it's 2008, 
spiders know it and the usual visitor / surfer has no clue how to tell a 
URL from an email address; however, many people have turned cookies + JS 
off because of security fears - so removing the jsessionid pleases only 
the few people who know about such details but hampers the many people 
who have no knowledge of the internet and its techniques at all - IMHO.

@Jeremy: your approach also seems interesting to me, can you give more 
details about it?

Best,

Korbinian

Jeremy Thomerson schrieb:
> ....

---------------------------------------------------------------------
To unsubscribe, e-mail: users-unsubscribe@wicket.apache.org
For additional commands, e-mail: users-help@wicket.apache.org


Re: Removing the jsessionid for SEO

Posted by Jeremy Thomerson <je...@thomersonfamily.com>.
I've been building a community-driven hunting and fishing site in Texas for
the past year and a half.  Since I have converted it to Wicket from
ColdFusion, our search engine rankings have gone WAY UP.  That's right,
we're on the first page for tons of searches.  Search for "texas hunting" -
we're second under only the Texas Parks and Wildlife Association.

How?  With Wicket?  Yes - it requires a little more work.  What I do is that
for any link that I want Google to be able to follow, I have a subclass of
Link specific to that.  For instance, ViewThreadLink, which takes the ID for
the link and a model (detachable) of the thread.  Then I mount an
IRequestTargetUrlCodingStrategy for each big category of things in my
webapp.  I've made several strategies that I use over and over, just giving
them a different mount path and a different parameter to tell it what kind
of article, etc, that it will match to.  This is made easier because over
75% of the objects in our site are all similar enough that they extend from a
base class that provides the basic functionality for an article / thread /
etc that has a title, text, pictures, comments, the standard stuff.

So, yes, it takes work.  But that's okay - SEO always takes work.  I have
also taken a lot of care to use good page titles and good semantic HTML, and
to stuff things into the URL that don't have anything to do with locating the
resource, but give the search engines a clue as to what the content is.
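
As a rough sketch of that last idea, the path can carry the ID that actually locates the resource plus a descriptive slug that is ignored when the URL is resolved. The class ThreadUrls, the "thread" mount name and the /mount/id/slug layout are assumptions for illustration, not the scheme the site actually uses.

```java
import java.util.Locale;

// Hypothetical helper: builds and resolves URLs of the form /mount/<id>/<descriptive-slug>.
final class ThreadUrls {

    private ThreadUrls() {
    }

    /** Lower-cases the title and collapses every run of non-alphanumerics into one hyphen. */
    static String slugify(String title) {
        return title.toLowerCase(Locale.ROOT)
                .replaceAll("[^a-z0-9]+", "-")
                .replaceAll("^-+|-+$", "");
    }

    /** Only the id locates the resource; the slug is there for readers and crawlers. */
    static String toPath(String mount, long id, String title) {
        String slug = slugify(title);
        String base = "/" + mount + "/" + id;
        return slug.isEmpty() ? base : base + "/" + slug;
    }

    /** Reads the id back and ignores whatever slug follows it. */
    static long parseId(String mount, String path) {
        String prefix = "/" + mount + "/";
        if (!path.startsWith(prefix)) {
            throw new IllegalArgumentException("Path is not under mount '" + mount + "': " + path);
        }
        String rest = path.substring(prefix.length());
        int slash = rest.indexOf('/');
        return Long.parseLong(slash < 0 ? rest : rest.substring(0, slash));
    }

    public static void main(String[] args) {
        String path = toPath("thread", 42L, "South Texas Management Buck!");
        System.out.println(path + " -> id " + parseId("thread", path));
    }
}
```

Because the slug is not used for lookup, the same content stays reachable under URLs with different slugs. A real coding strategy would probably redirect to the canonical slug to avoid the duplicate-content problem discussed earlier in the thread.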

Yes, some pages end up with a jsessionid - and I don't like it (example:
http://www.google.com/search?hl=en&client=firefox-a&rls=com.ubuntu%3Aen-US%3Aofficial&q=%22south+texas+management+buck%22&btnG=Search).
But, most don't because almost all of my links are bookmarkable.  When the
user clicks something that they can only do as a signed-in user, then it
redirects them to the sign in page, they sign in, and are taken back to the
page they were on.  Then they can pick up, and I don't worry about
bookmarkable URLs for anything that requires user-authentication (wizards to
post a new listing, story, admin links, etc).

Jeremy Thomerson
TexasHuntFish.com

On Thu, Apr 3, 2008 at 8:09 PM, Dan Kaplan <dk...@citizenhawk.com> wrote:

> Ok, at least I'm not missing anything.  I understand the benefits it's
> providing with its stateful framework.  Developing a site with Wicket is
> easier than with any other framework I've used.  But this statefulness,
> which makes websites so easy to develop, seems to be counter productive to
> SEO:
>
> GoogleBot will follow and index stateful links.  Worst case scenario,
> these actually become visible to google users and when they click the
> link it takes them to an "invalid session" page.  They think, "This site
> is broken" and move on to the next link of their search result.
>
> Another approach to solving this is to block all the stateful pages in my
> robots.txt file.  But how can I block these links in robots.txt since they
> change per session?  Is there any way to know what the url will resolve to
> when googlebot tries to visit my site so I can tell it to disallow:
> /?wicket:interface=:10:1::: and ?wicket:interface=:0:1::: and ...?
>
>
> > -----Original Message-----
> > From: Igor Vaynberg [mailto:igor.vaynberg@gmail.com]
> > Sent: Thursday, April 03, 2008 5:45 PM
> > To: users@wicket.apache.org
> > Subject: Re: Removing the jsessionid for SEO
> >
> > On Thu, Apr 3, 2008 at 5:31 PM, Dan Kaplan <dk...@citizenhawk.com> wrote:
> > > Ok I did a little preliminary research on this.  Right now
> > > PagingNavigator uses PagingNavigationLink's to represent its page.
> > > This extends Link.  I'm supposed to override PagingNavigator's
> > > newPagingNavigationLink() method to accomplish this (I think) but past
> > > that, this isn't very straightforward to me.
> > >
> > > Do I need to create my own BookmarkablePagingNavigationLink?  When I
> > > do... what next?  I really don't know enough about
> > > bookmarkablePageLinks to do this.  Right now, all the magic happens
> > > inside PagingNavigationLink.  Won't I have to move all that logic into
> > > the WebPage that I'm passing into BookmarkablePagingNavigationLink?
> > > This seems like a lot of work.  Am I missing something critical?
> >
> > no, you are not missing anything. you see, when you go stateless, like
> > what you want, then you have to recreate all the magic stuff that
> > makes stateful links Just Work. Without state you are back to the
> > servlet/mvc programming model: you have to encode the state that you
> > want into the link, then on the trip back decode it, recreate
> > something from it, and then apply that something onto the components.
> > This is the crapwork that wicket does for you usually.
> >
> > -igor
> >
> >
> > >
> > >
> > >  > -----Original Message-----
> > >  > From: Igor Vaynberg [mailto:igor.vaynberg@gmail.com]
> > >
> > >
> > > > Sent: Thursday, April 03, 2008 3:40 PM
> > >  > To: users@wicket.apache.org
> > >  > Subject: Re: Removing the jsessionid for SEO
> > >  >
> > >  > you subclass the pagenavigator and make it use bookmarkable links
> > >  > also. it has factory methods for all the links it uses.
> > >  >
> > >  > -igor
> > >  >
> > >  >
> > > >  > On Thu, Apr 3, 2008 at 3:36 PM, Dan Kaplan <dkaplan@citizenhawk.com> wrote:
> > > >  > > I wasn't talking about the links that are on the list (I already
> > > >  > >  make those bookmarkable).  I'm talking about the links that the
> > > >  > >  Navigator generates.  How do I make it so page 2 is bookmarkable?
> > >  > >
> > >  > >
> > >  > >  -----Original Message-----
> > >  > >  From: Igor Vaynberg [mailto:igor.vaynberg@gmail.com]
> > >  > >
> > >  > >
> > >  > > Sent: Thursday, April 03, 2008 3:30 PM
> > >  > >  To: users@wicket.apache.org
> > >  > >  Subject: Re: Removing the jsessionid for SEO
> > >  > >
> > >  > >  instead of
> > >  > >
> > >  > >  item.add(new link("foo") { onclick() });
> > >  > >
> > >  > >  do
> > >  > >
> > >  > >  item.add(new bookmarkablepagelink("foo", page.class));
> > >  > >
> > >  > >  -igor
> > >  > >
> > >  > >
> > > >  > >  On Thu, Apr 3, 2008 at 3:28 PM, Dan Kaplan <dk...@citizenhawk.com> wrote:
> > > >  > >  > How?  I asked how to do it before and nobody suggested this as a
> > > >  > >  >  possibility.
> > >  > >  >
> > >  > >  >
> > >  > >  >
> > >  > >  >  -----Original Message-----
> > >  > >  >  From: Igor Vaynberg [mailto:igor.vaynberg@gmail.com]
> > >  > >  >  Sent: Thursday, April 03, 2008 3:26 PM
> > >  > >  >  To: users@wicket.apache.org
> > >  > >  >  Subject: Re: Removing the jsessionid for SEO
> > >  > >  >
> > > >  > >  >  dataview can work in a stateless mode, just use bookmarkable
> > > >  > >  >  links inside it
> > >  > >  >
> > >  > >  >  -igor
> > >  > >  >
> > >  > >  >
> > > >  > >  >  On Thu, Apr 3, 2008 at 3:22 PM, Dan Kaplan <dk...@citizenhawk.com> wrote:
> > > >  > >  >  >  Regardless, at the very least this makes your site look "weird" and
> > > >  > >  >  >  unprofessional when google puts a jsessionid on your url.  There has
> > > >  > >  >  >  got to be some negative effect when google visits it the second time
> > > >  > >  >  >  and the jsessionid has changed but it sees the same exact content.
> > > >  > >  >  >  Worst case, it'll think you're trying to trick it.
> > > >  > >  >  >
> > > >  > >  >  >  About those 404s, I'm finding that with the fix I provided I don't
> > > >  > >  >  >  get a 404, but the links refresh the page I'm already on.  IE: If I'm
> > > >  > >  >  >  on A, and a link to B is non-bookmarkable, clicking B refreshes A.
> > > >  > >  >  >
> > > >  > >  >  >  This issue is very disconcerting to me.  It's one of the reasons I
> > > >  > >  >  >  wish that DataView had an option to work in stateless mode.  Cause if
> > > >  > >  >  >  I ban cookies and Googlebot visits my home page (with a navigator on
> > > >  > >  >  >  it), it'll try to follow all these page links and from its
> > > >  > >  >  >  perspective, they all lead back to the first page.  So it's kinda a
> > > >  > >  >  >  catch-22: Include the jsessionid in the urls and get bad SEO or
> > > >  > >  >  >  remove the jsessionid and get bad SEO :(
> > > >  > >  >  >
> > > >  > >  >  >  Perhaps the answer to my prayers is a combination of the
> > > >  > >  >  >  noindex/nofollow meta tag with a sitemap.xml.  I'm thinking I can put
> > > >  > >  >  >  a nofollow on the home page (so googlebot doesn't try to follow the
> > > >  > >  >  >  navigator links) and use the sitemap.xml to point out the individual
> > > >  > >  >  >  pages I want it to index.
> > > >  > >  >  >
> > > >  > >  >  >
> > > >  > >  >  >  Matej: can you go into more detail about your hybrid URL statement?
> > > >  > >  >  >  Won't google index, for example, /home and /home.1 if I use it?  When
> > > >  > >  >  >  it follows the next page, won't the url become /home.1.2 or
> > > >  > >  >  >  something?  That .2 is a page version: If google indexes that and
> > > >  > >  >  >  tries to visit it again, won't it report about an invalid session?
> > >  > >  >  >
> > >  > >  >  >
> > >  > >  >  >
> > >  > >  >  >  -----Original Message-----
> > >  > >  >  >  From: Matej Knopp [mailto:matej.knopp@gmail.com]
> > >  > >  >  >  Sent: Thursday, April 03, 2008 11:10 AM
> > >  > >  >  >  To: users@wicket.apache.org
> > >  > >  >  >  Subject: Re: Removing the jsessionid for SEO
> > >  > >  >  >
> > > >  > >  >  >  On the other hand, crawling non-bookmarkable pages is not very
> > > >  > >  >  >  useful anyway, since ?wicket:interface url will always get page
> > > >  > >  >  >  expired when you click on the result.
> > > >  > >  >  >
> > > >  > >  >  >  However, preserving session makes lot of sense with hybrid url.
> > > >  > >  >  >  Google remembers the original url (without page instance) while
> > > >  > >  >  >  indexing the real page (after redirect).
> > > >  > >  >  >
> > > >  > >  >  >  I think though that the crawler is quite advanced. I'm would think it
> > > >  > >  >  >  supports cookies (at least JSESSIONID) as well as it evaluates some
> > > >  > >  >  >  of the javascript on page.
> > >  > >  >  >
> > >  > >  >  >  -Matej
> > >  > >  >  >
> > > >  > >  >  >  On Thu, Apr 3, 2008 at 6:56 PM, Igor Vaynberg <ig...@gmail.com> wrote:
> > > >  > >  >  >  > right. if you strip sessionid then all your nonbookmarkable urls
> > > >  > >  >  >  >  will resolve to a 404. that will probably drop your rank a lot
> > > >  > >  >  >  >  faster....
> > >  > >  >  >  >
> > >  > >  >  >  >  -igor
> > >  > >  >  >  >
> > >  > >  >  >  >
> > >  > >  >  >  >
> > >  > >  >  >  >
> > > >  > >  >  >  >  On Thu, Apr 3, 2008 at 9:16 AM, Johan Compagner <jc...@gmail.com> wrote:
> > > >  > >  >  >  >  > the problem is that then you have to have all stateless pages.
> > > >  > >  >  >  >  >  Else google can't crawl your website.
> > > >  > >  >  >  >  >  And if that is the case then you could be completely stateless
> > > >  > >  >  >  >  >  so you dont have a session (id) to worry about at all.
> > >  > >  >  >  >  >
> > >  > >  >  >  >  >  johan
> > >  > >  >  >  >  >
> > >  > >  >  >  >  >
> > >  > >  >  >  >  >
> > >  > >  >  >  >  >
> > >  > >  >  >  >  >
> > >  > >  >  >  >  >
> > >  > >  >  >
>  ------------------------------------------------------------
> > -----
> > >  > ----
> > >  > >  >  >  >  >  > To unsubscribe, e-mail: users-
> > >  > unsubscribe@wicket.apache.org
> > >  > >  >  >  >  >  > For additional commands, e-mail: users-
> > >  > help@wicket.apache.org
> > >  > >  >  >  >  >  >
> > >  > >  >  >  >  >  >
> > >  > >  >  >  >  >
> > >  > >  >  >  >
> > >  > >  >  >  >
> > >  > >
>  ------------------------------------------------------------------
> > ---
> > >  > >  >  >  >  To unsubscribe, e-mail: users-
> > unsubscribe@wicket.apache.org
> > >  > >  >  >  >  For additional commands, e-mail: users-
> > help@wicket.apache.org
> > >  > >  >  >  >
> > >  > >  >  >  >
> > >  > >  >  >
> > >  > >  >  >
> > >  > >  >  >
> > >  > >  >  >  --
> > >  > >  >  >  Resizable and reorderable grid components.
> > >  > >  >  >  http://www.inmethod.com
> > >  > >  >  >
> > >  > >  >  >
>  ------------------------------------------------------------
> > -----
> > >  > ----
> > >  > >  >  >  To unsubscribe, e-mail:
> users-unsubscribe@wicket.apache.org
> > >  > >  >  >  For additional commands, e-mail: users-
> > help@wicket.apache.org
> > >  > >  >  >
> > >  > >  >  >
> > >  > >  >  >
>  ------------------------------------------------------------
> > -----
> > >  > ----
> > >  > >  >  >  To unsubscribe, e-mail:
> users-unsubscribe@wicket.apache.org
> > >  > >  >  >  For additional commands, e-mail: users-
> > help@wicket.apache.org
> > >  > >  >  >
> > >  > >  >  >
> > >  > >  >
> > >  > >  >
>  ---------------------------------------------------------------
> > -----
> > >  > -
> > >  > >  >  To unsubscribe, e-mail: users-unsubscribe@wicket.apache.org
> > >  > >  >  For additional commands, e-mail: users-help@wicket.apache.org
> > >  > >  >
> > >  > >  >
> > >  > >  >
>  ---------------------------------------------------------------
> > -----
> > >  > -
> > >  > >  >  To unsubscribe, e-mail: users-unsubscribe@wicket.apache.org
> > >  > >  >  For additional commands, e-mail: users-help@wicket.apache.org
> > >  > >  >
> > >  > >  >
> > >  > >
> > >  > >
>  ------------------------------------------------------------------
> > ---
> > >  > >  To unsubscribe, e-mail: users-unsubscribe@wicket.apache.org
> > >  > >  For additional commands, e-mail: users-help@wicket.apache.org
> > >  > >
> > >  > >
> > >  > >
>  ------------------------------------------------------------------
> > ---
> > >  > >  To unsubscribe, e-mail: users-unsubscribe@wicket.apache.org
> > >  > >  For additional commands, e-mail: users-help@wicket.apache.org
> > >  > >
> > >  > >
> > >  >
> > >  >
> ---------------------------------------------------------------------
> > >  > To unsubscribe, e-mail: users-unsubscribe@wicket.apache.org
> > >  > For additional commands, e-mail: users-help@wicket.apache.org
> > >
> > >
> > >  ---------------------------------------------------------------------
> > >  To unsubscribe, e-mail: users-unsubscribe@wicket.apache.org
> > >  For additional commands, e-mail: users-help@wicket.apache.org
> > >
> > >
> >
> > ---------------------------------------------------------------------
> > To unsubscribe, e-mail: users-unsubscribe@wicket.apache.org
> > For additional commands, e-mail: users-help@wicket.apache.org
>
>
> ---------------------------------------------------------------------
> To unsubscribe, e-mail: users-unsubscribe@wicket.apache.org
> For additional commands, e-mail: users-help@wicket.apache.org
>
>

Re: Removing the jsessionid for SEO

Posted by Jeremy Levy <je...@meetmoi.com>.
We have a similar issue and are trying the following out right now:

http://www.google.com/support/webmasters/bin/answer.py?hl=en&answer=40367

User-agent: *
Disallow: /*?
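
To see what a rule like this actually blocks, here is a minimal sketch in Java. It assumes the wildcard semantics Google documents for its own crawler ('*' matches any run of characters, a trailing '$' anchors the end of the URL, and a rule otherwise matches as a prefix); other crawlers may interpret the rule differently. RobotsRule is a hypothetical helper written for this illustration, not part of Wicket or any robots.txt library.

```java
import java.util.regex.Pattern;

/** Hypothetical helper for illustration; not part of Wicket or any robots.txt library. */
final class RobotsRule {
    private final Pattern pattern;

    RobotsRule(String rule) {
        StringBuilder regex = new StringBuilder();
        for (int i = 0; i < rule.length(); i++) {
            char c = rule.charAt(i);
            if (c == '*') {
                regex.append(".*");   // '*' matches any run of characters
            } else if (c == '$' && i == rule.length() - 1) {
                regex.append('$');    // a trailing '$' anchors the end of the URL
            } else {
                regex.append(Pattern.quote(String.valueOf(c)));   // everything else is literal
            }
        }
        this.pattern = Pattern.compile(regex.toString());
    }

    /** True if the rule matches a prefix of the path plus query string. */
    boolean matches(String pathAndQuery) {
        return pattern.matcher(pathAndQuery).lookingAt();
    }
}
```

Under these assumptions, new RobotsRule("/*?") matches "/?wicket:interface=:0:1:::" but not "/home". It also matches "/products?page=2", so the rule hides every URL that carries a query string, including bookmarkable pages that take parameters.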




On Thu, Apr 3, 2008 at 9:09 PM, Dan Kaplan <dk...@citizenhawk.com> wrote:

> Ok, at least I'm not missing anything.  I understand the benefits it's
> providing with its stateful framework.  Developing a site with Wicket is
> easier than with any other framework I've used.  But this statefulness,
> which makes websites so easy to develop, seems to be counterproductive to
> SEO:
>
> GoogleBot will follow and index stateful links.  Worst case scenario, these
> actually become visible to google users and when they click the link it
> takes them to an "invalid session" page.  They think, "This site is broken"
> and move on to the next link of their search result.
>
> Another approach to solving this is to block all the stateful pages in my
> robots.txt file.  But how can I block these links in robots.txt since they
> change per session?  Is there any way to know what the url will resolve to
> when googlebot tries to visit my site so I can tell it to disallow:
> /?wicket:interface=:10:1::: and ?wicket:interface=:0:1::: and ...?
>
>
> > -----Original Message-----
> > From: Igor Vaynberg [mailto:igor.vaynberg@gmail.com]
> > Sent: Thursday, April 03, 2008 5:45 PM
> > To: users@wicket.apache.org
> > Subject: Re: Removing the jsessionid for SEO
> >
> > On Thu, Apr 3, 2008 at 5:31 PM, Dan Kaplan <dk...@citizenhawk.com> wrote:
> > > Ok I did a little preliminary research on this.  Right now PagingNavigator
> > >  uses PagingNavigationLink's to represent its page.  This extends Link.  I'm
> > >  supposed to override PagingNavigator's newPagingNavigationLink() method to
> > >  accomplish this (I think) but past that, this isn't very straightforward to
> > >  me.
> > >
> > >  Do I need to create my own BookmarkablePagingNavigationLink?  When I do...
> > >  what next?  I really don't know enough about bookmarkablePageLinks to do
> > >  this.  Right now, all the magic happens inside PagingNavigationLink.  Won't
> > >  I have to move all that logic into the WebPage that I'm passing into
> > >  BookmarkablePagingNavigationLink?  This seems like a lot of work.  Am I
> > >  missing something critical?
> >
> > no, you are not missing anything. you see, when you go stateless, like
> > what you want, then you have to recreate all the magic stuff that
> > makes stateful links Just Work. Without state you are back to the
> > servlet/mvc programming model: you have to encode the state that you
> > want into the link, then on the trip back decode it, recreate
> > something from it, and then apply that something onto the components.
> > This is the crapwork that wicket does for you usually.
> >
> > -igor
> >
> >
> > >
> > >
> > >  > -----Original Message-----
> > >  > From: Igor Vaynberg [mailto:igor.vaynberg@gmail.com]
> > >  > Sent: Thursday, April 03, 2008 3:40 PM
> > >  > To: users@wicket.apache.org
> > >  > Subject: Re: Removing the jsessionid for SEO
> > >  >
> > >  > you subclass the pagenavigator and make it use bookmarkable links
> > >  > also. it has factory methods for all the links it uses.
> > >  >
> > >  > -igor
> > >  >
> > >  >
> > >  > On Thu, Apr 3, 2008 at 3:36 PM, Dan Kaplan <dkaplan@citizenhawk.com> wrote:
> > >  > > I wasn't talking about the links that are on the list (I already make those
> > >  > >  bookmarkable).  I'm talking about the links that the Navigator generates.
> > >  > >  How do I make it so page 2 is bookmarkable?
> > >  > >
> > >  > >
> > >  > >  -----Original Message-----
> > >  > >  From: Igor Vaynberg [mailto:igor.vaynberg@gmail.com]
> > >  > >  Sent: Thursday, April 03, 2008 3:30 PM
> > >  > >  To: users@wicket.apache.org
> > >  > >  Subject: Re: Removing the jsessionid for SEO
> > >  > >
> > >  > >  instead of
> > >  > >
> > >  > >  item.add(new Link("foo") { public void onClick() { /* ... */ } });
> > >  > >
> > >  > >  do
> > >  > >
> > >  > >  item.add(new BookmarkablePageLink("foo", Page.class));
> > >  > >
> > >  > >  -igor
> > >  > >
> > >  > >
> > >  > >  On Thu, Apr 3, 2008 at 3:28 PM, Dan Kaplan <dk...@citizenhawk.com> wrote:
> > >  > >  > How?  I asked how to do it before and nobody suggested this as a
> > >  > >  >  possibility.
> > >  > >  >
> > >  > >  >
> > >  > >  >
> > >  > >  >  -----Original Message-----
> > >  > >  >  From: Igor Vaynberg [mailto:igor.vaynberg@gmail.com]
> > >  > >  >  Sent: Thursday, April 03, 2008 3:26 PM
> > >  > >  >  To: users@wicket.apache.org
> > >  > >  >  Subject: Re: Removing the jsessionid for SEO
> > >  > >  >
> > >  > >  >  dataview can work in a stateless mode, just use bookmarkable links inside
> > >  > >  >  it
> > >  > >  >
> > >  > >  >  -igor
> > >  > >  >
> > >  > >  >
> > >  > >  >  On Thu, Apr 3, 2008 at 3:22 PM, Dan Kaplan <dk...@citizenhawk.com> wrote:
> > >  > >  >  > Regardless, at the very least this makes your site look "weird" and
> > >  > >  >  >  unprofessional when google puts a jsessionid on your url.  There has got to
> > >  > >  >  >  be some negative effect when google visits it the second time and the
> > >  > >  >  >  jsessionid has changed but it sees the same exact content.  Worst case,
> > >  > >  >  >  it'll think you're trying to trick it.
> > >  > >  >  >
> > >  > >  >  >  About those 404s, I'm finding that with the fix I provided I don't get a
> > >  > >  >  >  404, but the links refresh the page I'm already on.  IE: If I'm on A, and a
> > >  > >  >  >  link to B is non-bookmarkable, clicking B refreshes A.
> > >  > >  >  >
> > >  > >  >  >  This issue is very disconcerting to me.  It's one of the reasons I wish that
> > >  > >  >  >  DataView had an option to work in stateless mode.  Cause if I ban cookies
> > >  > >  >  >  and Googlebot visits my home page (with a navigator on it), it'll try to
> > >  > >  >  >  follow all these page links and from its perspective, they all lead back to
> > >  > >  >  >  the first page.  So it's kinda a catch-22: Include the jsessionid in the
> > >  > >  >  >  urls and get bad SEO or remove the jsessionid and get bad SEO :(
> > >  > >  >  >
> > >  > >  >  >  Perhaps the answer to my prayers is a combination of the noindex/nofollow
> > >  > >  >  >  meta tag with a sitemap.xml.  I'm thinking I can put a nofollow on the home
> > >  > >  >  >  page (so googlebot doesn't try to follow the navigator links) and use the
> > >  > >  >  >  sitemap.xml to point out the individual pages I want it to index.
> > >  > >  >  >
> > >  > >  >  >
> > >  > >  >  >  Matej: can you go into more detail about your hybrid URL statement?  Won't
> > >  > >  >  >  google index, for example, /home and /home.1 if I use it?  When it follows
> > >  > >  >  >  the next page, won't the url become /home.1.2 or something?  That .2 is a
> > >  > >  >  >  page version: If google indexes that and tries to visit it again, won't it
> > >  > >  >  >  report about an invalid session?
> > >  > >  >  >
> > >  > >  >  >
> > >  > >  >  >
> > >  > >  >  >  -----Original Message-----
> > >  > >  >  >  From: Matej Knopp [mailto:matej.knopp@gmail.com]
> > >  > >  >  >  Sent: Thursday, April 03, 2008 11:10 AM
> > >  > >  >  >  To: users@wicket.apache.org
> > >  > >  >  >  Subject: Re: Removing the jsessionid for SEO
> > >  > >  >  >
> > >  > >  >  >  On the other hand, crawling non-bookmarkable pages is not very useful
> > >  > >  >  >  anyway, since ?wicket:interface url will always get page expired when
> > >  > >  >  >  you click on the result.
> > >  > >  >  >
> > >  > >  >  >  However, preserving session makes a lot of sense with hybrid url.  Google
> > >  > >  >  >  remembers the original url (without page instance) while indexing the
> > >  > >  >  >  real page (after redirect).
> > >  > >  >  >
> > >  > >  >  >  I think though that the crawler is quite advanced. I would think it
> > >  > >  >  >  supports cookies (at least JSESSIONID) as well as it evaluates some of
> > >  > >  >  >  the javascript on page.
> > >  > >  >  >
> > >  > >  >  >  -Matej
> > >  > >  >  >
> > >  > >  >  >  On Thu, Apr 3, 2008 at 6:56 PM, Igor Vaynberg <ig...@gmail.com> wrote:
> > >  > >  >  >  > right. if you strip sessionid then all your nonbookmarkable urls will
> > >  > >  >  >  >  resolve to a 404. that will probably drop your rank a lot faster....
> > >  > >  >  >  >
> > >  > >  >  >  >  -igor
> > >  > >  >  >  >
> > >  > >  >  >  >
> > >  > >  >  >  >  On Thu, Apr 3, 2008 at 9:16 AM, Johan Compagner <jc...@gmail.com> wrote:
> > >  > >  >  >  >  > the problem is that then you have to have all stateless pages.  Else google
> > >  > >  >  >  >  >  can't crawl your website.
> > >  > >  >  >  >  >  And if that is the case then you could be completely stateless so you dont
> > >  > >  >  >  >  >  have a session (id) to worry about at all.
> > >  > >  >  >  >  >
> > >  > >  >  >  >  >  johan
> > >  > >  >  >  >  >
> > >  > >  >  >  >  >
> > >  > >  >  >  >  >
> > >  > >  >  >  >  >
> > >  > >  >  >  >  >
> > >  > >  >  >  >  >
> > >  > >  >  >  >  >  On Thu, Apr 3, 2008 at 4:54 PM, Zappaterrini, Larry <Larry.Zappaterrini@fnis.com> wrote:
> > >  > >  >  >  >  >
> > >  > >  >  >  >  >  > When Google asks to not have special treatment for their bot, they are
> > >  > >  >  >  >  >  > referring to content more than anything. Regarding the session id being
> > >  > >  >  >  >  >  > coded in the URL, see the Technical guidelines section of Google's
> > >  > >  >  >  >  >  > Webmaster Guidelines -
> > >  > >  >  >  >  >  > http://www.google.com/support/webmasters/bin/answer.py?answer=35769#design
> > >  > >  >  >  >  >  >
> > >  > >  >  >  >  >  > It specifically recommends "allow(ing) search bots to crawl your sites
> > >  > >  >  >  >  >  > without session IDs or arguments that track their path through the
> > >  > >  >  >  >  >  > site."
> > >  > >  >  >  >  >  >
> > >  > >  >  >  >  >  > -----Original Message-----
> > >  > >  >  >  >  >  > From: Johan Compagner [mailto:jcompagner@gmail.com]
> > >  > >  >  >  >  >  > Sent: Thursday, April 03, 2008 7:35 AM
> > >  > >  >  >  >  >  > To: users@wicket.apache.org
> > >  > >  >  >  >  >  > Subject: Re: Removing the jsessionid for SEO
> > >  > >  >  >  >  >  >
> > >  > >  >  >  >  >  > isn't google always saying that you shouldn't alter the behavior of your
> > >  > >  >  >  >  >  > site depending on whether it is their bot or not?
> > >  > >  >  >  >  >  >
> > >  > >  >  >  >  >  > On Thu, Apr 3, 2008 at 1:00 PM, Artur W. <a_...@gazeta.pl> wrote:
> > >  > >  >  >  >  >  >
> > >  > >  >  >  >  >  > >
> > >  > >  >  >  >  >  > > Hi!
> > >  > >  >  >  >  >  > >
> > >  > >  >  >  >  >  > >
> > >  > >  >  >  >  >  > > igor.vaynberg wrote:
> > >  > >  >  >  >  >  > > >
> > >  > >  >  >  >  >  > > > also by doing what you have done users with cookies disabled wont be
> > >  > >  >  >  >  >  > > > able to use your site...
> > >  > >  >  >  >  >  > > >
> > >  > >  >  >  >  >  > >
> > >  > >  >  >  >  >  > > In my opinion session id is a problem. Google indexes the same page again
> > >  > >  >  >  >  >  > > and again.
> > >  > >  >  >  >  >  > >
> > >  > >  >  >  >  >  > > About the users without cookies we can do like this:
> > >  > >  >  >  >  >  > >
> > >  > >  >  >  >  >  > >    static class Unbuffered extends WebResponse {
> > >  > >  >  >  >  >  > >
> > >  > >  >  >  >  >  > >        // Lower-case substrings looked for in the User-Agent header.
> > >  > >  >  >  >  >  > >        private static final String[] botAgents = {
> > >  > >  >  >  >  >  > >            "onetszukaj", "googlebot", "appie", "architext", "jeeves",
> > >  > >  >  >  >  >  > >            "bjaaland", "ferret", "gulliver", "harvest", "htdig",
> > >  > >  >  >  >  >  > >            "linkwalker", "lycos_", "moget", "muscatferret", "myweb",
> > >  > >  >  >  >  >  > >            "nomad", "scooter", "yahoo!\\sslurp\\schina", "slurp",
> > >  > >  >  >  >  >  > >            "weblayers", "antibot", "bruinbot", "digout4u", "echo!",
> > >  > >  >  >  >  >  > >            "ia_archiver", "jennybot", "mercator", "netcraft", "msnbot",
> > >  > >  >  >  >  >  > >            "petersnews", "unlost_web_crawler", "voila", "webbase",
> > >  > >  >  >  >  >  > >            "webcollage", "cfetch", "zyborg", "wisenutbot", "robot",
> > >  > >  >  >  >  >  > >            "crawl", "spider" }; /* and so on... */
> > >  > >  >  >  >  >  > >
> > >  > >  >  >  >  >  > >        public Unbuffered(final HttpServletResponse res) {
> > >  > >  >  >  >  >  > >            super(res);
> > >  > >  >  >  >  >  > >        }
> > >  > >  >  >  >  >  > >
> > >  > >  >  >  >  >  > >        @Override
> > >  > >  >  >  >  >  > >        public CharSequence encodeURL(final CharSequence url) {
> > >  > >  >  >  >  >  > >            // Crawlers get the plain URL; everyone else keeps the jsessionid fallback.
> > >  > >  >  >  >  >  > >            return isAgent() ? url : super.encodeURL(url);
> > >  > >  >  >  >  >  > >        }
> > >  > >  >  >  >  >  > >
> > >  > >  >  >  >  >  > >        private static boolean isAgent() {
> > >  > >  >  >  >  >  > >            String agent = ((WebRequest) RequestCycle.get().getRequest())
> > >  > >  >  >  >  >  > >                    .getHttpServletRequest().getHeader("User-Agent");
> > >  > >  >  >  >  >  > >            if (agent == null) {
> > >  > >  >  >  >  >  > >                return false; // no User-Agent header: treat as a regular client
> > >  > >  >  >  >  >  > >            }
> > >  > >  >  >  >  >  > >            String lowerAgent = agent.toLowerCase();
> > >  > >  >  >  >  >  > >            for (String bot : botAgents) {
> > >  > >  >  >  >  >  > >                if (lowerAgent.indexOf(bot) != -1) {
> > >  > >  >  >  >  >  > >                    return true;
> > >  > >  >  >  >  >  > >                }
> > >  > >  >  >  >  >  > >            }
> > >  > >  >  >  >  >  > >            return false;
> > >  > >  >  >  >  >  > >        }
> > >  > >  >  >  >  >  > >    }
> > >  > >  >  >  >  >  > >
> > >  > >  >  >  >  >  > >
> > >  > >  >  >  >  >  > > I didn't test this code but I do similar thing in my old application in
> > >  > >  >  >  >  >  > > Spring and it works.
> > >  > >  >  >  >  >  > >
> > >  > >  >  >  >  >  > > Take care,
> > >  > >  >  >  >  >  > > Artur

Re: Removing the jsessionid for SEO

Posted by Igor Vaynberg <ig...@gmail.com>.
On Thu, Apr 3, 2008 at 6:09 PM, Dan Kaplan <dk...@citizenhawk.com> wrote:
> Ok, at least I'm not missing anything.  I understand the benefits it's
>  providing with its stateful framework.  Developing a site with Wicket is
>  easier than with any other framework I've used.  But this statefulness,
>  which makes websites so easy to develop, seems to be counterproductive to
>  SEO:

well, perhaps the differentiator here is that wicket is made for web
applications not web sites.

>  GoogleBot will follow and index stateful links.  Worst case scenario, these
>  actually become visible to google users and when they click the link it
>  takes them to an "invalid session" page.  They think, "This site is broken"
>  and move on to the next link of their search result.

yep, you need to make sure that all stateful links are behind a login
or something similar that the bot can't get past.

>  Another approach to solving this is to block all the stateful pages in my
>  robots.txt file.  But how can I block these links in robots.txt since they
>  change per session?  Is there any way to know what the url will resolve to
>  when googlebot tries to visit my site so I can tell it to disallow:
>  /?wicket:interface=:10:1::: and ?wicket:interface=:0:1::: and ...?

no, there isn't a way; you have to use wildcards

on the other hand it is not that difficult to develop the stateless
paging navigator, it will take a bit of work though.

-igor
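
A minimal sketch of the "encode the state into the link, decode it on the trip back" step that a stateless paging navigator has to do by hand. This is plain Java rather than Wicket code: PageLinkState, the "page" parameter name, and the clamping policy are all assumptions made for illustration. In Wicket the decoded value would come from the page's parameters and be applied to the pageable component, which is not shown here.

```java
/** Hypothetical helper for illustration; none of these names are Wicket API. */
final class PageLinkState {
    static final String PARAM = "page";   // assumed query parameter name

    /** Encode the zero-based page index into a bookmarkable URL. */
    static String encode(String mountPath, int pageIndex) {
        return mountPath + "?" + PARAM + "=" + pageIndex;
    }

    /** Decode the page index from a raw query string, clamped to [0, pageCount - 1]. */
    static int decode(String query, int pageCount) {
        int page = 0;
        if (query != null) {
            for (String pair : query.split("&")) {
                int eq = pair.indexOf('=');
                if (eq > 0 && pair.substring(0, eq).equals(PARAM)) {
                    try {
                        page = Integer.parseInt(pair.substring(eq + 1));
                    } catch (NumberFormatException e) {
                        page = 0;   // malformed value from a stale or hand-edited link
                    }
                }
            }
        }
        // A crawler may revisit a link after the data set has shrunk, so never trust the index.
        return Math.max(0, Math.min(page, Math.max(0, pageCount - 1)));
    }
}
```

With this, PageLinkState.encode("/home", 2) gives "/home?page=2", and decoding "page=2" against five pages gives 2. Because the page index travels in the URL itself, the link keeps working for a visitor (or bot) that arrives without a session.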




>
>
>
>  > -----Original Message-----
>  > From: Igor Vaynberg [mailto:igor.vaynberg@gmail.com]
>
>
> > Sent: Thursday, April 03, 2008 5:45 PM
>  > To: users@wicket.apache.org
>  > Subject: Re: Removing the jsessionid for SEO
>  >
>  > On Thu, Apr 3, 2008 at 5:31 PM, Dan Kaplan <dk...@citizenhawk.com> wrote:
>  > > Ok I did a little preliminary research on this.  Right now PagingNavigator
>  > >  uses PagingNavigationLink's to represent its page.  This extends Link.  I'm
>  > >  supposed to override PagingNavigator's newPagingNavigationLink() method to
>  > >  accomplish this (I think) but past that, this isn't very straightforward to
>  > >  me.
>  > >
>  > >  Do I need to create my own BookmarkablePagingNavigationLink?  When I do...
>  > >  what next?  I really don't know enough about bookmarkablePageLinks to do
>  > >  this.  Right now, all the magic happens inside PagingNavigationLink.  Won't
>  > >  I have to move all that logic into the WebPage that I'm passing into
>  > >  BookmarkablePagingNavigationLink?  This seems like a lot of work.  Am I
>  > >  missing something critical?
>  >
>  > no, you are not missing anything. you see, when you go stateless, like
>  > what you want, then you have to recreate all the magic stuff that
>  > makes stateful links Just Work. Without state you are back to the
>  > servlet/mvc programming model: you have to encode the state that you
>  > want into the link, then on the trip back decode it, recreate
>  > something from it, and then apply that something onto the components.
>  > This is the crapwork that wicket does for you usually.
>  >
>  > -igor
>  >
>  >
>  > >
>  > >
>  > >  > -----Original Message-----
>  > >  > From: Igor Vaynberg [mailto:igor.vaynberg@gmail.com]
>  > >
>  > >
>  > > > Sent: Thursday, April 03, 2008 3:40 PM
>  > >  > To: users@wicket.apache.org
>  > >  > Subject: Re: Removing the jsessionid for SEO
>  > >  >
>  > >  > you subclass the pagenavigator and make it use bookmarkable links
>  > >  > also. it has factory methods for all the links it uses.
>  > >  >
>  > >  > -igor
>  > >  >
>  > >  >
>  > >  > On Thu, Apr 3, 2008 at 3:36 PM, Dan Kaplan <dk...@citizenhawk.com> wrote:
>  > >  > > I wasn't talking about the links that are on the list (I already make those
>  > >  > >  bookmarkable).  I'm talking about the links that the Navigator generates.
>  > >  > >  How do I make it so page 2 is bookmarkable?
>  > >  > >
>  > >  > >
>  > >  > >  -----Original Message-----
>  > >  > >  From: Igor Vaynberg [mailto:igor.vaynberg@gmail.com]
>  > >  > >  Sent: Thursday, April 03, 2008 3:30 PM
>  > >  > >  To: users@wicket.apache.org
>  > >  > >  Subject: Re: Removing the jsessionid for SEO
>  > >  > >
>  > >  > >  instead of
>  > >  > >
>  > >  > >  item.add(new Link("foo") { public void onClick() { /* ... */ } });
>  > >  > >
>  > >  > >  do
>  > >  > >
>  > >  > >  item.add(new BookmarkablePageLink("foo", Page.class));
>  > >  > >
>  > >  > >  -igor
>  > >  > >
>  > >  > >
>  > >  > >  On Thu, Apr 3, 2008 at 3:28 PM, Dan Kaplan <dk...@citizenhawk.com> wrote:
>  > >  > >  > How?  I asked how to do it before and nobody suggested this as a
>  > >  > >  >  possibility.
>  > >  > >  >
>  > >  > >  >
>  > >  > >  >
>  > >  > >  >  -----Original Message-----
>  > >  > >  >  From: Igor Vaynberg [mailto:igor.vaynberg@gmail.com]
>  > >  > >  >  Sent: Thursday, April 03, 2008 3:26 PM
>  > >  > >  >  To: users@wicket.apache.org
>  > >  > >  >  Subject: Re: Removing the jsessionid for SEO
>  > >  > >  >
>  > >  > >  >  dataview can work in a stateless mode, just use bookmarkable
>  > >  > >  >  links inside it
>  > >  > >  >
>  > >  > >  >  -igor
>  > >  > >  >
>  > >  > >  >
>  > >  > >  >  On Thu, Apr 3, 2008 at 3:22 PM, Dan Kaplan <dk...@citizenhawk.com> wrote:
>  > >  > >  >  > Regardless, at the very least this makes your site look "weird" and
>  > >  > >  >  >  unprofessional when google puts a jsessionid on your url.  There has got to
>  > >  > >  >  >  be some negative effect when google visits it the second time and the
>  > >  > >  >  >  jsessionid has changed but it sees the same exact content.  Worst case,
>  > >  > >  >  >  it'll think you're trying to trick it.
>  > >  > >  >  >
>  > >  > >  >  >  About those 404s, I'm finding that with the fix I provided I don't get a
>  > >  > >  >  >  404, but the links refresh the page I'm already on.  IE: If I'm on A, and a
>  > >  > >  >  >  link to B is non-bookmarkable, clicking B refreshes A.
>  > >  > >  >  >
>  > >  > >  >  >  This issue is very disconcerting to me.  It's one of the reasons I wish that
>  > >  > >  >  >  DataView had an option to work in stateless mode.  Cause if I ban cookies
>  > >  > >  >  >  and Googlebot visits my home page (with a navigator on it), it'll try to
>  > >  > >  >  >  follow all these page links and from its perspective, they all lead back to
>  > >  > >  >  >  the first page.  So it's kinda a catch-22: Include the jsessionid in the
>  > >  > >  >  >  urls and get bad SEO or remove the jsessionid and get bad SEO :(
>  > >  > >  >  >
>  > >  > >  >  >  Perhaps the answer to my prayers is a combination of the noindex/nofollow
>  > >  > >  >  >  meta tag with a sitemap.xml.  I'm thinking I can put a nofollow on the home
>  > >  > >  >  >  page (so googlebot doesn't try to follow the navigator links) and use the
>  > >  > >  >  >  sitemap.xml to point out the individual pages I want it to index.
>  > >  > >  >  >
>  > >  > >  >  >
>  > >  > >  >  >  Matej: can you go into more detail about your hybrid URL statement?  Won't
>  > >  > >  >  >  google index, for example, /home and /home.1 if I use it?  When it follows
>  > >  > >  >  >  the next page, won't the url become /home.1.2 or something?  That .2 is a
>  > >  > >  >  >  page version: If google indexes that and tries to visit it again, won't it
>  > >  > >  >  >  report about an invalid session?
>  > >  > >  >  >
>  > >  > >  >  >
>  > >  > >  >  >
>  > >  > >  >  >  -----Original Message-----
>  > >  > >  >  >  From: Matej Knopp [mailto:matej.knopp@gmail.com]
>  > >  > >  >  >  Sent: Thursday, April 03, 2008 11:10 AM
>  > >  > >  >  >  To: users@wicket.apache.org
>  > >  > >  >  >  Subject: Re: Removing the jsessionid for SEO
>  > >  > >  >  >
>  > >  > >  >  >  On the other hand, crawling non-bookmarkable pages is not very useful
>  > >  > >  >  >  anyway, since a ?wicket:interface url will always get page expired when
>  > >  > >  >  >  you click on the result.
>  > >  > >  >  >
>  > >  > >  >  >  However, preserving the session makes a lot of sense with hybrid urls.
>  > >  > >  >  >  Google remembers the original url (without page instance) while indexing
>  > >  > >  >  >  the real page (after redirect).
>  > >  > >  >  >
>  > >  > >  >  >  I think though that the crawler is quite advanced. I would think it
>  > >  > >  >  >  supports cookies (at least JSESSIONID), and that it evaluates some of
>  > >  > >  >  >  the javascript on the page.
>  > >  > >  >  >
>  > >  > >  >  >  -Matej
>  > >  > >  >  >
>  > >  > >  >  >  On Thu, Apr 3, 2008 at 6:56 PM, Igor Vaynberg <ig...@gmail.com> wrote:
>  > >  > >  >  >  > right. if you strip sessionid then all your nonbookmarkable urls will
>  > >  > >  >  >  >  resolve to a 404. that will probably drop your rank a lot faster....
>  > >  > >  >  >  >
>  > >  > >  >  >  >  -igor
>  > >  > >  >  >  >
>  > >  > >  >  >  >
>  > >  > >  >  >  >
>  > >  > >  >  >  >
>  > >  > >  >  >  >  On Thu, Apr 3, 2008 at 9:16 AM, Johan Compagner <jc...@gmail.com> wrote:
>  > >  > >  >  >  >  > the problem is that then you have to have all stateless pages. Else
>  > >  > >  >  >  >  >  google can't crawl your website.
>  > >  > >  >  >  >  >  And if that is the case then you could be completely stateless so you
>  > >  > >  >  >  >  >  don't have a session (id) to worry about at all.
>  > >  > >  >  >  >  >
>  > >  > >  >  >  >  >  johan
>  > >  > >  >  >  >  >
>  > >  > >  >  >  >  >
>  > >  > >  >  >  >  >
>  > >  > >  >  >  >  >
>  > >  > >  >  >  >  >
>  > >  > >  >  >  >  >
>  > >  > >  >  >  >  >  On Thu, Apr 3, 2008 at 4:54 PM, Zappaterrini, Larry <Larry.Zappaterrini@fnis.com> wrote:
>  > >  > >  >  >  >  >
>  > >  > >  >  >  >  >  > When Google asks to not have special treatment for their bot, they are
>  > >  > >  >  >  >  >  > referring to content more than anything. Regarding the session id being
>  > >  > >  >  >  >  >  > coded in the URL, see the Technical guidelines section of Google's
>  > >  > >  >  >  >  >  > Webmaster Guidelines -
>  > >  > >  >  >  >  >  > http://www.google.com/support/webmasters/bin/answer.py?answer=35769#design
>  > >  > >  >  >  >  >  >
>  > >  > >  >  >  >  >  > It specifically recommends "allow(ing) search bots to crawl your sites
>  > >  > >  >  >  >  >  > without session IDs or arguments that track their path through the
>  > >  > >  >  >  >  >  > site."
>  > >  > >  >  >  >  >  >
>  > >  > >  >  >  >  >  > -----Original Message-----
>  > >  > >  >  >  >  >  > From: Johan Compagner [mailto:jcompagner@gmail.com]
>  > >  > >  >  >  >  >  > Sent: Thursday, April 03, 2008 7:35 AM
>  > >  > >  >  >  >  >  > To: users@wicket.apache.org
>  > >  > >  >  >  >  >  > Subject: Re: Removing the jsessionid for SEO
>  > >  > >  >  >  >  >  >
>  > >  > >  >  >  >  >  > isn't google always saying that you shouldn't alter the behavior of your
>  > >  > >  >  >  >  >  > site depending on whether it is their bot or not?
>  > >  > >  >  >  >  >  >
>  > >  > >  >  >  >  >  > On Thu, Apr 3, 2008 at 1:00 PM, Artur W. <a_...@gazeta.pl> wrote:
>  > >  > >  >  >  >  >  >
>  > >  > >  >  >  >  >  > >
>  > >  > >  >  >  >  >  > > Hi!
>  > >  > >  >  >  >  >  > >
>  > >  > >  >  >  >  >  > >
>  > >  > >  >  >  >  >  > > igor.vaynberg wrote:
>  > >  > >  >  >  >  >  > > >
>  > >  > >  >  >  >  >  > > > also by doing what you have done users with cookies disabled won't be
>  > >  > >  >  >  >  >  > > > able to use your site...
>  > >  > >  >  >  >  >  > > >
>  > >  > >  >  >  >  >  > >
>  > >  > >  >  >  >  >  > > In my opinion the session id is a problem. Google indexes the same page
>  > >  > >  >  >  >  >  > > again and again.
>  > >  > >  >  >  >  >  > >
>  > >  > >  >  >  >  >  > > For the users without cookies we can do something like this:
>  > >  > >  >  >  >  >  > >
>  > >  > >  >  >  >  >  > >
>  > >  > >  >  >  >  >  > > static class Unbuffered extends WebResponse {
>  > >  > >  >  >  >  >  > >
>  > >  > >  >  >  >  >  > >     // Lower-case substrings that mark a User-Agent as a crawler.
>  > >  > >  >  >  >  >  > >     private static final String[] botAgents = {
>  > >  > >  >  >  >  >  > >         "onetszukaj", "googlebot", "appie", "architext", "jeeves",
>  > >  > >  >  >  >  >  > >         "bjaaland", "ferret", "gulliver", "harvest", "htdig",
>  > >  > >  >  >  >  >  > >         "linkwalker", "lycos_", "moget", "muscatferret", "myweb",
>  > >  > >  >  >  >  >  > >         "nomad", "scooter", "yahoo!\\sslurp\\schina", "slurp",
>  > >  > >  >  >  >  >  > >         "weblayers", "antibot", "bruinbot", "digout4u", "echo!",
>  > >  > >  >  >  >  >  > >         "ia_archiver", "jennybot", "mercator", "netcraft", "msnbot",
>  > >  > >  >  >  >  >  > >         "petersnews", "unlost_web_crawler", "voila", "webbase",
>  > >  > >  >  >  >  >  > >         "webcollage", "cfetch", "zyborg", "wisenutbot", "robot",
>  > >  > >  >  >  >  >  > >         "crawl", "spider" }; /* and so on... */
>  > >  > >  >  >  >  >  > >
>  > >  > >  >  >  >  >  > >     public Unbuffered(final HttpServletResponse res) {
>  > >  > >  >  >  >  >  > >         super(res);
>  > >  > >  >  >  >  >  > >     }
>  > >  > >  >  >  >  >  > >
>  > >  > >  >  >  >  >  > >     // Crawlers get the URL unchanged (no ;jsessionid); other clients
>  > >  > >  >  >  >  >  > >     // keep the normal URL rewriting.
>  > >  > >  >  >  >  >  > >     @Override
>  > >  > >  >  >  >  >  > >     public CharSequence encodeURL(final CharSequence url) {
>  > >  > >  >  >  >  >  > >         return isAgent() ? url : super.encodeURL(url);
>  > >  > >  >  >  >  >  > >     }
>  > >  > >  >  >  >  >  > >
>  > >  > >  >  >  >  >  > >     private static boolean isAgent() {
>  > >  > >  >  >  >  >  > >         String agent = ((WebRequest) RequestCycle.get().getRequest())
>  > >  > >  >  >  >  >  > >                 .getHttpServletRequest().getHeader("User-Agent");
>  > >  > >  >  >  >  >  > >         if (agent == null) {
>  > >  > >  >  >  >  >  > >             return false; // no User-Agent header: treat as a normal client
>  > >  > >  >  >  >  >  > >         }
>  > >  > >  >  >  >  >  > >         agent = agent.toLowerCase();
>  > >  > >  >  >  >  >  > >         for (String bot : botAgents) {
>  > >  > >  >  >  >  >  > >             if (agent.indexOf(bot) != -1) {
>  > >  > >  >  >  >  >  > >                 return true;
>  > >  > >  >  >  >  >  > >             }
>  > >  > >  >  >  >  >  > >         }
>  > >  > >  >  >  >  >  > >         return false;
>  > >  > >  >  >  >  >  > >     }
>  > >  > >  >  >  >  >  > > }
>  > >  > >  >  >  >  >  > >
>  > >  > >  >  >  >  >  > >
>  > >  > >  >  >  >  >  > > I didn't test this code but I do a similar thing in my old application
>  > >  > >  >  >  >  >  > > in Spring and it works.
>  > >  > >  >  >  >  >  > >
>  > >  > >  >  >  >  >  > > Take care,
>  > >  > >  >  >  >  >  > > Artur
>  > >  > >  >  >  >  >  > >
>  > >  > >  >  >  >  >  > >
>  > >  > >  >  >  >  >  > > --
>  > >  > >  >  >  >  >  > > View this message in context:
>  > >  > >  >  >  >  >  > > http://www.nabble.com/Removing-the-jsessionid-for-SEO-tp16464534p16467396.html
>  > >  > >  >  >  >  >  > > Sent from the Wicket - User mailing list archive at Nabble.com.
>  > >  > >  >  >  >  >  > >
>  > >  > >  >  >  --
>  > >  > >  >  >  Resizable and reorderable grid components.
>  > >  > >  >  >  http://www.inmethod.com
>
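
A note on Igor's point quoted at the top of this thread (without state you encode what you need into the link, then decode it on the trip back and apply it to the components): the following is only a minimal sketch of that round trip in plain Java. It does not use any Wicket classes, and the names StatelessPagingSketch, pageUrl, pageIndexFrom and the "page" query parameter are made up for illustration.

```java
// Hypothetical sketch: plain Java, no Wicket API involved.
class StatelessPagingSketch {

    // Encode the state (here only the page index) into the link itself.
    static String pageUrl(String mountPath, int pageIndex) {
        return mountPath + "?page=" + pageIndex;
    }

    // Decode the state on the trip back. A missing, malformed or negative
    // value falls back to the first page (index 0).
    static int pageIndexFrom(String url) {
        int queryStart = url.indexOf('?');
        if (queryStart < 0) {
            return 0;
        }
        for (String pair : url.substring(queryStart + 1).split("&")) {
            String[] keyValue = pair.split("=", 2);
            if (keyValue.length == 2 && keyValue[0].equals("page")) {
                try {
                    return Math.max(0, Integer.parseInt(keyValue[1]));
                } catch (NumberFormatException e) {
                    return 0;
                }
            }
        }
        return 0;
    }

    public static void main(String[] args) {
        String link = pageUrl("/home", 2);
        System.out.println(link + " -> page " + pageIndexFrom(link));
    }
}
```

In a real page the decoded index would then have to be applied to the pageable component by hand, which is the extra work Igor describes.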

---------------------------------------------------------------------
To unsubscribe, e-mail: users-unsubscribe@wicket.apache.org
For additional commands, e-mail: users-help@wicket.apache.org


RE: Removing the jsessionid for SEO

Posted by Dan Kaplan <dk...@citizenhawk.com>.
Ok, at least I'm not missing anything.  I understand the benefits Wicket
provides with its stateful framework.  Developing a site with Wicket is
easier than with any other framework I've used.  But this statefulness,
which makes websites so easy to develop, seems to be counterproductive for
SEO:

Googlebot will follow and index stateful links.  Worst case scenario, these
actually become visible to Google users, and when they click the link it
takes them to an "invalid session" page.  They think, "This site is broken"
and move on to the next link in their search results.

Another approach to solving this is to block all the stateful pages in my
robots.txt file.  But how can I block these links in robots.txt since they
change per session?  Is there any way to know what the URL will resolve to
when Googlebot tries to visit my site so I can tell it to disallow:
/?wicket:interface=:10:1::: and ?wicket:interface=:0:1::: and ...?
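
One possible angle on the robots.txt question: the stateful URLs differ per session, but they all share the fixed substring "wicket:interface=".  Googlebot documents support for "*" wildcards in robots.txt paths (other crawlers may not honor them), so a single rule along the lines of "Disallow: /*wicket:interface=" might cover all of them.  The sketch below only simulates that kind of wildcard prefix match in plain Java to show that one pattern matches both example URLs; it is not a robots.txt parser, and the names RobotsWildcardSketch, toRegex and isDisallowed are made up for illustration.

```java
import java.util.regex.Pattern;

// Hypothetical sketch: simulates a robots.txt-style rule where '*' matches
// any run of characters and the rule is matched as a prefix of the URL.
class RobotsWildcardSketch {

    // Turn a rule such as "/*wicket:interface=" into a regex, quoting
    // everything except the '*' wildcards.
    static Pattern toRegex(String rule) {
        StringBuilder regex = new StringBuilder();
        boolean first = true;
        for (String literal : rule.split("\\*", -1)) {
            if (!first) {
                regex.append(".*");
            }
            regex.append(Pattern.quote(literal));
            first = false;
        }
        return Pattern.compile(regex.toString());
    }

    // lookingAt() anchors at the start only, so the rule acts as a prefix match.
    static boolean isDisallowed(String rule, String pathAndQuery) {
        return toRegex(rule).matcher(pathAndQuery).lookingAt();
    }

    public static void main(String[] args) {
        String rule = "/*wicket:interface=";
        System.out.println(isDisallowed(rule, "/?wicket:interface=:10:1:::"));
        System.out.println(isDisallowed(rule, "/?wicket:interface=:0:1:::"));
        System.out.println(isDisallowed(rule, "/home"));
    }
}
```

Whether blocking these URLs is actually the right SEO choice is a separate question; the sketch only shows that the per-session part of the URL does not need to be known in advance.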


> -----Original Message-----
> From: Igor Vaynberg [mailto:igor.vaynberg@gmail.com]
> Sent: Thursday, April 03, 2008 5:45 PM
> To: users@wicket.apache.org
> Subject: Re: Removing the jsessionid for SEO
> 
> On Thu, Apr 3, 2008 at 5:31 PM, Dan Kaplan <dk...@citizenhawk.com> wrote:
> > Ok I did a little preliminary research on this.  Right now PagingNavigator
> >  uses PagingNavigationLink's to represent its page.  This extends Link.  I'm
> >  supposed to override PagingNavigator's newPagingNavigationLink() method to
> >  accomplish this (I think) but past that, this isn't very straightforward to
> >  me.
> >
> >  Do I need to create my own BookmarkablePagingNavigationLink?  When I do...
> >  what next?  I really don't know enough about bookmarkablePageLinks to do
> >  this.  Right now, all the magic happens inside PagingNavigationLink.  Won't
> >  I have to move all that logic into the WebPage that I'm passing into
> >  BookmarkablePagingNavigationLink?  This seems like a lot of work.  Am I
> >  missing something critical?
> 
> no, you are not missing anything. you see, when you go stateless, like
> what you want, then you have to recreate all the magic stuff that
> makes stateful links Just Work. Without state you are back to the
> servlet/mvc programming model: you have to encode the state that you
> want into the link, then on the trip back decode it, recreate
> something from it, and then apply that something onto the components.
> This is the crapwork that wicket does for you usually.
> 
> -igor
> 
> 
> >
> >
> >  > -----Original Message-----
> >  > From: Igor Vaynberg [mailto:igor.vaynberg@gmail.com]
> >
> >
> > > Sent: Thursday, April 03, 2008 3:40 PM
> >  > To: users@wicket.apache.org
> >  > Subject: Re: Removing the jsessionid for SEO
> >  >
> >  > you subclass the pagenavigator and make it use bookmarkable links
> >  > also. it has factory methods for all the links it uses.
> >  >
> >  > -igor
> >  >
> >  >
> >  > On Thu, Apr 3, 2008 at 3:36 PM, Dan Kaplan <dk...@citizenhawk.com>
> >  > wrote:
> >  > > I wasn't talking about the links that are on the list (I already
> make
> >  > those
> >  > >  bookmarkable).  I'm talking about the links that the Navigator
> >  > generates.
> >  > >  How do I make it so page 2 is bookmarkable?
> >  > >
> >  > >
> >  > >  -----Original Message-----
> >  > >  From: Igor Vaynberg [mailto:igor.vaynberg@gmail.com]
> >  > >
> >  > >
> >  > > Sent: Thursday, April 03, 2008 3:30 PM
> >  > >  To: users@wicket.apache.org
> >  > >  Subject: Re: Removing the jsessionid for SEO
> >  > >
> >  > >  instead of
> >  > >
> >  > >  item.add(new link("foo") { onclick() });
> >  > >
> >  > >  do
> >  > >
> >  > >  item.add(new bookmarkablepagelink("foo", page.class));
> >  > >
> >  > >  -igor
> >  > >
> >  > >
> >  > >  On Thu, Apr 3, 2008 at 3:28 PM, Dan Kaplan
> <dk...@citizenhawk.com>
> >  > wrote:
> >  > >  > How?  I asked how to do it before and nobody suggested this as a
> >  > >  >  possibility.
> >  > >  >
> >  > >  >
> >  > >  >
> >  > >  >  -----Original Message-----
> >  > >  >  From: Igor Vaynberg [mailto:igor.vaynberg@gmail.com]
> >  > >  >  Sent: Thursday, April 03, 2008 3:26 PM
> >  > >  >  To: users@wicket.apache.org
> >  > >  >  Subject: Re: Removing the jsessionid for SEO
> >  > >  >
> >  > >  >  dataview can work in a stateless mode, just use bookmarkable
> links
> >  > inside
> >  > >  it
> >  > >  >
> >  > >  >  -igor
> >  > >  >
> >  > >  >
> >  > >  >  On Thu, Apr 3, 2008 at 3:22 PM, Dan Kaplan
> <dk...@citizenhawk.com>
> >  > >  wrote:
> >  > >  >  > Regardless, at the very least this makes your site look
> "weird"
> >  > and
> >  > >  >  >  unprofessional when google puts a jsessionid on your url.
> There
> >  > has
> >  > >  got
> >  > >  >  to
> >  > >  >  >  be some negative effect when google visits it the second
> time and
> >  > the
> >  > >  >  >  jsessionid has changed but it sees the same exact content.
> Worst
> >  > >  case,
> >  > >  >  >  it'll think you're trying to trick it.
> >  > >  >  >
> >  > >  >  >  About those 404s, I'm finding that with the fix I provided I
> >  > don't get
> >  > >  a
> >  > >  >  >  404, but the links refresh the page I'm already on.  IE: If
> I'm
> >  > on A,
> >  > >  and
> >  > >  >  a
> >  > >  >  >  link to B is non-bookmarkable, clicking B refreshes A.
> >  > >  >  >
> >  > >  >  >  This issue is very disconcerting to me.  It's one of the
> reasons
> >  > I
> >  > >  wish
> >  > >  >  that
> >  > >  >  >  DataView had an option to work in stateless mode.  Cause if
> I ban
> >  > >  cookies
> >  > >  >  >  and Googlebot visits my home page (with a navigator on it),
> it'll
> >  > try
> >  > >  to
> >  > >  >  >  follow all these page links and from its perspective, they
> all
> >  > lead
> >  > >  back
> >  > >  >  to
> >  > >  >  >  the first page.  So it's kinda a catch-22: Include the
> jsessionid
> >  > in
> >  > >  the
> >  > >  >  >  urls and get bad SEO or remove the jsessionid and get bad
> SEO :(
> >  > >  >  >
> >  > >  >  >  Perhaps the answer to my prayers is a combination of the
> >  > >  noindex/nofollow
> >  > >  >  >  meta tag with a sitemap.xml.  I'm thinking I can put a
> nofollow
> >  > on the
> >  > >  >  home
> >  > >  >  >  page (so googlebot doesn't try to follow the navigator
> links) and
> >  > use
> >  > >  the
> >  > >  >  >  sitemap.xml to point out the individual pages I want it to
> index.
> >  > >  >  >
> >  > >  >  >
> >  > >  >  >  Matej: can you go into more detail about your hybrid URL
> >  > statement?
> >  > >  >  Won't
> >  > >  >  >  google index, for example, /home and /home.1 if I use it?
> When
> >  > it
> >  > >  >  follows
> >  > >  >  >  the next page, won't the url become /home.1.2 or something?
> That
> >  > .2
> >  > >  is a
> >  > >  >  >  page version: If google indexes that and tries to visit it
> again,
> >  > >  won't
> >  > >  >  it
> >  > >  >  >  report about an invalid session?
> >  > >  >  >
> >  > >  >  >
> >  > >  >  >
> >  > >  >  >  -----Original Message-----
> >  > >  >  >  From: Matej Knopp [mailto:matej.knopp@gmail.com]
> >  > >  >  >  Sent: Thursday, April 03, 2008 11:10 AM
> >  > >  >  >  To: users@wicket.apache.org
> >  > >  >  >  Subject: Re: Removing the jsessionid for SEO
> >  > >  >  >
> >  > >  >  >  On the other hand, crawling non-bookmarkable pages is not
> very
> >  > useful
> >  > >  >  >  anyway, since ?wicket:interface url will always get page
> expired
> >  > when
> >  > >  >  >  you click on the result.
> >  > >  >  >
> >  > >  >  >  However, preserving session makes lot of sense with hybrid
> url.
> >  > Google
> >  > >  >  >  remembers the original url (without page instance) while
> indexing
> >  > the
> >  > >  >  >  real page (after redirect).
> >  > >  >  >
> >  > >  >  >  I think though that the crawler is quite advanced. I'm would
> >  > think  it
> >  > >  >  >  supports cookies (at least JSESSIONID) as well as it
> evaluates
> >  > some of
> >  > >  >  >  the javascript on page.
> >  > >  >  >
> >  > >  >  >  -Matej
> >  > >  >  >
> >  > >  >  >  On Thu, Apr 3, 2008 at 6:56 PM, Igor Vaynberg
> >  > >  <ig...@gmail.com>
> >  > >  >  >  wrote:
> >  > >  >  >  > right. if you strip sessionid then all your
> nonbookmarkable
> >  > urls
> >  > >  will
> >  > >  >  >  >  resolve to a 404. that will probably drop your rank a lot
> >  > >  faster....
> >  > >  >  >  >
> >  > >  >  >  >  -igor
> >  > >  >  >  >
> >  > >  >  >  >
> >  > >  >  >  >
> >  > >  >  >  >
> >  > >  >  >  >  On Thu, Apr 3, 2008 at 9:16 AM, Johan Compagner
> >  > >  <jc...@gmail.com>
> >  > >  >  >  wrote:
> >  > >  >  >  >  > the problem is that then you have to have all stateless
> >  > pages.
> >  > >  Else
> >  > >  >  >  google
> >  > >  >  >  >  >  can't crawl your website.
> >  > >  >  >  >  >  And if that is the case then you could be completely
> >  > stateless
> >  > >  so
> >  > >  >  you
> >  > >  >  >  dont
> >  > >  >  >  >  >  have a session (id) to worry about at all.
> >  > >  >  >  >  >
> >  > >  >  >  >  >  johan
> >  > >  >  >  >  >
> >  > >  >  >  >  >
> >  > >  >  >  >  >
> >  > >  >  >  >  >
> >  > >  >  >  >  >
> >  > >  >  >  >  >
> >  > >  >  >  >  >  On Thu, Apr 3, 2008 at 4:54 PM, Zappaterrini, Larry <
> >  > >  >  >  >  >  Larry.Zappaterrini@fnis.com> wrote:
> >  > >  >  >  >  >
> >  > >  >  >  >  >  > When Google asks to not have special treatment for
> their
> >  > bot,
> >  > >  >  they
> >  > >  >  >  are
> >  > >  >  >  >  >  > referring to content more than anything. Regarding
> the
> >  > session
> >  > >  id
> >  > >  >  >  being
> >  > >  >  >  >  >  > coded in the URL, see the Technical guidelines
> section of
> >  > >  >  Google's
> >  > >  >  >  >  >  > Webmaster Guidelines -
> >  > >  >  >  >  >  >
> >  > >  >  >
> >  > >
> >  >
> http://www.google.com/support/webmasters/bin/answer.py?answer=35769#desi
> >  > >  >  >  >  >  > gn
> >  > >  >  >  >  >  >
> >  > >  >  >  >  >  > It specifically recommends "allow(ing) search bots
> to
> >  > crawl
> >  > >  your
> >  > >  >  >  sites
> >  > >  >  >  >  >  > without session IDs or arguments that track their
> path
> >  > through
> >  > >  >  the
> >  > >  >  >  >  >  > site."
> >  > >  >  >  >  >  >
> >  > >  >  >  >  >  > -----Original Message-----
> >  > >  >  >  >  >  > From: Johan Compagner [mailto:jcompagner@gmail.com]
> >  > >  >  >  >  >  > Sent: Thursday, April 03, 2008 7:35 AM
> >  > >  >  >  >  >  > To: users@wicket.apache.org
> >  > >  >  >  >  >  > Subject: Re: Removing the jsessionid for SEO
> >  > >  >  >  >  >  >
> >  > >  >  >  >  >  > isnt google always saying that you shouldn't alter
> >  > behavior of
> >  > >  >  your
> >  > >  >  >  site
> >  > >  >  >  >  >  > depending of it is there bot or not?
> >  > >  >  >  >  >  >
> >  > >  >  >  >  >  > On Thu, Apr 3, 2008 at 1:00 PM, Artur W.
> >  > <a_...@gazeta.pl>
> >  > >  >  >  wrote:
> >  > >  >  >  >  >  >
> >  > >  >  >  >  >  > >
> >  > >  >  >  >  >  > > Hi!
> >  > >  >  >  >  >  > >
> >  > >  >  >  >  >  > >
> >  > >  >  >  >  >  > > igor.vaynberg wrote:
> >  > >  >  >  >  >  > > >
> >  > >  >  >  >  >  > > > also by doing what you have done users with
> cookies
> >  > >  disabled
> >  > >  >  >  wont be
> >  > >  >  >  >  >  > > > able to use your site...
> >  > >  >  >  >  >  > > >
> >  > >  >  >  >  >  > >
> >  > >  >  >  >  >  > > In my opinion session id is a problem. Google
> index the
> >  > same
> >  > >  >  page
> >  > >  >  >  >  >  > again
> >  > >  >  >  >  >  > > and
> >  > >  >  >  >  >  > > again.
> >  > >  >  >  >  >  > >
> >  > >  >  >  >  >  > > About the users without cookies we can do like
> this:
> >  > >  >  >  >  >  > >
> >  > >  >  >  >  >  > >
> >  > >  >  >  >  >  > >    static class Unbuffered extends WebResponse {
> >  > >  >  >  >  >  > >
> >  > >  >  >  >  >  > >        private static final String[] botAgents = {
> >  > >  >  >  >  >  > >            "onetszukaj", "googlebot", "appie", "architext",
> >  > >  >  >  >  >  > >            "jeeves", "bjaaland", "ferret", "gulliver", "harvest", "htdig",
> >  > >  >  >  >  >  > >            "linkwalker", "lycos_", "moget", "muscatferret", "myweb", "nomad",
> >  > >  >  >  >  >  > >            "scooter", "yahoo!\\sslurp\\schina", "slurp", "weblayers",
> >  > >  >  >  >  >  > >            "antibot", "bruinbot", "digout4u",
> >  > >  >  >  >  >  > >            "echo!", "ia_archiver", "jennybot", "mercator",
> >  > >  >  >  >  >  > >            "netcraft", "msnbot", "petersnews",
> >  > >  >  >  >  >  > >            "unlost_web_crawler", "voila", "webbase",
> >  > >  >  >  >  >  > >            "webcollage", "cfetch", "zyborg",
> >  > >  >  >  >  >  > >            "wisenutbot", "robot", "crawl", "spider"
> >  > >  >  >  >  >  > >        }; /* and so on... */
> >  > >  >  >  >  >  > >
> >  > >  >  >  >  >  > >        public Unbuffered(final HttpServletResponse res) {
> >  > >  >  >  >  >  > >            super(res);
> >  > >  >  >  >  >  > >        }
> >  > >  >  >  >  >  > >
> >  > >  >  >  >  >  > >        @Override
> >  > >  >  >  >  >  > >        public CharSequence encodeURL(final CharSequence url) {
> >  > >  >  >  >  >  > >            return isAgent() ? url : super.encodeURL(url);
> >  > >  >  >  >  >  > >        }
> >  > >  >  >  >  >  > >
> >  > >  >  >  >  >  > >        private static boolean isAgent() {
> >  > >  >  >  >  >  > >            String agent = ((WebRequest) RequestCycle.get().getRequest())
> >  > >  >  >  >  >  > >                    .getHttpServletRequest().getHeader("User-Agent");
> >  > >  >  >  >  >  > >
> >  > >  >  >  >  >  > >            for (String bot : botAgents) {
> >  > >  >  >  >  >  > >                if (agent.toLowerCase().indexOf(bot) != -1) {
> >  > >  >  >  >  >  > >                    return true;
> >  > >  >  >  >  >  > >                }
> >  > >  >  >  >  >  > >            }
> >  > >  >  >  >  >  > >
> >  > >  >  >  >  >  > >            return false;
> >  > >  >  >  >  >  > >        }
> >  > >  >  >  >  >  > >    }
> >  > >  >  >  >  >  > >
> >  > >  >  >  >  >  > >
> >  > >  >  >  >  >  > > I didn't test this code but I do similar thing in my old application in
> >  > >  >  >  >  >  > > Spring and it works.
> >  > >  >  >  >  >  > >
> >  > >  >  >  >  >  > > Take care,
> >  > >  >  >  >  >  > > Artur
> >  > >  >  >  >  >  > >
> >  > >  >  >  >  >  > >
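A note on the quoted isAgent() check: HttpServletRequest.getHeader() returns null when the client sends no User-Agent header, so agent.toLowerCase() can throw a NullPointerException, and the "yahoo!\\sslurp\\schina" entry is a regex-style string that indexOf() compares literally, so it never matches (the plain "slurp" entry already covers that crawler). Below is a minimal sketch of a null-safe variant in plain Java, with no Wicket or servlet dependency. The class name BotDetector, the method isBot and the shortened token list are illustrative assumptions, not part of the original post.

```java
import java.util.Locale;

/** Hypothetical helper: decides whether a User-Agent string looks like a crawler. */
final class BotDetector {

    // Shortened, illustrative token list; extend it with the entries from the post as needed.
    private static final String[] BOT_TOKENS = {
        "googlebot", "msnbot", "slurp", "ia_archiver", "robot", "crawl", "spider"
    };

    private BotDetector() {
    }

    /** Returns true if the header value contains any known bot token (case-insensitive). */
    static boolean isBot(String userAgent) {
        if (userAgent == null) {
            // No User-Agent header at all: treat the client as a regular visitor.
            return false;
        }
        String ua = userAgent.toLowerCase(Locale.ROOT);
        for (String token : BOT_TOKENS) {
            if (ua.contains(token)) {
                return true;
            }
        }
        return false;
    }
}
```

In the quoted class, encodeURL() would then pass the header value to something like BotDetector.isBot(...) instead of calling toLowerCase() on a possibly null string.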




Re: Removing the jsessionid for SEO

Posted by Igor Vaynberg <ig...@gmail.com>.
On Thu, Apr 3, 2008 at 5:31 PM, Dan Kaplan <dk...@citizenhawk.com> wrote:
> Ok I did a little preliminary research on this.  Right now PagingNavigator
>  uses PagingNavigationLink's to represent its page.  This extends Link.  I'm
>  supposed to override PagingNavigator's newPagingNavigationLink() method to
>  accomplish this (I think) but past that, this isn't very straightforward to
>  me.
>
>  Do I need to create my own BookmarkablePagingNavigationLink?  When I do...
>  what next?  I really don't know enough about bookmarkablePageLinks to do
>  this.  Right now, all the magic happens inside PagingNavigationLink.  Won't
>  I have to move all that logic into the WebPage that I'm passing into
>  BookmarkablePagingNavigationLink?  This seems like a lot of work.  Am I
>  missing something critical?

No, you are not missing anything. When you go stateless, as you want
to, you have to recreate all the magic that makes stateful links Just
Work. Without state you are back to the servlet/MVC programming model:
you have to encode the state you want into the link, then on the trip
back decode it, recreate something from it, and apply that something
to the components. This is the crapwork that Wicket usually does for
you.
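
A minimal sketch of that encode/decode round trip for a paging link, written in plain Java with no Wicket dependency. The class name StatelessPaging, the "page" parameter name and the helper methods are hypothetical and chosen only for illustration. In a Wicket application the same idea would presumably be expressed with a BookmarkablePageLink carrying the page index as a page parameter, which the target page reads back in its constructor.

```java
import java.util.List;

/** Hypothetical sketch: carry the pager state in the URL instead of the session. */
final class StatelessPaging {

    // Assumed query parameter name; any name works as long as both sides agree.
    static final String PAGE_PARAM = "page";

    private StatelessPaging() {
    }

    /** Encode: put the state (the page index) into the link itself. */
    static String encode(String basePath, int pageIndex) {
        return basePath + "?" + PAGE_PARAM + "=" + pageIndex;
    }

    /** Decode: recover the page index on the trip back, clamped to a valid range. */
    static int decode(String url, int pageCount) {
        int q = url.indexOf('?');
        if (q < 0) {
            return 0; // no state in the URL: show the first page
        }
        for (String pair : url.substring(q + 1).split("&")) {
            String[] kv = pair.split("=", 2);
            if (kv.length == 2 && kv[0].equals(PAGE_PARAM)) {
                try {
                    int idx = Integer.parseInt(kv[1]);
                    return Math.max(0, Math.min(idx, pageCount - 1));
                } catch (NumberFormatException e) {
                    return 0; // malformed value: fall back to the first page
                }
            }
        }
        return 0;
    }

    /** Apply: recreate what the stateful pager would have shown for that index. */
    static <T> List<T> itemsFor(List<T> all, int pageIndex, int perPage) {
        int from = Math.min(pageIndex * perPage, all.size());
        int to = Math.min(from + perPage, all.size());
        return all.subList(from, to);
    }
}
```

Because every link produced this way is self-describing, a crawler without cookies or a session can follow "page 2" and get page 2, which is the property the navigator links lack when they are stateful.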

-igor




RE: Removing the jsessionid for SEO

Posted by Dan Kaplan <dk...@citizenhawk.com>.
Ok, I did a little preliminary research on this.  Right now PagingNavigator
uses PagingNavigationLinks to represent its pages.  These extend Link.  I'm
supposed to override PagingNavigator's newPagingNavigationLink() method to
accomplish this (I think), but past that, this isn't very straightforward to
me.

Do I need to create my own BookmarkablePagingNavigationLink?  When I do...
what next?  I really don't know enough about BookmarkablePageLinks to do
this.  Right now, all the magic happens inside PagingNavigationLink.  Won't
I have to move all that logic into the WebPage that I'm passing into
BookmarkablePagingNavigationLink?  This seems like a lot of work.  Am I
missing something critical?

> -----Original Message-----
> From: Igor Vaynberg [mailto:igor.vaynberg@gmail.com]
> Sent: Thursday, April 03, 2008 3:40 PM
> To: users@wicket.apache.org
> Subject: Re: Removing the jsessionid for SEO
> 
> you subclass the pagenavigator and make it use bookmarkable links
> also. it has factory methods for all the links it uses.
> 
> -igor
> 
> 
> On Thu, Apr 3, 2008 at 3:36 PM, Dan Kaplan <dk...@citizenhawk.com> wrote:
> > I wasn't talking about the links that are on the list (I already make those
> >  bookmarkable).  I'm talking about the links that the Navigator generates.
> >  How do I make it so page 2 is bookmarkable?
> >
> >
> >  -----Original Message-----
> >  From: Igor Vaynberg [mailto:igor.vaynberg@gmail.com]
> >  Sent: Thursday, April 03, 2008 3:30 PM
> >  To: users@wicket.apache.org
> >  Subject: Re: Removing the jsessionid for SEO
> >
> >  instead of
> >
> >  item.add(new Link("foo") { public void onClick() { /* stateful handler */ } });
> >
> >  do
> >
> >  item.add(new BookmarkablePageLink("foo", SomePage.class));
> >
> >  -igor
> >
> >
> >  On Thu, Apr 3, 2008 at 3:28 PM, Dan Kaplan <dk...@citizenhawk.com> wrote:
> >  > How?  I asked how to do it before and nobody suggested this as a
> >  >  possibility.
> >  >
> >  >
> >  >
> >  >  -----Original Message-----
> >  >  From: Igor Vaynberg [mailto:igor.vaynberg@gmail.com]
> >  >  Sent: Thursday, April 03, 2008 3:26 PM
> >  >  To: users@wicket.apache.org
> >  >  Subject: Re: Removing the jsessionid for SEO
> >  >
> >  >  dataview can work in a stateless mode, just use bookmarkable links inside it
> >  >
> >  >  -igor
> >  >
> >  >
> >  >  On Thu, Apr 3, 2008 at 3:22 PM, Dan Kaplan <dk...@citizenhawk.com> wrote:
> >  >  > Regardless, at the very least this makes your site look "weird" and
> >  >  >  unprofessional when google puts a jsessionid on your url.  There has got to
> >  >  >  be some negative effect when google visits it the second time and the
> >  >  >  jsessionid has changed but it sees the same exact content.  Worst case,
> >  >  >  it'll think you're trying to trick it.
> >  >  >
> >  >  >  About those 404s, I'm finding that with the fix I provided I don't get a
> >  >  >  404, but the links refresh the page I'm already on.  IE: If I'm on A, and a
> >  >  >  link to B is non-bookmarkable, clicking B refreshes A.
> >  >  >
> >  >  >  This issue is very disconcerting to me.  It's one of the reasons I wish that
> >  >  >  DataView had an option to work in stateless mode.  Cause if I ban cookies
> >  >  >  and Googlebot visits my home page (with a navigator on it), it'll try to
> >  >  >  follow all these page links and from its perspective, they all lead back to
> >  >  >  the first page.  So it's kinda a catch-22: Include the jsessionid in the
> >  >  >  urls and get bad SEO or remove the jsessionid and get bad SEO :(
> >  >  >
> >  >  >  Perhaps the answer to my prayers is a combination of the noindex/nofollow
> >  >  >  meta tag with a sitemap.xml.  I'm thinking I can put a nofollow on the home
> >  >  >  page (so googlebot doesn't try to follow the navigator links) and use the
> >  >  >  sitemap.xml to point out the individual pages I want it to index.
> >  >  >
> >  >  >
> >  >  >  Matej: can you go into more detail about your hybrid URL statement?  Won't
> >  >  >  google index, for example, /home and /home.1 if I use it?  When it follows
> >  >  >  the next page, won't the url become /home.1.2 or something?  That .2 is a
> >  >  >  page version: If google indexes that and tries to visit it again, won't it
> >  >  >  report about an invalid session?
> >  >  >
> >  >  >
> >  >  >
> >  >  >  -----Original Message-----
> >  >  >  From: Matej Knopp [mailto:matej.knopp@gmail.com]
> >  >  >  Sent: Thursday, April 03, 2008 11:10 AM
> >  >  >  To: users@wicket.apache.org
> >  >  >  Subject: Re: Removing the jsessionid for SEO
> >  >  >
> >  >  >  On the other hand, crawling non-bookmarkable pages is not very useful
> >  >  >  anyway, since ?wicket:interface url will always get page expired when
> >  >  >  you click on the result.
> >  >  >
> >  >  >  However, preserving session makes a lot of sense with hybrid url. Google
> >  >  >  remembers the original url (without page instance) while indexing the
> >  >  >  real page (after redirect).
> >  >  >
> >  >  >  I think though that the crawler is quite advanced. I would think it
> >  >  >  supports cookies (at least JSESSIONID) as well as it evaluates some of
> >  >  >  the javascript on page.
> >  >  >
> >  >  >  -Matej
> >  >  >
> >  >  >  On Thu, Apr 3, 2008 at 6:56 PM, Igor Vaynberg <ig...@gmail.com> wrote:
> >  >  >  > right. if you strip sessionid then all your nonbookmarkable urls will
> >  >  >  >  resolve to a 404. that will probably drop your rank a lot faster....
> >  >  >  >
> >  >  >  >  -igor
> >  >  >  >
> >  >  >  >
> >  >  >  >
> >  >  >  >
> >  >  >  >  On Thu, Apr 3, 2008 at 9:16 AM, Johan Compagner <jc...@gmail.com> wrote:
> >  >  >  >  > the problem is that then you have to have all stateless pages. Else google
> >  >  >  >  >  can't crawl your website.
> >  >  >  >  >  And if that is the case then you could be completely stateless so you dont
> >  >  >  >  >  have a session (id) to worry about at all.
> >  >  >  >  >
> >  >  >  >  >  johan
> >  >  >  >  >
> >  >  >  >  >
> >  >  >  >  >
> >  >  >  >  >
> >  >  >  >  >
> >  >  >  >  >
> >  >  >  >  >  On Thu, Apr 3, 2008 at 4:54 PM, Zappaterrini, Larry <
> >  >  >  >  >  Larry.Zappaterrini@fnis.com> wrote:
> >  >  >  >  >
> >  >  >  >  >  > When Google asks to not have special treatment for their bot, they are
> >  >  >  >  >  > referring to content more than anything. Regarding the session id being
> >  >  >  >  >  > coded in the URL, see the Technical guidelines section of Google's
> >  >  >  >  >  > Webmaster Guidelines -
> >  >  >  >  >  > http://www.google.com/support/webmasters/bin/answer.py?answer=35769#design
> >  >  >  >  >  >
> >  >  >  >  >  > It specifically recommends "allow(ing) search bots to crawl your sites
> >  >  >  >  >  > without session IDs or arguments that track their path through the
> >  >  >  >  >  > site."
> >  >  >  >  >  >
> >  >  >  >  >  > -----Original Message-----
> >  >  >  >  >  > From: Johan Compagner [mailto:jcompagner@gmail.com]
> >  >  >  >  >  > Sent: Thursday, April 03, 2008 7:35 AM
> >  >  >  >  >  > To: users@wicket.apache.org
> >  >  >  >  >  > Subject: Re: Removing the jsessionid for SEO
> >  >  >  >  >  >
> >  >  >  >  >  > isnt google always saying that you shouldn't alter
> behavior of
> >  >  your
> >  >  >  site
> >  >  >  >  >  > depending of it is there bot or not?
> >  >  >  >  >  >
> >  >  >  >  >  > On Thu, Apr 3, 2008 at 1:00 PM, Artur W.
> <a_...@gazeta.pl>
> >  >  >  wrote:
> >  >  >  >  >  >
> >  >  >  >  >  > >
> >  >  >  >  >  > > Hi!
> >  >  >  >  >  > >
> >  >  >  >  >  > >
> >  >  >  >  >  > > igor.vaynberg wrote:
> >  >  >  >  >  > > >
> >  >  >  >  >  > > > also by doing what you have done users with cookies
> >  disabled
> >  >  >  wont be
> >  >  >  >  >  > > > able to use your site...
> >  >  >  >  >  > > >
> >  >  >  >  >  > >
> >  >  >  >  >  > > In my opinion session id is a problem. Google index the
> same
> >  >  page
> >  >  >  >  >  > again
> >  >  >  >  >  > > and
> >  >  >  >  >  > > again.
> >  >  >  >  >  > >
> >  >  >  >  >  > > About the users without cookies we can do like this:
> >  >  >  >  >  > >
> >  >  >  >  >  > >
> >  >  >  >  >  > >        static class Unbuffered extends WebResponse {
> >  >  >  >  >  > >
> >  >  >  >  >  > >                 private static final String[] botAgents
> = {
> >  >  >  >  >  > "onetszukaj",
> >  >  >  >  >  > > "googlebot",
> >  >  >  >  >  > > "appie", "architext",
> >  >  >  >  >  > >                        "jeeves", "bjaaland", "ferret",
> >  >  "gulliver",
> >  >  >  >  >  > > "harvest", "htdig",
> >  >  >  >  >  > >                        "linkwalker", "lycos_", "moget",
> >  >  >  >  >  > "muscatferret",
> >  >  >  >  >  > > "myweb", "nomad",
> >  >  >  >  >  > > "scooter",
> >  >  >  >  >  > >                        "yahoo!\\sslurp\\schina",
> "slurp",
> >  >  >  "weblayers",
> >  >  >  >  >  > > "antibot", "bruinbot",
> >  >  >  >  >  > > "digout4u",
> >  >  >  >  >  > >                        "echo!", "ia_archiver",
> "jennybot",
> >  >  >  "mercator",
> >  >  >  >  >  > > "netcraft", "msnbot",
> >  >  >  >  >  > > "petersnews",
> >  >  >  >  >  > >                        "unlost_web_crawler", "voila",
> >  >  "webbase",
> >  >  >  >  >  > > "webcollage", "cfetch",
> >  >  >  >  >  > > "zyborg",
> >  >  >  >  >  > >                        "wisenutbot", "robot", "crawl",
> >  "spider"
> >  >  };
> >  >  >  /*
> >  >  >  >  >  > and
> >  >  >  >  >  > > so on... */
> >  >  >  >  >  > >
> >  >  >  >  >  > >                public Unbuffered(final
> HttpServletResponse
> >  res)
> >  >  {
> >  >  >  >  >  > >            super(res);
> >  >  >  >  >  > >         }
> >  >  >  >  >  > >
> >  >  >  >  >  > >        @Override
> >  >  >  >  >  > >        public CharSequence encodeURL(final CharSequence
> url)
> >  {
> >  >  >  >  >  > >             return isAgent() ? url :
> super.encodeURL(url);
> >  >  >  >  >  > >        }
> >  >  >  >  >  > >
> >  >  >  >  >  > >                private static boolean isAgent() {
> >  >  >  >  >  > >
> >  >  >  >  >  > >                        String agent =
> >  >  >  >  >  > >
> >  >  >  >  >  > >
> >  >  >  >  >  >
> >  >  >
> >
> ((WebRequest)RequestCycle.get().getRequest()).getHttpServletRequest().ge
> >  >  >  >  >  > tHeader("User-Agent");
> >  >  >  >  >  > >
> >  >  >  >  >  > >                        for(String bot : botAgents) {
> >  >  >  >  >  > >                                if
> >  >  >  (agent.toLowerCase().indexOf(bot) !=
> >  >  >  >  >  > -1)
> >  >  >  >  >  > > {
> >  >  >  >  >  > >                                        return true;
> >  >  >  >  >  > >                                }
> >  >  >  >  >  > >                        }
> >  >  >  >  >  > >
> >  >  >  >  >  > >                        return false;
> >  >  >  >  >  > >                }
> >  >  >  >  >  > >    }
> >  >  >  >  >  > >
> >  >  >  >  >  > >
> >  >  >  >  >  > > I didn't test this code but I do similar thing in my
> old
> >  >  >  application
> >  >  >  >  >  > in
> >  >  >  >  >  > > Spring and it works.
> >  >  >  >  >  > >
> >  >  >  >  >  > > Take care,
> >  >  >  >  >  > > Artur
> >  >  >  >  >  > >
> >  >  >  >  >  > >
> >  >  >  >  >  > > --
> >  >  >  >  >  > > View this message in context:
> >  >  >  >  >  > >
> >  >  >  >  >  >
> >  >  >
> >  http://www.nabble.com/Removing-the-jsessionid-for-SEO-
> tp16464534p1646739
> >  >  >  >  >  >
> >  >  >
> >  >
> >  6.html<http://www.nabble.com/Removing-the-jsessionid-for-SEO-
> tp16464534p1646
> >  >  >  7396.html>
> >  >  >  >  >
> >  >  >  >  >
> >  >  >  >  > > > Sent from the Wicket - User mailing list archive at
> >  Nabble.com.
> >  >  >  >  >  > >
> >  >  >  >  >  > >
> >  >  >  >  >  > >
> >  >  >  -----------------------------------------------------------------
> ----
> >  >  >  >  >  > > To unsubscribe, e-mail: users-
> unsubscribe@wicket.apache.org
> >  >  >  >  >  > > For additional commands, e-mail:
> >  users-help@wicket.apache.org
> >  >  >  >  >  > >
> >  >  >  >  >  > >
> >  >  >  >  >  >
> >  >  >  >  >  > ______________
> >  >  >  >  >  >
> >  >  >  >  >  > The information contained in this message is proprietary
> >  and/or
> >  >  >  >  >  > confidential. If you are not the
> >  >  >  >  >  > intended recipient, please: (i) delete the message and
> all
> >  >  copies;
> >  >  >  (ii) do
> >  >  >  >  >  > not disclose,
> >  >  >  >  >  > distribute or use the message in any manner; and (iii)
> notify
> >  the
> >  >  >  sender
> >  >  >  >  >  > immediately. In addition,
> >  >  >  >  >  > please be aware that any message addressed to our domain
> is
> >  >  subject
> >  >  >  to
> >  >  >  >  >  > archiving and review by
> >  >  >  >  >  > persons other than the intended recipient. Thank you.
> >  >  >  >  >  > _____________
> >  >  >  >  >  >
> >  >  >  >  >  >
> >  >  >  -----------------------------------------------------------------
> ----
> >  >  >  >  >  > To unsubscribe, e-mail: users-
> unsubscribe@wicket.apache.org
> >  >  >  >  >  > For additional commands, e-mail: users-
> help@wicket.apache.org
> >  >  >  >  >  >
> >  >  >  >  >  >
> >  >  >  >  >
> >  >  >  >
> >  >  >  >
> >  ---------------------------------------------------------------------
> >  >  >  >  To unsubscribe, e-mail: users-unsubscribe@wicket.apache.org
> >  >  >  >  For additional commands, e-mail: users-help@wicket.apache.org
> >  >  >  >
> >  >  >  >
> >  >  >
> >  >  >
> >  >  >
> >  >  >  --
> >  >  >  Resizable and reorderable grid components.
> >  >  >  http://www.inmethod.com
> >  >  >
> >  >  >  -----------------------------------------------------------------
> ----
> >  >  >  To unsubscribe, e-mail: users-unsubscribe@wicket.apache.org
> >  >  >  For additional commands, e-mail: users-help@wicket.apache.org
> >  >  >
> >  >  >
> >  >  >  -----------------------------------------------------------------
> ----
> >  >  >  To unsubscribe, e-mail: users-unsubscribe@wicket.apache.org
> >  >  >  For additional commands, e-mail: users-help@wicket.apache.org
> >  >  >
> >  >  >
> >  >
> >  >  --------------------------------------------------------------------
> -
> >  >  To unsubscribe, e-mail: users-unsubscribe@wicket.apache.org
> >  >  For additional commands, e-mail: users-help@wicket.apache.org
> >  >
> >  >
> >  >  --------------------------------------------------------------------
> -
> >  >  To unsubscribe, e-mail: users-unsubscribe@wicket.apache.org
> >  >  For additional commands, e-mail: users-help@wicket.apache.org
> >  >
> >  >
> >
> >  ---------------------------------------------------------------------
> >  To unsubscribe, e-mail: users-unsubscribe@wicket.apache.org
> >  For additional commands, e-mail: users-help@wicket.apache.org
> >
> >
> >  ---------------------------------------------------------------------
> >  To unsubscribe, e-mail: users-unsubscribe@wicket.apache.org
> >  For additional commands, e-mail: users-help@wicket.apache.org
> >
> >
> 
> ---------------------------------------------------------------------
> To unsubscribe, e-mail: users-unsubscribe@wicket.apache.org
> For additional commands, e-mail: users-help@wicket.apache.org


---------------------------------------------------------------------
To unsubscribe, e-mail: users-unsubscribe@wicket.apache.org
For additional commands, e-mail: users-help@wicket.apache.org


RE: Removing the jsessionid for SEO

Posted by Dan Kaplan <dk...@citizenhawk.com>.
Awesome, thanks

-----Original Message-----
From: Igor Vaynberg [mailto:igor.vaynberg@gmail.com] 
Sent: Thursday, April 03, 2008 3:40 PM
To: users@wicket.apache.org
Subject: Re: Removing the jsessionid for SEO

you subclass the pagenavigator and make it use bookmarkable links
also. it has factory methods for all the links it uses.

-igor




Re: Removing the jsessionid for SEO

Posted by Igor Vaynberg <ig...@gmail.com>.
you subclass the pagenavigator and make it use bookmarkable links
also. it has factory methods for all the links it uses.

-igor
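
A rough illustration of what "bookmarkable" paging links amount to: the page index travels in the URL rather than in the session, so a crawler that arrives without a jsessionid can still follow them. This is plain Java and not the Wicket API; PagingUrls, pageUrl, parsePageIndex and the "page" parameter name are all made up for the sketch.

```java
/** Hypothetical helper (not part of Wicket): keeps the pager position in the URL. */
final class PagingUrls {

    private PagingUrls() {
    }

    /** Builds a session-independent URL for a zero-based page index. */
    static String pageUrl(String basePath, int pageIndex) {
        if (pageIndex < 0) {
            throw new IllegalArgumentException("pageIndex must be >= 0");
        }
        // Page 0 keeps the bare path so the first page has one canonical URL.
        return pageIndex == 0 ? basePath : basePath + "?page=" + pageIndex;
    }

    /** Reads the page index back from the raw "page" parameter, clamping bad input. */
    static int parsePageIndex(String rawParam, int pageCount) {
        if (rawParam == null || pageCount <= 0) {
            return 0;
        }
        try {
            int requested = Integer.parseInt(rawParam.trim());
            // Clamp into [0, pageCount - 1] so stale or hand-edited links still resolve.
            return Math.max(0, Math.min(requested, pageCount - 1));
        } catch (NumberFormatException e) {
            // A crawler may send anything; fall back to the first page.
            return 0;
        }
    }
}
```

For instance, pageUrl("/home", 2) yields /home?page=2, and parsePageIndex("99", 5) clamps to 4, the last valid index.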


On Thu, Apr 3, 2008 at 3:36 PM, Dan Kaplan <dk...@citizenhawk.com> wrote:
> I wasn't talking about the links that are on the list (I already make those
>  bookmarkable).  I'm talking about the links that the Navigator generates.
>  How do I make it so page 2 is bookmarkable?
>
>
>  -----Original Message-----
>  From: Igor Vaynberg [mailto:igor.vaynberg@gmail.com]
>  Sent: Thursday, April 03, 2008 3:30 PM
>  To: users@wicket.apache.org
>  Subject: Re: Removing the jsessionid for SEO
>
>  instead of
>
>  item.add(new Link("foo") { public void onClick() {} });
>
>  do
>
>  item.add(new BookmarkablePageLink("foo", SomePage.class));
>
>  -igor
>
>
>  On Thu, Apr 3, 2008 at 3:28 PM, Dan Kaplan <dk...@citizenhawk.com> wrote:
>  > How?  I asked how to do it before and nobody suggested this as a
>  >  possibility.
>  >
>  >
>  >
>  >  -----Original Message-----
>  >  From: Igor Vaynberg [mailto:igor.vaynberg@gmail.com]
>  >  Sent: Thursday, April 03, 2008 3:26 PM
>  >  To: users@wicket.apache.org
>  >  Subject: Re: Removing the jsessionid for SEO
>  >
>  >  dataview can work in a stateless mode, just use bookmarkable links inside
>  >  it
>  >
>  >  -igor
>  >
>  >
>  >  On Thu, Apr 3, 2008 at 3:22 PM, Dan Kaplan <dk...@citizenhawk.com> wrote:
>  >  > Regardless, at the very least this makes your site look "weird" and
>  >  >  unprofessional when google puts a jsessionid on your url.  There has got to
>  >  >  be some negative effect when google visits it the second time and the
>  >  >  jsessionid has changed but it sees the same exact content.  Worst case,
>  >  >  it'll think you're trying to trick it.
>  >  >
>  >  >  About those 404s, I'm finding that with the fix I provided I don't get a
>  >  >  404, but the links refresh the page I'm already on.  IE: If I'm on A, and a
>  >  >  link to B is non-bookmarkable, clicking B refreshes A.
>  >  >
>  >  >  This issue is very disconcerting to me.  It's one of the reasons I wish that
>  >  >  DataView had an option to work in stateless mode.  Cause if I ban cookies
>  >  >  and Googlebot visits my home page (with a navigator on it), it'll try to
>  >  >  follow all these page links and from its perspective, they all lead back to
>  >  >  the first page.  So it's kinda a catch-22: Include the jsessionid in the
>  >  >  urls and get bad SEO or remove the jsessionid and get bad SEO :(
>  >  >
>  >  >  Perhaps the answer to my prayers is a combination of the noindex/nofollow
>  >  >  meta tag with a sitemap.xml.  I'm thinking I can put a nofollow on the home
>  >  >  page (so googlebot doesn't try to follow the navigator links) and use the
>  >  >  sitemap.xml to point out the individual pages I want it to index.
>  >  >
>  >  >
>  >  >  Matej: can you go into more detail about your hybrid URL statement?  Won't
>  >  >  google index, for example, /home and /home.1 if I use it?  When it follows
>  >  >  the next page, won't the url become /home.1.2 or something?  That .2 is a
>  >  >  page version: If google indexes that and tries to visit it again, won't it
>  >  >  report about an invalid session?
>  >  >
>  >  >
>  >  >
>  >  >  -----Original Message-----
>  >  >  From: Matej Knopp [mailto:matej.knopp@gmail.com]
>  >  >  Sent: Thursday, April 03, 2008 11:10 AM
>  >  >  To: users@wicket.apache.org
>  >  >  Subject: Re: Removing the jsessionid for SEO
>  >  >
>  >  >  On the other hand, crawling non-bookmarkable pages is not very useful
>  >  >  anyway, since ?wicket:interface url will always get page expired when
>  >  >  you click on the result.
>  >  >
>  >  >  However, preserving session makes a lot of sense with hybrid url. Google
>  >  >  remembers the original url (without page instance) while indexing the
>  >  >  real page (after redirect).
>  >  >
>  >  >  I think though that the crawler is quite advanced. I would think it
>  >  >  supports cookies (at least JSESSIONID) and that it evaluates some of
>  >  >  the javascript on page.
>  >  >
>  >  >  -Matej
>  >  >
>  >  >  On Thu, Apr 3, 2008 at 6:56 PM, Igor Vaynberg <ig...@gmail.com> wrote:
>  >  >  > right. if you strip sessionid then all your nonbookmarkable urls will
>  >  >  >  resolve to a 404. that will probably drop your rank a lot faster....
>  >  >  >
>  >  >  >  -igor
>  >  >  >
>  >  >  >
>  >  >  >
>  >  >  >
>  >  >  >  On Thu, Apr 3, 2008 at 9:16 AM, Johan Compagner <jc...@gmail.com> wrote:
>  >  >  >  > the problem is that then you have to have all stateless pages. Else google
>  >  >  >  >  can't crawl your website.
>  >  >  >  >  And if that is the case then you could be completely stateless so you dont
>  >  >  >  >  have a session (id) to worry about at all.
>  >  >  >  >
>  >  >  >  >  johan
>  >  >  >  >
>  >  >  >  >
>  >  >  >  >
>  >  >  >  >
>  >  >  >  >
>  >  >  >  >
>  >  >  >  >  On Thu, Apr 3, 2008 at 4:54 PM, Zappaterrini, Larry <
>  >  >  >  >  Larry.Zappaterrini@fnis.com> wrote:
>  >  >  >  >
>  >  >  >  >  > When Google asks to not have special treatment for their bot, they are
>  >  >  >  >  > referring to content more than anything. Regarding the session id being
>  >  >  >  >  > coded in the URL, see the Technical guidelines section of Google's
>  >  >  >  >  > Webmaster Guidelines -
>  >  >  >  >  > http://www.google.com/support/webmasters/bin/answer.py?answer=35769#design
>  >  >  >  >  >
>  >  >  >  >  > It specifically recommends "allow(ing) search bots to crawl your sites
>  >  >  >  >  > without session IDs or arguments that track their path through the
>  >  >  >  >  > site."
>  >  >  >  >  >
>  >  >  >  >  > -----Original Message-----
>  >  >  >  >  > From: Johan Compagner [mailto:jcompagner@gmail.com]
>  >  >  >  >  > Sent: Thursday, April 03, 2008 7:35 AM
>  >  >  >  >  > To: users@wicket.apache.org
>  >  >  >  >  > Subject: Re: Removing the jsessionid for SEO
>  >  >  >  >  >
>  >  >  >  >  > isn't google always saying that you shouldn't alter behavior of your site
>  >  >  >  >  > depending on whether it is their bot or not?
>  >  >  >  >  >
>  >  >  >  >  > On Thu, Apr 3, 2008 at 1:00 PM, Artur W. <a_...@gazeta.pl> wrote:
>  >  >  >  >  >
>  >  >  >  >  > >
>  >  >  >  >  > > Hi!
>  >  >  >  >  > >
>  >  >  >  >  > >
>  >  >  >  >  > > igor.vaynberg wrote:
>  >  >  >  >  > > >
>  >  >  >  >  > > > also by doing what you have done users with cookies disabled wont be
>  >  >  >  >  > > > able to use your site...
>  >  >  >  >  > > >
>  >  >  >  >  > >
>  >  >  >  >  > > In my opinion the session id is a problem. Google indexes the same page
>  >  >  >  >  > > again and again.
>  >  >  >  >  > >
>  >  >  >  >  > > For the users without cookies we can do this:
>  >  >  >  >  > >
>  >  >  >  >  > >
>  >  >  >  >  > >        static class Unbuffered extends WebResponse {
>  >  >  >  >  > >
>  >  >  >  >  > >                // Lower-case substrings that mark a crawler's User-Agent header.
>  >  >  >  >  > >                private static final String[] botAgents = {
>  >  >  >  >  > >                        "onetszukaj", "googlebot", "appie", "architext",
>  >  >  >  >  > >                        "jeeves", "bjaaland", "ferret", "gulliver", "harvest", "htdig",
>  >  >  >  >  > >                        "linkwalker", "lycos_", "moget", "muscatferret", "myweb", "nomad",
>  >  >  >  >  > >                        "scooter",
>  >  >  >  >  > >                        // indexOf() does no regex matching, so the \s escapes in the next
>  >  >  >  >  > >                        // entry only match a literal backslash-s; "slurp" covers that bot.
>  >  >  >  >  > >                        "yahoo!\\sslurp\\schina", "slurp", "weblayers", "antibot", "bruinbot",
>  >  >  >  >  > >                        "digout4u", "echo!", "ia_archiver", "jennybot", "mercator",
>  >  >  >  >  > >                        "netcraft", "msnbot", "petersnews", "unlost_web_crawler", "voila",
>  >  >  >  >  > >                        "webbase", "webcollage", "cfetch", "zyborg", "wisenutbot",
>  >  >  >  >  > >                        "robot", "crawl", "spider" }; /* and so on... */
>  >  >  >  >  > >
>  >  >  >  >  > >                public Unbuffered(final HttpServletResponse res) {
>  >  >  >  >  > >                        super(res);
>  >  >  >  >  > >                }
>  >  >  >  >  > >
>  >  >  >  >  > >                // Leave URLs untouched for crawlers; everyone else keeps the jsessionid.
>  >  >  >  >  > >                @Override
>  >  >  >  >  > >                public CharSequence encodeURL(final CharSequence url) {
>  >  >  >  >  > >                        return isAgent() ? url : super.encodeURL(url);
>  >  >  >  >  > >                }
>  >  >  >  >  > >
>  >  >  >  >  > >                private static boolean isAgent() {
>  >  >  >  >  > >                        String agent = ((WebRequest)RequestCycle.get().getRequest())
>  >  >  >  >  > >                                .getHttpServletRequest().getHeader("User-Agent");
>  >  >  >  >  > >                        if (agent == null) {
>  >  >  >  >  > >                                // No User-Agent header: treat as a regular client.
>  >  >  >  >  > >                                return false;
>  >  >  >  >  > >                        }
>  >  >  >  >  > >                        String lowerAgent = agent.toLowerCase();
>  >  >  >  >  > >                        for (String bot : botAgents) {
>  >  >  >  >  > >                                if (lowerAgent.indexOf(bot) != -1) {
>  >  >  >  >  > >                                        return true;
>  >  >  >  >  > >                                }
>  >  >  >  >  > >                        }
>  >  >  >  >  > >                        return false;
>  >  >  >  >  > >                }
>  >  >  >  >  > >        }
>  >  >  >  >  > >
>  >  >  >  >  > >
>  >  >  >  >  > > I didn't test this code but I do similar thing in my old
>  >  >  application
>  >  >  >  >  > in
>  >  >  >  >  > > Spring and it works.
>  >  >  >  >  > >
>  >  >  >  >  > > Take care,
>  >  >  >  >  > > Artur
>  >  >  >  >  > >
>  >  >  >  >  > >
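
For anyone trying the user-agent check quoted above, here is a minimal standalone sketch of just the matching step, in plain Java with no Wicket classes. The class name BotAgents, the method isBot and the shortened token list are assumptions made up for illustration. Two things differ from the quoted code: a missing User-Agent header (null) is treated as a regular visitor instead of throwing, and the "yahoo!\\sslurp\\schina" entry is left out because a plain substring search would look for the backslashes literally, so it would likely never match (the shorter "slurp" token covers it anyway).

```java
import java.util.Arrays;
import java.util.List;
import java.util.Locale;

/** Hypothetical helper for illustration; not part of Wicket. */
final class BotAgents {
    // Lower-case substrings, abridged from the list quoted in the thread.
    private static final List<String> BOT_TOKENS = Arrays.asList(
            "googlebot", "slurp", "msnbot", "ia_archiver", "robot", "crawl", "spider");

    private BotAgents() {}

    /** Returns true if the User-Agent value contains one of the crawler tokens. */
    static boolean isBot(String userAgent) {
        if (userAgent == null) {
            return false; // header may be absent; treat as a regular visitor
        }
        String agent = userAgent.toLowerCase(Locale.ROOT);
        for (String token : BOT_TOKENS) {
            if (agent.contains(token)) {
                return true;
            }
        }
        return false;
    }

    public static void main(String[] args) {
        System.out.println(isBot("Mozilla/5.0 (compatible; Googlebot/2.1; +http://www.google.com/bot.html)"));
        System.out.println(isBot("Mozilla/5.0 (Windows NT 10.0; rv:109.0) Gecko/20100101 Firefox/115.0"));
        System.out.println(isBot(null));
    }
}
```

In the quoted response class, encodeURL would then return the url unchanged only when a check like this says the request comes from a crawler, so visitors with cookies disabled keep their jsessionid.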


RE: Removing the jsessionid for SEO

Posted by Dan Kaplan <dk...@citizenhawk.com>.
I wasn't talking about the links that are on the list (I already make those
bookmarkable).  I'm talking about the links that the Navigator generates.
How do I make it so page 2 is bookmarkable?

-----Original Message-----
From: Igor Vaynberg [mailto:igor.vaynberg@gmail.com] 
Sent: Thursday, April 03, 2008 3:30 PM
To: users@wicket.apache.org
Subject: Re: Removing the jsessionid for SEO

instead of

item.add(new Link("foo") { public void onClick() { ... } });

do

item.add(new BookmarkablePageLink("foo", TargetPage.class));  // TargetPage: placeholder for your page class

-igor




Re: Removing the jsessionid for SEO

Posted by Igor Vaynberg <ig...@gmail.com>.
instead of

item.add(new Link("foo") { public void onClick() { ... } });

do

item.add(new BookmarkablePageLink("foo", TargetPage.class));  // TargetPage: placeholder for your page class

-igor
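
To make the bookmarkable-link idea a bit more concrete: the point is that the page number travels in the URL instead of in the session, so a crawler that keeps neither cookies nor a jsessionid can still reach page 2. Below is a minimal sketch in plain Java, not Wicket API. The class PagingUrls, the parameter name "page" and all helper names are assumptions made up for illustration.

```java
/** Hypothetical helper showing page state carried in the URL; not Wicket API. */
final class PagingUrls {
    private PagingUrls() {}

    /** Builds a bookmarkable URL such as "/home?page=2" (page numbers start at 1). */
    static String pageUrl(String basePath, int page) {
        return page <= 1 ? basePath : basePath + "?page=" + page;
    }

    /** Reads the page number from a query string; falls back to 1 on missing or bad input. */
    static int parsePage(String query) {
        if (query == null) {
            return 1;
        }
        for (String pair : query.split("&")) {
            if (pair.startsWith("page=")) {
                try {
                    return Math.max(Integer.parseInt(pair.substring("page=".length())), 1);
                } catch (NumberFormatException e) {
                    return 1; // not a number: show the first page
                }
            }
        }
        return 1;
    }

    /** Index of the first row shown on the given page. */
    static int firstRow(int page, int rowsPerPage) {
        return (page - 1) * rowsPerPage;
    }

    public static void main(String[] args) {
        String url = pageUrl("/home", 2);
        int page = parsePage(url.substring(url.indexOf('?') + 1));
        System.out.println(url + " starts at row " + firstRow(page, 10));
    }
}
```

Because every request carries everything needed to render the page, nothing has to be looked up in the session, which is what lets the navigator links work without a jsessionid. How this maps onto a navigator component in Wicket is not shown here.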




RE: Removing the jsessionid for SEO

Posted by Dan Kaplan <dk...@citizenhawk.com>.
How?  I asked how to do it before and nobody suggested this as a
possibility.  

-----Original Message-----
From: Igor Vaynberg [mailto:igor.vaynberg@gmail.com] 
Sent: Thursday, April 03, 2008 3:26 PM
To: users@wicket.apache.org
Subject: Re: Removing the jsessionid for SEO

dataview can work in a stateless mode, just use bookmarkable links inside it

-igor


On Thu, Apr 3, 2008 at 3:22 PM, Dan Kaplan <dk...@citizenhawk.com> wrote:
> Regardless, at the very least this makes your site look "weird" and
>  unprofessional when google puts a jsessionid on your url.  There has got to
>  be some negative effect when google visits it the second time and the
>  jsessionid has changed but it sees the same exact content.  Worst case,
>  it'll think you're trying to trick it.
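
A small illustration of the duplicate-URL concern above: two visits to the same page differ only in the ";jsessionid=..." path parameter, and once that is removed the URLs are identical. This is a plain-Java sketch under assumptions, not code from the thread; the class SessionIdStripper and its regular expression are made up for illustration.

```java
import java.util.regex.Pattern;

/** Hypothetical helper for illustration; not part of Wicket or the servlet API. */
final class SessionIdStripper {
    // Matches ";jsessionid=<value>" up to the next '?', '#', ';' or the end of the string.
    private static final Pattern JSESSIONID =
            Pattern.compile(";jsessionid=[^?#;]*", Pattern.CASE_INSENSITIVE);

    private SessionIdStripper() {}

    /** Returns the URL with any jsessionid path parameter removed. */
    static String strip(String url) {
        return JSESSIONID.matcher(url).replaceAll("");
    }

    public static void main(String[] args) {
        String firstVisit = "/home;jsessionid=ABC123?page=2";
        String secondVisit = "/home;jsessionid=XYZ789?page=2";
        // Both visits collapse to the same URL once the session id is gone.
        System.out.println(strip(firstVisit).equals(strip(secondVisit)));
    }
}
```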
>
>  About those 404s, I'm finding that with the fix I provided I don't get a
>  404, but the links refresh the page I'm already on.  IE: If I'm on A, and a
>  link to B is non-bookmarkable, clicking B refreshes A.
>
>  This issue is very disconcerting to me.  It's one of the reasons I wish that
>  DataView had an option to work in stateless mode.  Cause if I ban cookies
>  and Googlebot visits my home page (with a navigator on it), it'll try to
>  follow all these page links and from its perspective, they all lead back to
>  the first page.  So it's kinda a catch-22: Include the jsessionid in the
>  urls and get bad SEO or remove the jsessionid and get bad SEO :(
>
>  Perhaps the answer to my prayers is a combination of the noindex/nofollow
>  meta tag with a sitemap.xml.  I'm thinking I can put a nofollow on the home
>  page (so googlebot doesn't try to follow the navigator links) and use the
>  sitemap.xml to point out the individual pages I want it to index.
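
The sitemap half of that idea can be sketched as follows. A sitemap in the sitemaps.org format is a urlset element containing one url/loc entry per page. The plain-Java code below only builds that XML as a string; the class SitemapWriter, its method names and the example.com URLs are assumptions made up for illustration. The nofollow half would be a robots meta tag in the home page markup and is not shown.

```java
import java.util.Arrays;
import java.util.List;

/** Hypothetical helper for illustration; builds a minimal sitemaps.org document. */
final class SitemapWriter {
    private SitemapWriter() {}

    /** Escapes the five characters that must be entity-encoded inside XML text. */
    static String escapeXml(String s) {
        return s.replace("&", "&amp;")
                .replace("<", "&lt;")
                .replace(">", "&gt;")
                .replace("\"", "&quot;")
                .replace("'", "&apos;");
    }

    /** Returns a sitemap listing each URL in a url/loc entry. */
    static String toSitemap(List<String> urls) {
        StringBuilder sb = new StringBuilder();
        sb.append("<?xml version=\"1.0\" encoding=\"UTF-8\"?>\n");
        sb.append("<urlset xmlns=\"http://www.sitemaps.org/schemas/sitemap/0.9\">\n");
        for (String url : urls) {
            sb.append("  <url><loc>").append(escapeXml(url)).append("</loc></url>\n");
        }
        sb.append("</urlset>\n");
        return sb.toString();
    }

    public static void main(String[] args) {
        System.out.print(toSitemap(Arrays.asList(
                "http://example.com/home",
                "http://example.com/item?id=1&lang=en")));
    }
}
```

Only session-free, bookmarkable URLs belong in such a list, since a URL carrying a jsessionid would expire along with its session.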
>
>
>  Matej: can you go into more detail about your hybrid URL statement?  Won't
>  google index, for example, /home and /home.1 if I use it?  When it follows
>  the next page, won't the url become /home.1.2 or something?  That .2 is a
>  page version: If google indexes that and tries to visit it again, won't it
>  report about an invalid session?
>
>
>
>  -----Original Message-----
>  From: Matej Knopp [mailto:matej.knopp@gmail.com]
>  Sent: Thursday, April 03, 2008 11:10 AM
>  To: users@wicket.apache.org
>  Subject: Re: Removing the jsessionid for SEO
>
>  On the other hand, crawling non-bookmarkable pages is not very useful
>  anyway, since ?wicket:interface url will always get page expired when
>  you click on the result.
>
>  However, preserving session makes a lot of sense with hybrid url. Google
>  remembers the original url (without page instance) while indexing the
>  real page (after redirect).
>
>  I think though that the crawler is quite advanced. I would think it
>  supports cookies (at least JSESSIONID) as well as evaluating some of
>  the javascript on page.
>
>  -Matej
>
>  On Thu, Apr 3, 2008 at 6:56 PM, Igor Vaynberg <ig...@gmail.com> wrote:
>  > right. if you strip sessionid then all your nonbookmarkable urls will
>  >  resolve to a 404. that will probably drop your rank a lot faster....
>  >
>  >  -igor
>  >
>  >
>  >
>  >
>  >  On Thu, Apr 3, 2008 at 9:16 AM, Johan Compagner <jc...@gmail.com> wrote:
>  >  > the problem is that then you have to have all stateless pages. Else google
>  >  >  can't crawl your website.
>  >  >  And if that is the case then you could be completely stateless so you dont
>  >  >  have a session (id) to worry about at all.
>  >  >
>  >  >  johan
>  >  >
>  >  >  On Thu, Apr 3, 2008 at 4:54 PM, Zappaterrini, Larry <Larry.Zappaterrini@fnis.com> wrote:
>  >  >
>  >  >  > When Google asks to not have special treatment for their bot, they are
>  >  >  > referring to content more than anything. Regarding the session id being
>  >  >  > coded in the URL, see the Technical guidelines section of Google's
>  >  >  > Webmaster Guidelines -
>  >  >  >
>  >  >  > http://www.google.com/support/webmasters/bin/answer.py?answer=35769#design
>  >  >  >
>  >  >  > It specifically recommends "allow(ing) search bots to crawl your sites
>  >  >  > without session IDs or arguments that track their path through the
>  >  >  > site."
>  >  >  >
>  >  >  > -----Original Message-----
>  >  >  > From: Johan Compagner [mailto:jcompagner@gmail.com]
>  >  >  > Sent: Thursday, April 03, 2008 7:35 AM
>  >  >  > To: users@wicket.apache.org
>  >  >  > Subject: Re: Removing the jsessionid for SEO
>  >  >  >
>  >  >  > isnt google always saying that you shouldn't alter behavior of
your
>  site
>  >  >  > depending of it is there bot or not?
>  >  >  >
>  >  >  > On Thu, Apr 3, 2008 at 1:00 PM, Artur W. <a_...@gazeta.pl>
>  wrote:
>  >  >  >
>  >  >  > >
>  >  >  > > Hi!
>  >  >  > >
>  >  >  > >
>  >  >  > > igor.vaynberg wrote:
>  >  >  > > >
>  >  >  > > > also by doing what you have done users with cookies disabled
>  wont be
>  >  >  > > > able to use your site...
>  >  >  > > >
>  >  >  > >
>  >  >  > > In my opinion session id is a problem. Google index the same
page
>  >  >  > again
>  >  >  > > and
>  >  >  > > again.
>  >  >  > >
>  >  >  > > About the users without cookies we can do like this:
>  >  >  > >
>  >  >  > >
>  >  >  > >        static class Unbuffered extends WebResponse {
>  >  >  > >
>  >  >  > >                 private static final String[] botAgents = {
>  >  >  > "onetszukaj",
>  >  >  > > "googlebot",
>  >  >  > > "appie", "architext",
>  >  >  > >                        "jeeves", "bjaaland", "ferret",
"gulliver",
>  >  >  > > "harvest", "htdig",
>  >  >  > >                        "linkwalker", "lycos_", "moget",
>  >  >  > "muscatferret",
>  >  >  > > "myweb", "nomad",
>  >  >  > > "scooter",
>  >  >  > >                        "yahoo!\\sslurp\\schina", "slurp",
>  "weblayers",
>  >  >  > > "antibot", "bruinbot",
>  >  >  > > "digout4u",
>  >  >  > >                        "echo!", "ia_archiver", "jennybot",
>  "mercator",
>  >  >  > > "netcraft", "msnbot",
>  >  >  > > "petersnews",
>  >  >  > >                        "unlost_web_crawler", "voila",
"webbase",
>  >  >  > > "webcollage", "cfetch",
>  >  >  > > "zyborg",
>  >  >  > >                        "wisenutbot", "robot", "crawl", "spider"
};
>  /*
>  >  >  > and
>  >  >  > > so on... */
>  >  >  > >
>  >  >  > >                public Unbuffered(final HttpServletResponse res)
{
>  >  >  > >            super(res);
>  >  >  > >         }
>  >  >  > >
>  >  >  > >        @Override
>  >  >  > >        public CharSequence encodeURL(final CharSequence url) {
>  >  >  > >             return isAgent() ? url : super.encodeURL(url);
>  >  >  > >        }
>  >  >  > >
>  >  >  > >                private static boolean isAgent() {
>  >  >  > >
>  >  >  > >                        String agent =
>  >  >  > >
>  >  >  > >
>  >  >  >
>  ((WebRequest)RequestCycle.get().getRequest()).getHttpServletRequest().ge
>  >  >  > tHeader("User-Agent");
>  >  >  > >
>  >  >  > >                        for(String bot : botAgents) {
>  >  >  > >                                if
>  (agent.toLowerCase().indexOf(bot) !=
>  >  >  > -1)
>  >  >  > > {
>  >  >  > >                                        return true;
>  >  >  > >                                }
>  >  >  > >                        }
>  >  >  > >
>  >  >  > >                        return false;
>  >  >  > >                }
>  >  >  > >    }
>  >  >  > >
>  >  >  > >
>  >  >  > > I didn't test this code but I do similar thing in my old
>  application
>  >  >  > in
>  >  >  > > Spring and it works.
>  >  >  > >
>  >  >  > > Take care,
>  >  >  > > Artur
>  >  >  > >
>  >  >  > >
>  >  >  > > --
>  >  >  > > View this message in context:
>  >  >  > >
>  >  >  >
>  http://www.nabble.com/Removing-the-jsessionid-for-SEO-tp16464534p1646739
>  >  >  >
>
6.html<http://www.nabble.com/Removing-the-jsessionid-for-SEO-tp16464534p1646
>  7396.html>
>  >  >
>  >  >
>  >  > > > Sent from the Wicket - User mailing list archive at Nabble.com.
>  >  >  > >
>  >  >  > >
>  >  >  > >
>  ---------------------------------------------------------------------
>  >  >  > > To unsubscribe, e-mail: users-unsubscribe@wicket.apache.org
>  >  >  > > For additional commands, e-mail: users-help@wicket.apache.org
>  >  >  > >
>  >  >  > >
>  >  >  >
>  >  >  > ______________
>  >  >  >
>  >  >  > The information contained in this message is proprietary and/or
>  >  >  > confidential. If you are not the
>  >  >  > intended recipient, please: (i) delete the message and all
copies;
>  (ii) do
>  >  >  > not disclose,
>  >  >  > distribute or use the message in any manner; and (iii) notify the
>  sender
>  >  >  > immediately. In addition,
>  >  >  > please be aware that any message addressed to our domain is
subject
>  to
>  >  >  > archiving and review by
>  >  >  > persons other than the intended recipient. Thank you.
>  >  >  > _____________
>  >  >  >
>  >  >  >
>  ---------------------------------------------------------------------
>  >  >  > To unsubscribe, e-mail: users-unsubscribe@wicket.apache.org
>  >  >  > For additional commands, e-mail: users-help@wicket.apache.org
>  >  >  >
>  >  >  >
>  >  >
>  >
>  >  ---------------------------------------------------------------------
>  >  To unsubscribe, e-mail: users-unsubscribe@wicket.apache.org
>  >  For additional commands, e-mail: users-help@wicket.apache.org
>  >
>  >
>
>
>
>  --
>  Resizable and reorderable grid components.
>  http://www.inmethod.com
>
>  ---------------------------------------------------------------------
>  To unsubscribe, e-mail: users-unsubscribe@wicket.apache.org
>  For additional commands, e-mail: users-help@wicket.apache.org
>
>
>  ---------------------------------------------------------------------
>  To unsubscribe, e-mail: users-unsubscribe@wicket.apache.org
>  For additional commands, e-mail: users-help@wicket.apache.org
>
>

---------------------------------------------------------------------
To unsubscribe, e-mail: users-unsubscribe@wicket.apache.org
For additional commands, e-mail: users-help@wicket.apache.org




Re: Removing the jsessionid for SEO

Posted by Igor Vaynberg <ig...@gmail.com>.
DataView can work in stateless mode; just use bookmarkable links inside it.

-igor


On Thu, Apr 3, 2008 at 3:22 PM, Dan Kaplan <dk...@citizenhawk.com> wrote:
> Regardless, at the very least this makes your site look "weird" and
>  unprofessional when google puts a jsessionid on your url.  There has got to
>  be some negative effect when google visits it the second time and the
>  jsessionid has changed but it sees the same exact content.  Worst case,
>  it'll think you're trying to trick it.
>
>  About those 404s, I'm finding that with the fix I provided I don't get a
>  404, but the links refresh the page I'm already on.  IE: If I'm on A, and a
>  link to B is non-bookmarkable, clicking B refreshes A.
>
>  This issue is very disconcerting to me.  It's one of the reasons I wish that
>  DataView had an option to work in stateless mode.  Cause if I ban cookies
>  and Googlebot visits my home page (with a navigator on it), it'll try to
>  follow all these page links and from its perspective, they all lead back to
>  the first page.  So it's kinda a catch-22: Include the jsessionid in the
>  urls and get bad SEO or remove the jsessionid and get bad SEO :(
>
>  Perhaps the answer to my prayers is a combination of the noindex/nofollow
>  meta tag with a sitemap.xml.  I'm thinking I can put a nofollow on the home
>  page (so googlebot doesn't try to follow the navigator links) and use the
>  sitemap.xml to point out the individual pages I want it to index.
>
>
>  Matej: can you go into more detail about your hybrid URL statement?  Won't
>  google index, for example, /home and /home.1 if I use it?  When it follows
>  the next page, won't the url become /home.1.2 or something?  That .2 is a
>  page version: If google indexes that and tries to visit it again, won't it
>  report about an invalid session?
>
>
>



RE: Removing the jsessionid for SEO

Posted by Dan Kaplan <dk...@citizenhawk.com>.
Regardless, at the very least this makes your site look "weird" and
unprofessional when Google puts a jsessionid on your URL.  There has got to
be some negative effect when Google visits it the second time and the
jsessionid has changed but it sees the exact same content.  Worst case,
it'll think you're trying to trick it.
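
To make the duplicate-URL concern concrete, here is a minimal sketch in plain Java (no Wicket involved). The class name `SessionIdStripper` and the sample URLs are hypothetical; the only assumption is that the container appends the session id as a `;jsessionid=...` path parameter, as described at the top of this thread.

```java
import java.util.regex.Pattern;

// Hypothetical helper, not part of Wicket: removes the ";jsessionid=..." path
// parameter that a servlet container appends when URL rewriting is active.
final class SessionIdStripper {

    // Matches ";jsessionid=<id>" up to the next '?', '#', ';' or end of input.
    private static final Pattern JSESSIONID =
            Pattern.compile(";jsessionid=[^?#;]*", Pattern.CASE_INSENSITIVE);

    static String strip(String url) {
        return JSESSIONID.matcher(url).replaceAll("");
    }

    public static void main(String[] args) {
        // Two crawler visits to the same page, each with a different session id.
        String first = "/home;jsessionid=ABC123?page=2";
        String second = "/home;jsessionid=XYZ789?page=2";

        // As raw strings the URLs differ; without the session id they are equal.
        System.out.println(first.equals(second));               // prints "false"
        System.out.println(strip(first).equals(strip(second))); // prints "true"
    }
}
```

This only illustrates why a crawler sees two addresses for one page; it is not a replacement for the `encodeURL` override posted earlier.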

About those 404s, I'm finding that with the fix I provided I don't get a
404, but the links refresh the page I'm already on.  I.e., if I'm on A, and a
link to B is non-bookmarkable, clicking B refreshes A.

This issue is very disconcerting to me.  It's one of the reasons I wish that
DataView had an option to work in stateless mode.  Because if I ban cookies
and Googlebot visits my home page (with a navigator on it), it'll try to
follow all these page links and, from its perspective, they all lead back to
the first page.  So it's kind of a catch-22: include the jsessionid in the
URLs and get bad SEO, or remove the jsessionid and get bad SEO :(

Perhaps the answer to my prayers is a combination of the noindex/nofollow
meta tag with a sitemap.xml.  I'm thinking I can put a nofollow on the home
page (so Googlebot doesn't try to follow the navigator links) and use the
sitemap.xml to point out the individual pages I want it to index.
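
A rough sketch of the sitemap half of that idea, in plain Java. The class name `SitemapSketch` and the example.com URLs are made up for illustration; the `urlset`/`url`/`loc` elements and the namespace follow the sitemaps.org 0.9 protocol. The nofollow half would be a `<meta name="robots" content="nofollow">` tag in the home page's head.

```java
import java.util.Arrays;
import java.util.List;

// Hypothetical sketch: builds a minimal sitemap.xml listing the bookmarkable
// pages a crawler should index. URLs below are placeholders.
final class SitemapSketch {

    // Escapes the five XML special characters so URLs with '&' stay well-formed.
    static String escapeXml(String s) {
        return s.replace("&", "&amp;")
                .replace("<", "&lt;")
                .replace(">", "&gt;")
                .replace("\"", "&quot;")
                .replace("'", "&apos;");
    }

    static String build(List<String> urls) {
        StringBuilder sb = new StringBuilder();
        sb.append("<?xml version=\"1.0\" encoding=\"UTF-8\"?>\n");
        sb.append("<urlset xmlns=\"http://www.sitemaps.org/schemas/sitemap/0.9\">\n");
        for (String url : urls) {
            sb.append("  <url><loc>").append(escapeXml(url)).append("</loc></url>\n");
        }
        sb.append("</urlset>\n");
        return sb.toString();
    }

    public static void main(String[] args) {
        List<String> pages = Arrays.asList(
                "http://www.example.com/home",
                "http://www.example.com/items?page=2&sort=name");
        System.out.print(build(pages));
    }
}
```

Only stable, bookmarkable URLs belong in the list; anything carrying a session id or a page version would defeat the purpose.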


Matej: can you go into more detail about your hybrid URL statement?  Won't
Google index, for example, both /home and /home.1 if I use it?  When it
follows the next page, won't the URL become /home.1.2 or something?  That .2
is a page version: if Google indexes that and tries to visit it again, won't
it report an invalid session?
 



Re: Removing the jsessionid for SEO

Posted by Matej Knopp <ma...@gmail.com>.
On the other hand, crawling non-bookmarkable pages is not very useful
anyway, since a ?wicket:interface URL will always get "page expired" when
you click on the result.

However, preserving the session makes a lot of sense with hybrid URLs. Google
remembers the original URL (without the page instance) while indexing the
real page (after the redirect).

I think, though, that the crawler is quite advanced. I would think it
supports cookies (at least JSESSIONID) and evaluates some of
the JavaScript on the page.

-Matej

On Thu, Apr 3, 2008 at 6:56 PM, Igor Vaynberg <ig...@gmail.com> wrote:
> right. if you strip sessionid then all your nonbookmarkable urls will
>  resolve to a 404. that will probably drop your rank a lot faster....
>
>  -igor
>
>
>
>
>  On Thu, Apr 3, 2008 at 9:16 AM, Johan Compagner <jc...@gmail.com> wrote:
>  > the problem is that then you have to have all stateless pages. Else google
>  >  can't crawl your website.
>  >  And if that is the case then you could be completely stateless so you dont
>  >  have a session (id) to worry about at all.
>  >
>  >  johan
>  >
>  >
>  >
>  >
>  >
>  >
>  >  On Thu, Apr 3, 2008 at 4:54 PM, Zappaterrini, Larry <
>  >  Larry.Zappaterrini@fnis.com> wrote:
>  >
>  >  > When Google asks to not have special treatment for their bot, they are
>  >  > referring to content more than anything. Regarding the session id being
>  >  > coded in the URL, see the Technical guidelines section of Google's
>  >  > Webmaster Guidelines -
>  >  > http://www.google.com/support/webmasters/bin/answer.py?answer=35769#design
>  >  >
>  >  > It specifically recommends "allow(ing) search bots to crawl your sites
>  >  > without session IDs or arguments that track their path through the
>  >  > site."
>  >  >
>  >  > -----Original Message-----
>  >  > From: Johan Compagner [mailto:jcompagner@gmail.com]
>  >  > Sent: Thursday, April 03, 2008 7:35 AM
>  >  > To: users@wicket.apache.org
>  >  > Subject: Re: Removing the jsessionid for SEO
>  >  >
>  >  > isnt google always saying that you shouldn't alter behavior of your site
>  >  > depending of it is there bot or not?
>  >  >
>  >  > On Thu, Apr 3, 2008 at 1:00 PM, Artur W. <a_...@gazeta.pl> wrote:
>  >  >
>  >  > >
>  >  > > Hi!
>  >  > >
>  >  > >
>  >  > > igor.vaynberg wrote:
>  >  > > >
>  >  > > > also by doing what you have done users with cookies disabled wont be
>  >  > > > able to use your site...
>  >  > > >
>  >  > >
>  >  > > In my opinion session id is a problem. Google index the same page
>  >  > > again and again.
>  >  > >
>  >  > > About the users without cookies we can do like this:
>  >  > >
>  >  > >
>  >  > >    static class Unbuffered extends WebResponse {
>  >  > >
>  >  > >        private static final String[] botAgents = {
>  >  > >            "onetszukaj", "googlebot", "appie", "architext",
>  >  > >            "jeeves", "bjaaland", "ferret", "gulliver", "harvest", "htdig",
>  >  > >            "linkwalker", "lycos_", "moget", "muscatferret", "myweb", "nomad",
>  >  > >            "scooter", "yahoo!\\sslurp\\schina", "slurp", "weblayers",
>  >  > >            "antibot", "bruinbot", "digout4u", "echo!", "ia_archiver",
>  >  > >            "jennybot", "mercator", "netcraft", "msnbot", "petersnews",
>  >  > >            "unlost_web_crawler", "voila", "webbase", "webcollage", "cfetch",
>  >  > >            "zyborg", "wisenutbot", "robot", "crawl", "spider" }; /* and so on... */
>  >  > >
>  >  > >        public Unbuffered(final HttpServletResponse res) {
>  >  > >            super(res);
>  >  > >        }
>  >  > >
>  >  > >        @Override
>  >  > >        public CharSequence encodeURL(final CharSequence url) {
>  >  > >            return isAgent() ? url : super.encodeURL(url);
>  >  > >        }
>  >  > >
>  >  > >        private static boolean isAgent() {
>  >  > >
>  >  > >            String agent = ((WebRequest)RequestCycle.get().getRequest())
>  >  > >                    .getHttpServletRequest().getHeader("User-Agent");
>  >  > >
>  >  > >            for(String bot : botAgents) {
>  >  > >                if (agent.toLowerCase().indexOf(bot) != -1) {
>  >  > >                    return true;
>  >  > >                }
>  >  > >            }
>  >  > >
>  >  > >            return false;
>  >  > >        }
>  >  > >    }
>  >  > >
>  >  > >
>  >  > > I didn't test this code but I do similar thing in my old application
>  >  > > in Spring and it works.
>  >  > >
>  >  > > Take care,
>  >  > > Artur
>  >  > >
>  >  > >
>  >  > > --
>  >  > > View this message in context:
>  >  > >
>  >  > http://www.nabble.com/Removing-the-jsessionid-for-SEO-tp16464534p16467396.html
>  >
>  >
>  > > > Sent from the Wicket - User mailing list archive at Nabble.com.
>  >  > >
>  >  > >
>  >  > > ---------------------------------------------------------------------
>  >  > > To unsubscribe, e-mail: users-unsubscribe@wicket.apache.org
>  >  > > For additional commands, e-mail: users-help@wicket.apache.org
>  >  > >
>  >  > >
>  >  >
>  >  > ______________
>  >  >
>  >  > The information contained in this message is proprietary and/or
>  >  > confidential. If you are not the
>  >  > intended recipient, please: (i) delete the message and all copies; (ii) do
>  >  > not disclose,
>  >  > distribute or use the message in any manner; and (iii) notify the sender
>  >  > immediately. In addition,
>  >  > please be aware that any message addressed to our domain is subject to
>  >  > archiving and review by
>  >  > persons other than the intended recipient. Thank you.
>  >  > _____________
>  >  >
>  >  > ---------------------------------------------------------------------
>  >  > To unsubscribe, e-mail: users-unsubscribe@wicket.apache.org
>  >  > For additional commands, e-mail: users-help@wicket.apache.org
>  >  >
>  >  >
>  >
>
>  ---------------------------------------------------------------------
>  To unsubscribe, e-mail: users-unsubscribe@wicket.apache.org
>  For additional commands, e-mail: users-help@wicket.apache.org
>
>
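
The User-Agent check in the quoted `isAgent()` helper can be sketched in isolation, without Wicket or the servlet API. This is a hedged variant, not Artur's code: the class name `BotAgentCheck` and the shortened marker list are assumptions. It addresses two things in the quoted version: `getHeader("User-Agent")` may return null, which would make `agent.toLowerCase()` throw, and the entry "yahoo!\\sslurp\\schina" looks like a regular expression but is compared literally by `indexOf`, so it is unlikely ever to match (the plain "slurp" entry covers that crawler anyway).

```java
import java.util.Locale;

// Hypothetical, framework-free variant of the quoted isAgent() check.
final class BotAgentCheck {

    // Shortened, illustrative list of lowercase substrings; extend as needed.
    private static final String[] BOT_MARKERS = {
        "googlebot", "msnbot", "slurp", "ia_archiver", "robot", "crawl", "spider"
    };

    // Returns true if the User-Agent value contains a known crawler marker.
    // A missing header (null) is treated as "not a bot" instead of throwing.
    static boolean isBot(String userAgent) {
        if (userAgent == null) {
            return false;
        }
        String agent = userAgent.toLowerCase(Locale.ROOT);
        for (String marker : BOT_MARKERS) {
            if (agent.contains(marker)) {
                return true;
            }
        }
        return false;
    }

    public static void main(String[] args) {
        System.out.println(isBot("Mozilla/5.0 (compatible; Googlebot/2.1; +http://www.google.com/bot.html)")); // prints "true"
        System.out.println(isBot(null)); // prints "false"
    }
}
```

In the quoted class, `encodeURL` would then return the URL untouched when this check is true and fall back to `super.encodeURL(url)` otherwise, so visitors with cookies disabled keep their session.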



-- 
Resizable and reorderable grid components.
http://www.inmethod.com



Re: Removing the jsessionid for SEO

Posted by Igor Vaynberg <ig...@gmail.com>.
Right. If you strip the session id then all your non-bookmarkable URLs will
resolve to a 404. That will probably drop your rank a lot faster...

-igor


On Thu, Apr 3, 2008 at 9:16 AM, Johan Compagner <jc...@gmail.com> wrote:
> the problem is that then you have to have all stateless pages. Else google
>  can't crawl your website.
>  And if that is the case then you could be completely stateless so you dont
>  have a session (id) to worry about at all.
>
>  johan


Re: Removing the jsessionid for SEO

Posted by Johan Compagner <jc...@gmail.com>.
The problem is that you then need all of your pages to be stateless.
Otherwise Google can't crawl your website.
And if that is the case, you could be completely stateless, so you don't
have a session (id) to worry about at all.

johan




On Thu, Apr 3, 2008 at 4:54 PM, Zappaterrini, Larry <
Larry.Zappaterrini@fnis.com> wrote:

> When Google asks to not have special treatment for their bot, they are
> referring to content more than anything. Regarding the session id being
> coded in the URL, see the Technical guidelines section of Google's
> Webmaster Guidelines -
> http://www.google.com/support/webmasters/bin/answer.py?answer=35769#design
>
> It specifically recommends "allow(ing) search bots to crawl your sites
> without session IDs or arguments that track their path through the
> site."

RE: Removing the jsessionid for SEO

Posted by "Zappaterrini, Larry" <La...@fnis.com>.
When Google asks to not have special treatment for their bot, they are
referring to content more than anything. Regarding the session id being
coded in the URL, see the Technical guidelines section of Google's
Webmaster Guidelines -
http://www.google.com/support/webmasters/bin/answer.py?answer=35769#design

It specifically recommends "allow(ing) search bots to crawl your sites
without session IDs or arguments that track their path through the
site."

-----Original Message-----
From: Johan Compagner [mailto:jcompagner@gmail.com] 
Sent: Thursday, April 03, 2008 7:35 AM
To: users@wicket.apache.org
Subject: Re: Removing the jsessionid for SEO

isnt google always saying that you shouldn't alter behavior of your site
depending of it is there bot or not?



Re: Removing the jsessionid for SEO

Posted by Johan Compagner <jc...@gmail.com>.
Isn't Google always saying that you shouldn't alter the behavior of your site
depending on whether the visitor is their bot or not?

On Thu, Apr 3, 2008 at 1:00 PM, Artur W. <a_...@gazeta.pl> wrote:

>
> Hi!
>
>
> igor.vaynberg wrote:
> >
> > also by doing what you have done users with cookies disabled wont be
> > able to use your site...
> >
>
> In my opinion session id is a problem. Google index the same page again
> and
> again.
>
> About the users without cookies we can do like this:
>
>
>        static class Unbuffered extends WebResponse {
>
>                 private static final String[] botAgents = { "onetszukaj",
> "googlebot",
> "appie", "architext",
>                        "jeeves", "bjaaland", "ferret", "gulliver",
> "harvest", "htdig",
>                        "linkwalker", "lycos_", "moget", "muscatferret",
> "myweb", "nomad",
> "scooter",
>                        "yahoo!\\sslurp\\schina", "slurp", "weblayers",
> "antibot", "bruinbot",
> "digout4u",
>                        "echo!", "ia_archiver", "jennybot", "mercator",
> "netcraft", "msnbot",
> "petersnews",
>                        "unlost_web_crawler", "voila", "webbase",
> "webcollage", "cfetch",
> "zyborg",
>                        "wisenutbot", "robot", "crawl", "spider" }; /* and
> so on... */
>
>                public Unbuffered(final HttpServletResponse res) {
>            super(res);
>         }
>
>        @Override
>        public CharSequence encodeURL(final CharSequence url) {
>             return isAgent() ? url : super.encodeURL(url);
>        }
>
>                private static boolean isAgent() {
>
>                        String agent =
>
> ((WebRequest)RequestCycle.get().getRequest()).getHttpServletRequest().getHeader("User-Agent");
>
>                        for(String bot : botAgents) {
>                                if (agent.toLowerCase().indexOf(bot) != -1)
> {
>                                        return true;
>                                }
>                        }
>
>                        return false;
>                }
>    }
>
>
> I didn't test this code but I do similar thing in my old application in
> Spring and it works.
>
> Take care,
> Artur

Re: Removing the jsessionid for SEO

Posted by "Artur W." <a_...@gazeta.pl>.
Hi!


igor.vaynberg wrote:
> 
> also by doing what you have done users with cookies disabled wont be
> able to use your site...
> 

In my opinion the session id is a problem: Google indexes the same page again
and again.

For users without cookies, we can do something like this:


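    // Nested inside CleanWebResponse. Imports needed in that file:
    // javax.servlet.http.HttpServletResponse, org.apache.wicket.RequestCycle,
    // org.apache.wicket.protocol.http.WebRequest,
    // org.apache.wicket.protocol.http.WebResponse
    static class Unbuffered extends WebResponse {

        // Lower-case plain substrings (not regexes) matched against the
        // User-Agent header.
        private static final String[] botAgents = {
            "onetszukaj", "googlebot", "appie", "architext",
            "jeeves", "bjaaland", "ferret", "gulliver", "harvest", "htdig",
            "linkwalker", "lycos_", "moget", "muscatferret", "myweb", "nomad",
            "scooter", "yahoo! slurp china", "slurp", "weblayers", "antibot",
            "bruinbot", "digout4u", "echo!", "ia_archiver", "jennybot",
            "mercator", "netcraft", "msnbot", "petersnews",
            "unlost_web_crawler", "voila", "webbase", "webcollage", "cfetch",
            "zyborg", "wisenutbot", "robot", "crawl", "spider"
        }; /* and so on... */

        public Unbuffered(final HttpServletResponse res) {
            super(res);
        }

        // Bots get the plain URL; everyone else keeps the default encoding,
        // so users without cookies still get ;jsessionid in their links.
        @Override
        public CharSequence encodeURL(final CharSequence url) {
            return isAgent() ? url : super.encodeURL(url);
        }

        private static boolean isAgent() {
            String agent = ((WebRequest) RequestCycle.get().getRequest())
                .getHttpServletRequest().getHeader("User-Agent");
            if (agent == null) {
                return false; // no User-Agent header: treat as a regular user
            }
            agent = agent.toLowerCase();
            for (String bot : botAgents) {
                if (agent.indexOf(bot) != -1) {
                    return true;
                }
            }
            return false;
        }
    }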


I haven't tested this code, but I do a similar thing in my old Spring
application and it works.
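
The user-agent check can also be tried on its own, outside Wicket. What follows is only a minimal sketch under stated assumptions: BotDetector is a hypothetical helper name that does not exist in Wicket, the fragment list is a small illustrative subset of the one above, and a missing User-Agent header is assumed to mean "regular user".

```java
import java.util.Locale;

/** Hypothetical helper (not part of Wicket): guesses whether a User-Agent is a crawler. */
final class BotDetector {

    // Illustrative subset of the lower-case fragments listed in the message above.
    private static final String[] BOT_FRAGMENTS = {
        "googlebot", "msnbot", "slurp", "ia_archiver", "robot", "crawl", "spider"
    };

    private BotDetector() {
    }

    /** Returns true if the header contains a known fragment; null counts as "not a bot". */
    static boolean isBot(String userAgent) {
        if (userAgent == null) {
            return false;
        }
        // Locale.ROOT keeps the lower-casing independent of the server's default locale.
        String agent = userAgent.toLowerCase(Locale.ROOT);
        for (String fragment : BOT_FRAGMENTS) {
            if (agent.contains(fragment)) {
                return true;
            }
        }
        return false;
    }

    public static void main(String[] args) {
        System.out.println(isBot("Mozilla/5.0 (compatible; Googlebot/2.1; +http://www.google.com/bot.html)"));
        System.out.println(isBot("Mozilla/5.0 (X11; Linux x86_64) Firefox/3.0"));
        System.out.println(isBot(null));
    }
}
```

Keeping the check in a plain static method like this makes it easy to unit test without a servlet container. The User-Agent header is trivially spoofed, so this is a heuristic at best.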

Take care,
Artur


-- 
View this message in context: http://www.nabble.com/Removing-the-jsessionid-for-SEO-tp16464534p16467396.html
Sent from the Wicket - User mailing list archive at Nabble.com.




Re: Removing the jsessionid for SEO

Posted by Ryan Gravener <ry...@ryangravener.com>.
I have noticed something like this with http_check on nagios.  Is
there a proper way to get rid of these temporary sessions?

On Wed, Apr 2, 2008 at 10:45 PM, Igor Vaynberg <ig...@gmail.com> wrote:
> also by doing what you have done users with cookies disabled wont be
>  able to use your site...
>
>  -igor



-- 
Ryan Gravener
http://ryangravener.com



Re: Removing the jsessionid for SEO

Posted by Igor Vaynberg <ig...@gmail.com>.
Also, by doing what you have done, users with cookies disabled won't be
able to use your site...

-igor


On Wed, Apr 2, 2008 at 7:44 PM, Igor Vaynberg <ig...@gmail.com> wrote:
> you would think that the crawl bots are smart enough to ignore
>  jsessionid tokens...
>
>  -igor


Re: Removing the jsessionid for SEO

Posted by Igor Vaynberg <ig...@gmail.com>.
You would think that the crawl bots are smart enough to ignore
jsessionid tokens...

-igor
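
To make "ignoring jsessionid tokens" concrete, here is a rough sketch of how two URLs that differ only in their session id collapse to the same URL once the ;jsessionid path parameter is dropped. The class name SessionIdStripper and the regular expression are assumptions made for illustration. Nothing in this thread says how any real crawler normalizes URLs.

```java
import java.util.regex.Pattern;

/** Hypothetical illustration: removes the ;jsessionid path parameter from a URL. */
final class SessionIdStripper {

    // Matches ";jsessionid=<value>", where the value runs up to the next
    // ';', '?' or '#', or to the end of the URL.
    private static final Pattern JSESSIONID =
            Pattern.compile(";jsessionid=[^;?#]*", Pattern.CASE_INSENSITIVE);

    private SessionIdStripper() {
    }

    static String strip(String url) {
        return JSESSIONID.matcher(url).replaceAll("");
    }

    public static void main(String[] args) {
        // Two visits to the same page, each with a different session id.
        String first = strip("/products;jsessionid=ABC123?page=2");
        String second = strip("/products;jsessionid=XYZ789?page=2");
        System.out.println(first);
        System.out.println(first.equals(second));
    }
}
```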


On Wed, Apr 2, 2008 at 5:20 PM, Dan Kaplan <dk...@citizenhawk.com> wrote:
> victori_ provided this information on IRC and I just wanted to share it with
>  everyone else.  Googlebot and others don't use cookies.  This means when
>  they visit your site it adds ;jsessionid=code to the end of all your urls
>  they visit.  When they re-visit it, they get a different code, consider that
>  a different url with the same content and punish you.  So, for the web
>  crawling bots, it's very important to get rid of this (Perhaps it's
>  worthwhile to check this code in to the code base).
>
>  Here's what you do in your Application:
>
>  @Override
>  protected WebResponse newWebResponse(final HttpServletResponse servletResponse) {
>      return CleanWebResponse.getNew(this, servletResponse);
>  }
>
>  Here's the CleanWebResponse class:
>  public class CleanWebResponse {
>     public static WebResponse getNew(final Application app, final
>  HttpServletResponse servletResponse) {
>         return app.getRequestCycleSettings().getBufferResponse() ? new
>  Buffered(servletResponse) : new Unbuffered(
>                 servletResponse);
>     }
>
>     static class Buffered extends BufferedWebResponse {
>         public Buffered(final HttpServletResponse httpServletResponse) {
>             super(httpServletResponse);
>         }
>
>         @Override
>         public CharSequence encodeURL(final CharSequence url) {
>             return url;
>         }
>     }
>
>     static class Unbuffered extends WebResponse {
>         public Unbuffered(final HttpServletResponse httpServletResponse) {
>             super(httpServletResponse);
>         }
>
>         @Override
>         public CharSequence encodeURL(final CharSequence url) {
>             return url;
>         }
>     }
>  }
>
>  Note, I haven't tested this myself yet but I plan to tonight.  Hope this was
>  helpful.