Posted to dev@cocoon.apache.org by Bert Van Kets <be...@vankets.com> on 2002/04/12 12:40:10 UTC

Search Engine Optimization and Cocoon

One of my *things* is Search Engine Optimization or SEO.  SEO tries to
define the rules a site must comply with to be "found" by search engines and
thus get a lot of relevant visitors.  I found Cocoon to be a perfect
platform to do server side programming that can be hidden entirely from the 
client.  Even page optimization is possible.
There is a rather important rule that is broken by some samples: "Some
major robots (programs that request and index the content of your site)
will not index your page, and thus will not follow the links in it, if you
use a querystring in the URL."  This rule does not matter for forms,
since robots never enter data and submit a form.  I hope that passing
parameters in the URL will not become a "standard" thing in Cocoon.
I am willing to spend some time to compile a text that explains how the 
benefits of Cocoon can be used in SEO.  Anybody interested?
Bert


---------------------------------------------------------------------
To unsubscribe, e-mail: cocoon-dev-unsubscribe@xml.apache.org
For additional commands, email: cocoon-dev-help@xml.apache.org


Re: Search Engine Optimization and Cocoon

Posted by Stefano Mazzocchi <st...@apache.org>.
Bert Van Kets wrote:
> 
> One of my *things* is Search Engine Optimization or SEO.  SEO tries to
> define the rules a site must comply with to be "found" by search engines and
> thus get a lot of relevant visitors.  I found Cocoon to be a perfect
> platform to do server side programming that can be hidden entirely from the
> client.  Even page optimization is possible.
> There is a rather important rule that is broken by some samples: "Some
> major robots (programs that request and index the content of your site)
> will not index your page and thus not follow the links in it if you use a
> querystring in the URL."  This rule is not important if you use forms,
> since robots will never enter data and submit a form.  I hope that passing
> parameters in the URL will not be a "standard" thing in Cocoon.

Of course not! 

At least for one thing: I *totally* hate it!

The Cocoon sitemap was almost designed to allow you to 'encode' stuff
right into the URI and in my projects I do

 /search(2)

to get the second page of the search, instead of the usual (but ugly!)

 /search?page=2

Not only for search engines, but also for site usability.
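A minimal Python sketch of the idea, assuming a hypothetical `/search(N)` scheme: the page number lives in the path instead of a query string, so a robot sees an ordinary-looking URI. The function names and the regular expression are illustrative assumptions; in Cocoon the matching itself would be done by the sitemap, not by code like this.

```python
import re

# Hypothetical URI scheme: "/search" is page 1, "/search(N)" is page N.
# Illustrative only; this is not Cocoon's sitemap matcher.
SEARCH_URI = re.compile(r"^/search(?:\((\d+)\))?$")


def parse_search_uri(path):
    """Return the page number encoded in a /search(N) path, or None if no match."""
    m = SEARCH_URI.match(path)
    if m is None:
        return None
    # A bare "/search" is treated as the first page.
    return int(m.group(1)) if m.group(1) else 1


def search_uri(page):
    """Build the query-string-free URI for a given results page."""
    return "/search" if page == 1 else f"/search({page})"
```

For example, `parse_search_uri("/search(2)")` yields 2, while the query-string form `/search?page=2` does not match at all, so both links and parsing stay in the path-only style.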

> I am willing to spend some time to compile a text that explains how the
> benefits of Cocoon can be used in SEO.  Anybody interested?

I am.

-- 
Stefano Mazzocchi      One must still have chaos in oneself to be
                          able to give birth to a dancing star.
<st...@apache.org>                             Friedrich Nietzsche
--------------------------------------------------------------------





RE: Search Engine Optimization and Cocoon (long!)

Posted by Bert Van Kets <be...@visitronics.be>.
I see your point.  It's just that, IIRC, a 301 is not always interpreted
correctly.  This can be a problem.
I was just pointing out that removing a page from the server also removes
it from the index once the robot has passed, since the removal from the
index is triggered by the 404.  That's all.
Bert

At 08:43 16/04/2002 +1200, you wrote:
>Are you sure, Bert? I can see how a search engine would be suspicious of a
>META REFRESH tag because the search engine would be ranking the page which
>contained the META tag (and a whole lot of key words), but the user will be
>sent to some different page. So I certainly wouldn't recommend using an HTML
>META tag! But if a search bot receives a 301 code as part of the HTTP
>transaction, then the old (e.g. Britney Spears) page has disappeared, and
>the search engine should follow the link to find the new (spaghetti) page.
>When it finds a 301, the search engine should trash the old page in its
>index, just as with a 404, so there's no chance of "britney spears" turning
>into "spaghetti" again. To use this technique to spam search engines you'd
>have to return the "britney spears" page to the search engines and the 301
>code to browsers ("cloaking"). If you're going to do this, then you may as
>well return the spaghetti page directly, rather than a 301 link to it. So
>why should they care about the 301?
>
>http://www.google.com/remove.html#change_url
>
>I'm no expert on search engines really ... and you could well be right
>(which search engines are averse to 301 codes?) but I stuck my oar in
>because it seemed to me that it's a waste of a perfectly good hit to return
>a 404 unnecessarily. With Cocoon this kind of thing can be handled so easily
>that I don't think "search engine optimization" should be an excuse for
>turning away your EXISTING users who ALREADY HAVE an (old) link to your
>site.
>
>Con


RE: Search Engine Optimization and Cocoon (long!)

Posted by Conal Tuohy <co...@paradise.net.nz>.
Are you sure, Bert? I can see how a search engine would be suspicious of a
META REFRESH tag because the search engine would be ranking the page which
contained the META tag (and a whole lot of key words), but the user will be
sent to some different page. So I certainly wouldn't recommend using an HTML
META tag! But if a search bot receives a 301 code as part of the HTTP
transaction, then the old (e.g. Britney Spears) page has disappeared, and
the search engine should follow the link to find the new (spaghetti) page.
When it finds a 301, the search engine should trash the old page in its
index, just as with a 404, so there's no chance of "britney spears" turning
into "spaghetti" again. To use this technique to spam search engines you'd
have to return the "britney spears" page to the search engines and the 301
code to browsers ("cloaking"). If you're going to do this, then you may as
well return the spaghetti page directly, rather than a 301 link to it. So
why should they care about the 301?

http://www.google.com/remove.html#change_url

I'm no expert on search engines really ... and you could well be right
(which search engines are averse to 301 codes?) but I stuck my oar in
because it seemed to me that it's a waste of a perfectly good hit to return
a 404 unnecessarily. With Cocoon this kind of thing can be handled so easily
that I don't think "search engine optimization" should be an excuse for
turning away your EXISTING users who ALREADY HAVE an (old) link to your
site.

Con

> -----Original Message-----
> From: Bert Van Kets [mailto:bert@visitronics.be]
> Sent: Monday, 15 April 2002 06:19
> To: cocoon-dev@xml.apache.org
> Subject: RE: Search Engine Optimization and Cocoon (long!)
>
>
> I meant that a 404 is the signal for the robot to remove the file from the
> index.  A 301 is, wrongly, interpreted like a "meta refresh" redirect.
> The meta refresh is used in a technique where a page containing a meta
> refresh is optimized and promoted for a specific, very popular keyword,
> but the visitor is redirected to completely different content.  E.g. you
> search for "Britney Spears", but get redirected to a page about spaghetti.
> Search engines want to give good, relevant results, so they hate this
> technique.  You can get listed as a spammer for this.  Although technically
> a 301 is more correct, it's not good for SEO!
> Don't use it unless SEO is not important for the site.
> Bert


RE: Search Engine Optimization and Cocoon (long!)

Posted by Bert Van Kets <be...@visitronics.be>.
I meant that a 404 is the signal for the robot to remove the file from the
index.  A 301 is, wrongly, interpreted like a "meta refresh" redirect.
The meta refresh is used in a technique where a page containing a meta
refresh is optimized and promoted for a specific, very popular keyword,
but the visitor is redirected to completely different content.  E.g. you
search for "Britney Spears", but get redirected to a page about spaghetti.
Search engines want to give good, relevant results, so they hate this
technique.  You can get listed as a spammer for this.  Although technically
a 301 is more correct, it's not good for SEO!
Don't use it unless SEO is not important for the site.
Bert

At 10:46 14/04/2002 +1200, you wrote:
>Don't use a 404 to signal that a URL has changed: use a 301 "Moved
>Permanently".
>http://www.w3.org/Protocols/rfc2616/rfc2616-sec10.html#sec10.3.2


RE: Search Engine Optimization and Cocoon (long!)

Posted by Conal Tuohy <co...@paradise.net.nz>.
Don't use a 404 to signal that a URL has changed: use a 301 "Moved
Permanently".
http://www.w3.org/Protocols/rfc2616/rfc2616-sec10.html#sec10.3.2
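To make the 301-versus-404 distinction concrete, here is a minimal WSGI sketch in Python. It is not Cocoon code: the `MOVED` and `GONE` tables and all paths are hypothetical. A moved page answers with "301 Moved Permanently" plus a `Location` header; a removed page answers with "404 Not Found".

```python
# Hypothetical table of permanently moved pages: old path -> new path.
MOVED = {"/catalog/old-page.html": "/catalog/new-page"}
# Hypothetical set of pages that were removed without a replacement.
GONE = {"/catalog/discontinued.html"}


def app(environ, start_response):
    """Minimal WSGI app: 301 for moved pages, 404 for removed ones, 200 otherwise."""
    path = environ.get("PATH_INFO", "/")
    if path in MOVED:
        # RFC 2616 (section 14.30) requires an absolute URI in Location.
        host = environ.get("HTTP_HOST", "localhost")
        location = "http://" + host + MOVED[path]
        start_response("301 Moved Permanently",
                       [("Location", location), ("Content-Type", "text/plain")])
        return [b"Moved permanently to " + location.encode("latin-1")]
    if path in GONE:
        start_response("404 Not Found", [("Content-Type", "text/plain")])
        return [b"Not found"]
    start_response("200 OK", [("Content-Type", "text/plain")])
    # Under PEP 3333, PATH_INFO is a latin-1 "native string".
    return [b"Page: " + path.encode("latin-1")]
```

To try it locally, `wsgiref.simple_server.make_server("", 8000, app).serve_forever()` serves it on port 8000. RFC 2616 also defines "410 Gone" for deliberately removed pages; the sketch sticks to the 404 discussed in this thread.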

> -----Original Message-----
> From: Bert Van Kets [mailto:bert@visitronics.be]
> Sent: Sunday, 14 April 2002 08:57
> To: cocoon-dev@xml.apache.org
> Subject: Re: Search Engine Optimization and Cocoon (long!)
>
>
> If you get visits from search engines to those pages, it would be crazy to
> get rid of those links.  I would, however, install the new pages without the
> query string and try to rank as high as possible with as many SEs as
> possible.  Once those new links do their work, you have an alternative and
> you can get rid of the old links if you want.
> Most major SEs don't like different links to the same page, so it is
> actually an advantage to have only one good link to a page.  Then again,
> it's only a disadvantage if you get caught, so you can keep the old link
> with some spam risk involved.
>
> Robots will revisit your site after a certain time.  Most have an interval
> of about 1 month.  If a robot gets a 404 (page not found), it will erase
> that page from the index.  That will get rid of the old links automatically.
> You can get rid of them manually by resubmitting them, but that's a lot of
> work.  If the robots find the new links, they will index those pages.  If
> some of your pages are already in a search engine, its robot will go out
> and fetch the new pages automatically.
>
> Keeping track of the logs is *always* a good idea!
>
> BTW:  Don't confuse search engines with directories.  Search engines use a
> robot to index the content of your site.  Directories like Yahoo and Open
> Directory (dmoz.com) have humans look at the site and rate it.  Clear,
> good content is all you can do for these guys.
>
> Bert


Re: Search Engine Optimization and Cocoon (long!)

Posted by Bert Van Kets <be...@visitronics.be>.
If you get visits from search engines to those pages, it would be crazy to
get rid of those links.  I would, however, install the new pages without the
query string and try to rank as high as possible with as many SEs as
possible.  Once those new links do their work, you have an alternative and
you can get rid of the old links if you want.
Most major SEs don't like different links to the same page, so it is
actually an advantage to have only one good link to a page.  Then again,
it's only a disadvantage if you get caught, so you can keep the old link
with some spam risk involved.
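One way to keep to "one good link per page" is to generate every internal link from a single canonical spelling. A minimal Python sketch using only `urllib.parse`; the rule that `/dir/index.html` and `/dir/` are the same page is a hypothetical site convention, not something search engines mandate.

```python
from urllib.parse import urlsplit, urlunsplit, parse_qsl, urlencode

DEFAULT_PORTS = {"http": 80, "https": 443}


def canonical_url(url):
    """Normalize a URL so equivalent spellings collapse to one form."""
    parts = urlsplit(url)
    scheme = parts.scheme.lower()
    host = (parts.hostname or "").lower()
    # Keep the port only if it is not the scheme's default.
    if parts.port and parts.port != DEFAULT_PORTS.get(scheme):
        host = f"{host}:{parts.port}"
    path = parts.path or "/"
    # Hypothetical site convention: "/dir/index.html" and "/dir/" are one page.
    if path.endswith("/index.html"):
        path = path[: -len("index.html")]
    # Sort query parameters so their order does not create "new" URLs.
    query = urlencode(sorted(parse_qsl(parts.query, keep_blank_values=True)))
    # Fragments never reach the server, so drop them.
    return urlunsplit((scheme, host, path, query, ""))
```

For instance, `HTTP://Example.ORG:80/docs/index.html#top` and `http://example.org/docs/` both normalize to `http://example.org/docs/`.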

Robots will revisit your site after a certain time.  Most have an interval
of about 1 month.  If a robot gets a 404 (page not found), it will erase
that page from the index.  That will get rid of the old links automatically.
You can get rid of them manually by resubmitting them, but that's a lot of
work.  If the robots find the new links, they will index those pages.  If
some of your pages are already in a search engine, its robot will go out
and fetch the new pages automatically.

Keeping track of the logs is *always* a good idea!
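As an example of what to look for in the logs, the Python sketch below counts hits on query-string URLs and splits them into robot and human traffic, which can help decide when old URLs may be retired. It assumes Apache's "combined" log format; the regular expression and the short list of robot user-agent substrings are assumptions and will need adjusting for a real server.

```python
import re
from collections import Counter

# Apache "combined" format: host ident user [time] "request" status size "referer" "agent"
COMBINED = re.compile(
    r'^(?P<host>\S+) \S+ \S+ \[(?P<time>[^\]]+)\] "(?P<method>\S+) (?P<path>\S+)[^"]*" '
    r'(?P<status>\d{3}) \S+(?: "(?P<referer>[^"]*)" "(?P<agent>[^"]*)")?'
)

# Hypothetical, deliberately incomplete list of robot user-agent substrings.
ROBOT_HINTS = ("googlebot", "scooter", "slurp", "crawler", "spider")


def old_url_hits(lines):
    """Count hits on query-string URLs, split into robot and human traffic."""
    robots, humans = Counter(), Counter()
    for line in lines:
        m = COMBINED.match(line)
        # Skip unparseable lines and requests without a query string.
        if not m or "?" not in m.group("path"):
            continue
        agent = (m.group("agent") or "").lower()
        bucket = robots if any(h in agent for h in ROBOT_HINTS) else humans
        bucket[m.group("path")] += 1
    return robots, humans
```

Typical use is `robots, humans = old_url_hits(open("access_log"))`, then `humans.most_common(10)` to see which old links people still follow.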

BTW:  Don't confuse search engines with directories.  Search engines use a
robot to index the content of your site.  Directories like Yahoo and Open
Directory (dmoz.com) have humans look at the site and rate it.  Clear,
good content is all you can do for these guys.

Bert

At 06:13 13/04/2002 -0400, you wrote:
>Bert,
>
>Thanks for the great read!
>
>>Part 7: other things you should know
>>-----------------------------------------------------
>>A. Querystrings (everything behind a ? in the URL)
>>Most major search engines hate querystrings.  They assume that the query 
>>strings are used for database access and dynamic page generation.  This 
>>can give them a "black hole" where they eventually index a complete 
>>database.  Altavista clearly states that they will index a page with 
>>querystrings, but won't follow any links.  Google is one of the first to
>>start indexing pages with querystrings.  They are very cautious and will
>>go only a certain number of levels deep.
>
>Now that we can effectively rewrite page URLs without query strings using 
>C2, do you think it's simply a matter of resubmitting to search engines to 
>remove any existing search engine links to the "old" pages? In the 
>meantime, I suppose we could leave a pipeline up that maps the old
>URLs with query strings to the new URL without query strings (and monitor 
>logs to determine when/if to delete them down the road.)
>
>Would that be your approach for updating old sites?
>
>Diana


Re: Search Engine Optimization and Cocoon (long!)

Posted by Diana Shannon <te...@mac.com>.
Bert,

Thanks for the great read!

> Part 7: other things you should know
> -----------------------------------------------------
> A. Querystrings (everything behind a ? in the URL)
> Most major search engines hate querystrings.  They assume that the 
> query strings are used for database access and dynamic page 
> generation.  This can give them a "black hole" where they eventually 
> index a complete database.  Altavista clearly states that they will 
> index a page with querystrings, but won't follow any links.  Google is 
> one of the first to start indexing pages with querystrings.  They are
> very cautious and will go only a certain number of levels deep.

Now that we can effectively rewrite page URLs without query strings
using C2, do you think it's simply a matter of resubmitting to search
engines to remove any existing search engine links to the "old" pages?
In the meantime, I suppose we could leave a pipeline up that maps
the old URLs with query strings to the new URLs without query strings
(and monitor logs to determine when/if to delete them down the road).
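A minimal Python sketch of the translation such an interim mapping needs: given an old query-string URL, compute the new clean path, or report that there is none. The `.xsp` page names, parameter names and target templates in `OLD_TO_NEW` are hypothetical; the `/search(N)` target mirrors the path style suggested earlier in the thread. The result would typically be sent back as the `Location` of a 301 response.

```python
from urllib.parse import urlsplit, parse_qs

# Hypothetical mapping: old dynamic page -> (identifying parameter, new path template).
OLD_TO_NEW = {
    "/article.xsp": ("id", "/articles/{}"),
    "/search.xsp": ("page", "/search({})"),
}


def new_location(old_url):
    """Translate an old query-string URL into its clean equivalent, or None."""
    parts = urlsplit(old_url)
    rule = OLD_TO_NEW.get(parts.path)
    if rule is None:
        return None
    param, template = rule
    values = parse_qs(parts.query).get(param)
    # Only a single, purely alphanumeric value is translated; anything else
    # is left to the caller (for example, to answer with a 404).
    if not values or len(values) != 1 or not values[0].isalnum():
        return None
    return template.format(values[0])
```

With these assumed rules, `/article.xsp?id=42` becomes `/articles/42` and `/search.xsp?page=2` becomes `/search(2)`.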

Would that be your approach for updating old sites?

Diana




Re: Search Engine Optimization and Cocoon (long!)

Posted by Bert Van Kets <be...@vankets.com>.
SEO is a very vague and ever-changing science.  There's a lot to know about
each and every search engine.  The rules are different for every one of
them.  I will only give the major, more general rules to get you going.

----------------------------------------------------------------------------------------------------------------------------------------------------------

To know how your site can be found you have to know how people look for 
your site.  What do you do when you surf the web?  You look for textual 
information!  You enter a keyword or key phrase into a textbox and ask the 
search engine to query its database and come up with web pages that are as
relevant as possible.
- Remember the importance of the keyword or keyphrase here. -
You can probably see now that the trick is to know two things:
1) what keywords or keyphrases do people actually use
2) what are the rules that make a page relevant to a specific keyword or 
keyphrase (where do you apply the keywords)

Part 1: The keywords used by the visitor
----------------------------------------------------------
A. Try to find them yourself
Brainstorm about ALL possible keywords that can be applied to your 
site.  Write them all down.  You might have about 200 words or 
phrases.  Don't just list product names or descriptions of products, but
also think about what people would search for when they don't know your 
product.  People who don't know what an electric drill is don't search for 
a drill but for a "hole in a wall".
Also avoid keywords that are too general.  If your site is about second
hand cars, you don't want people searching for new cars or car 
parts.  Getting relevant visitors is what we are after.
B. See what your competitors are doing
Check out your competitors' sites.  See what keywords they are using and,
more importantly, which words are most prominent in the body of their main
pages.
C. Filter out the most relevant keywords.
Go over the list and select the keywords you think are best.  Keep a list 
of about 50 keyphrases, knowing that people use keyphrases more than keywords.
D. Test your keywords and keyphrases
Now go out to some of the major search engines like Google, AltaVista, 
Excite, MSN, AOL, Netscape, Northern Light, Lycos, etc. and see how many 
times your keywords are found.  This will give you an idea of how relevant
these words are across sites.  Treat it as a rough indication, not as a
definitive measure of relevance.
If you want to be absolutely sure about your keywords you have to use a
database of keyphrases entered into a series of search engines.  WordTracker
(http://www.wordtracker.com/) is the only service I know of that can do this.

After you have done all this hard work you should have a list of five to 
ten major keywords and ten to twenty secondary.  It is very important that 
you stick with these for a long time, so you'd better be sure you have the 
right ones.  SEO is a slooooow process and it can easily take 6 months to
see the results of a decision you made.

Part 2: Where do you use the keywords
---------------------------------------------------------
A. What can be indexed?  Text, text and nothing but text.  Text in graphics
will not be indexed!  Why do you think images have alt tags?  Alt tags are
compulsory, BTW!
So if you have a site with only a teeny weeny bit of text, how do you 
suppose your site can be indexed and found?  The first and most important 
rule of SEO is content, content and some more content.
B. Pick your page
Pick a page you want to use to promote a keyword.  When this page is 
indexed the robot is supposed to think the keyword is very important.  Use 
one page for each keyword you want to promote.   It is nearly impossible to 
promote multiple keywords on one page.
C. Page title
The page title is the most important part of a page.  Just as you judge a
book by its title, a robot will judge the page by its title.  Most search
engines like your keyword to appear only once in the title.  Use it more often
and you risk being regarded as spam, and that is the last thing you want.
Make sure the title tag is the first tag below the head tag.
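Both rules can be checked automatically on the generated pages. A minimal sketch using Python's standard `html.parser`; the class and function names are illustrative, and "exactly one occurrence" is simply the rule of thumb stated above, not something the search engines document.

```python
from html.parser import HTMLParser
from typing import List, Tuple


class HeadChecker(HTMLParser):
    """Records the start tags found inside <head> and the text of <title>."""

    def __init__(self) -> None:
        super().__init__()
        self.in_head = False
        self.in_title = False
        self.head_tags: List[str] = []
        self.title = ""

    def handle_starttag(self, tag, attrs):
        if tag == "head":
            self.in_head = True
        elif self.in_head:
            self.head_tags.append(tag)
            self.in_title = tag == "title"

    def handle_endtag(self, tag):
        if tag == "head":
            self.in_head = False
        if tag == "title":
            self.in_title = False

    def handle_data(self, data):
        if self.in_title:
            self.title += data


def check_title(html: str, keyword: str) -> Tuple[bool, int]:
    """Return (title is the first tag in <head>, keyword occurrences in title)."""
    checker = HeadChecker()
    checker.feed(html)
    first_is_title = checker.head_tags[:1] == ["title"]
    return first_is_title, checker.title.lower().count(keyword.lower())
```

Any result other than `(True, 1)` is worth a second look.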
D. The meta keywords and meta description tag.
More and more search engines skip these and try to filter out the keywords 
themselves.  Add your keyword and some variations like plurals.
The description tag is important since it will be used by the search engine 
to describe the page.  In the list returned by the search engine most of 
the time you will see the page title with the link on it and below it the 
description.  A good title and a good description can lure visitors by 
themselves (provided you can get your page in the top 30 in some search 
engines).
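In a Cocoon site the head section would normally be produced by the XSLT stylesheet from fields in the XML source. The Python sketch below only shows the intended output: the title first, then the description and a keywords list that includes simple variations such as plurals. The function name and example values are hypothetical.

```python
from html import escape
from typing import Sequence


def build_head(title: str, description: str, keywords: Sequence[str]) -> str:
    """Return a <head> fragment: <title> first, then description and keywords."""
    return "\n".join([
        "<head>",
        f"  <title>{escape(title)}</title>",
        f'  <meta name="description" content="{escape(description)}">',
        f'  <meta name="keywords" content="{escape(", ".join(keywords))}">',
        "</head>",
    ])


print(build_head("Spaghetti recipes",
                 "Fresh spaghetti recipes & sauces from Italy.",
                 ["spaghetti", "spaghettis", "spaghetti recipes"]))
```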
E. Page body
Keywords are only found relevant when they appear in the body of the 
page.  Use your keyword in the first and the last paragraph. That's where 
they will have the most relevance.  Use about 400 words in the body.
F. Links
A link on the word spaghetti that points to a page called spaghetti.html in 
the directory spaghetti and having a high keyword relevance on spaghetti 
MUST be about spaghetti, no?  Try to implement two to three links in the 
page containing your keyword.
This is VERY important: use ONLY <a href> links for your 
navigation.  Hardly any robot will interpret JavaScript, so window.location 
links are not followed.  Robots don't submit forms either, so don't rely on
navigation using <input> tags.
Links inside Flash or applets are not followed either!  So forget about the fancy,
flashy sites.  They don't get anywhere.  Cocoon makes navigation 
maintenance easy, why use client side scripts?
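A quick way to audit generated pages for navigation that robots cannot follow. This Python sketch treats `javascript:` hrefs and `onclick` handlers that mention `location` as script-driven navigation; that heuristic is an assumption and will miss other tricks.

```python
from html.parser import HTMLParser
from typing import List, Tuple


class LinkAudit(HTMLParser):
    """Splits navigation into plain <a href> links and script-driven links."""

    def __init__(self) -> None:
        super().__init__()
        self.crawlable: List[str] = []   # hrefs a robot can follow
        self.script_nav: List[str] = []  # tags that navigate only via JavaScript

    def handle_starttag(self, tag, attrs):
        attrs = dict(attrs)
        href = (attrs.get("href") or "").strip()
        onclick = attrs.get("onclick") or ""
        if href.lower().startswith("javascript:") or "location" in onclick:
            self.script_nav.append(tag)
        elif tag == "a" and href:
            self.crawlable.append(href)


def audit(html: str) -> Tuple[List[str], List[str]]:
    """Return (crawlable hrefs, tags using script navigation) for a page."""
    parser = LinkAudit()
    parser.feed(html)
    return parser.crawlable, parser.script_nav
```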
G. Alt tags
Alt tags are the only way to get the content of your images indexed.  Use them!
If you want to address as many visitors as possible, don't forget the 
visually impaired and give them something they can see in their own 
way.  Braille readers can only show text, so if there are no alt tags on
your navigation buttons, these people won't know where they are, or where
they can go.  Use XSLT to make sure every image gets its alt tag.
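The author suggests an XSLT template for this; the same idea in Python looks like the sketch below. It assumes namespace-free XHTML and a hypothetical table mapping image `src` values to descriptions, kept next to the content. Images that still lack a description are reported rather than given a meaningless default, and an existing empty `alt=""` is left alone because it marks a purely decorative image.

```python
import xml.etree.ElementTree as ET
from typing import Dict, List, Tuple


def add_alt_text(xhtml: str, descriptions: Dict[str, str]) -> Tuple[str, List[str]]:
    """Fill in missing alt attributes from a src -> text table.

    Returns the updated markup and the src of every image still lacking alt text.
    Assumes namespace-free XHTML (plain <img> tags).
    """
    root = ET.fromstring(xhtml)
    missing: List[str] = []
    for img in root.iter("img"):
        if img.get("alt") is not None:
            continue  # already described (alt="" marks a decorative image)
        src = img.get("src", "")
        if src in descriptions:
            img.set("alt", descriptions[src])
        else:
            missing.append(src)
    return ET.tostring(root, encoding="unicode"), missing
```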

Part 3: Site structure
------------------------------
The best way you can structure your site is using one keyword for every 
part of the site.  Use that keyword as the name of the directory where the
files will be located.  Of course in Cocoon you don't really need to place
the XML files in that directory.  A pipeline that mimics the directory is
good too.  The robot only needs to think the HTML files are in a certain
directory.
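In a Cocoon sitemap this is what a wildcard match such as `*/*.html` is for, with `{1}` and `{2}` referring to the matched parts. The Python sketch below imitates that idea with a regular expression; the `content/<section>-<page>.xml` layout is an invented example, showing that the XML files do not have to live in the directories the robot sees.

```python
import re
from typing import Optional

# Public URLs look like /<keyword>/<page>.html, e.g. /spaghetti/recipes.html.
URL_PATTERN = re.compile(r"/([a-z0-9-]+)/([a-z0-9-]+)\.html")


def source_for(url_path: str) -> Optional[str]:
    """Map a public URL onto a (hypothetical) flat content directory."""
    match = URL_PATTERN.fullmatch(url_path)
    if match is None:
        return None
    section, page = match.groups()
    return f"content/{section}-{page}.xml"
```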
Links between your pages are VERY important to Google, the most important 
search engine.  A good internal link structure is very important.  Make
sure you link your pages to each other where possible.  The best thing you 
can do is use a regular html menu structure (like the cocoon site).
The home page is regarded as the most important page in the site (like a 
book cover).  The links to and from that page are therefore important
too.  So DON'T use splash screens!  They will mess up the whole site structure.
Another reason why you shouldn't use splash screens is that robots don't 
follow more than two or three links deep.  So if you use a splash screen 
you lose one level!
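Whether a page is within reach of a robot that stops after a few clicks can be computed with a breadth-first search over the site's internal links. A sketch with an invented site structure and a `max_depth` of 3, taken from the "two or three links deep" above:

```python
from collections import deque
from typing import Dict, List


def click_depths(links: Dict[str, List[str]], home: str) -> Dict[str, int]:
    """Breadth-first search: fewest clicks needed to reach each page from home."""
    depth = {home: 0}
    queue = deque([home])
    while queue:
        page = queue.popleft()
        for target in links.get(page, []):
            if target not in depth:
                depth[target] = depth[page] + 1
                queue.append(target)
    return depth


def too_deep(links: Dict[str, List[str]], home: str, max_depth: int = 3) -> List[str]:
    """Pages a robot that stops after max_depth clicks would never reach."""
    return sorted(p for p, d in click_depths(links, home).items() if d > max_depth)


site = {"/": ["/home.html"], "/home.html": ["/pasta/"],
        "/pasta/": ["/pasta/spaghetti.html"],
        "/pasta/spaghetti.html": ["/pasta/sauces.html"]}
print(too_deep(site, "/"))  # ['/pasta/sauces.html']
```

Starting from `/home.html` instead of the splash page `/`, every page is within three clicks, which is the lost level described above.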
Use a sitemap and link it to the homepage.  It's even better if you don't 
call it "sitemap" and add some regular text on the page.  AltaVista won't 
follow or index pages containing only links.  I guess you get the 
picture.  Cocoon can generate a sitemap automatically.
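Cocoon can build such a page from the navigation data it already has; the Python sketch below only shows the shape of the result: a heading that does not say "sitemap", a paragraph of real text, then plain `<a href>` links. The page list, wording and function name are hypothetical.

```python
from html import escape
from typing import Iterable, Tuple


def overview_page(pages: Iterable[Tuple[str, str]], intro: str,
                  heading: str = "Site overview") -> str:
    """Render a sitemap-style page: a heading, a paragraph of text, then links."""
    items = "\n".join(
        f'  <li><a href="{escape(url)}">{escape(title)}</a></li>'
        for url, title in pages
    )
    return (f"<h1>{escape(heading)}</h1>\n"
            f"<p>{escape(intro)}</p>\n"
            f"<ul>\n{items}\n</ul>")
```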

Part 4: external links
------------------------------
Google was the first one to use links between sites as a measure of the 
relevance.  Sites having a lot of links to them are called authorities. 
Sites having a lot of links to other sites are called hubs.  Both of these 
types have a higher relevance.  In the last few years Google has added
relevance to these links through "themes".  Only links between your site
and sites that have the same "theme" are regarded as relevant.  So be
careful who links to you and who you link to.
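Real engines do not publish how they weigh links (Google's PageRank, for one, also takes the importance of the linking page into account), so the sketch below is only a simplified illustration of the hub and authority idea: plain inbound and outbound link counts per site, optionally restricted to links between sites that share a "theme". Site names and themes are made up.

```python
from collections import Counter
from typing import Dict, Iterable, Optional, Tuple


def link_counts(edges: Iterable[Tuple[str, str]],
                themes: Optional[Dict[str, str]] = None) -> Tuple[Counter, Counter]:
    """Count inbound (authority-like) and outbound (hub-like) links per site.

    If a site -> theme table is given, only links between sites that share a
    theme are counted.
    """
    inbound: Counter = Counter()
    outbound: Counter = Counter()
    for src, dst in edges:
        if src == dst:
            continue  # links inside one site are not external links
        if themes is not None:
            theme = themes.get(src)
            if theme is None or theme != themes.get(dst):
                continue
        outbound[src] += 1
        inbound[dst] += 1
    return inbound, outbound
```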
Check the number of links to your site in search engines.  Most of the
time you can do this by searching for link:www.yoursite.com
IBM has done a study where they found that only half the web is linked 
together.  The other half has links to or from the central part.  There are 
also some islands that don't even link to that main chunk.  IBM called this 
the bow tie theory.  To the left are the sites that link to the knot.  The 
knot contains sites that are linked both ways and to the right are sites 
that have links from the knot to them.  If you think about the way people 
surf the web, clicking from one page to another, from one site to the next, 
you can see that it is very important to be in the center part.  Get as 
many *relevant* links to and from your site as possible.
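The bow tie can be made concrete on a small link graph. The knot is the set of sites that can both reach and be reached from a chosen seed site, i.e. the seed's strongly connected component; the left side links into the knot and the right side is linked from it. A sketch with a hypothetical graph, assuming the seed lies in the central core:

```python
from typing import Dict, List, Set, Tuple

Graph = Dict[str, List[str]]


def reachable(graph: Graph, start: str) -> Set[str]:
    """All nodes reachable from start by following links (iterative DFS)."""
    seen, stack = {start}, [start]
    while stack:
        for nxt in graph.get(stack.pop(), []):
            if nxt not in seen:
                seen.add(nxt)
                stack.append(nxt)
    return seen


def bow_tie(graph: Graph, seed: str) -> Tuple[Set[str], Set[str], Set[str], Set[str]]:
    """Split sites into (knot, left, right, rest) around the knot containing seed."""
    reverse: Graph = {}
    for src, targets in graph.items():
        for dst in targets:
            reverse.setdefault(dst, []).append(src)
    forward = reachable(graph, seed)     # seed can reach these
    backward = reachable(reverse, seed)  # these can reach seed
    knot = forward & backward            # linked both ways
    left = backward - knot               # link into the knot
    right = forward - knot               # linked to from the knot
    nodes = set(graph) | {d for targets in graph.values() for d in targets}
    return knot, left, right, nodes - knot - left - right
```

Everything in the last set is either an island or a tendril hanging off one side, out of reach of visitors surfing through the knot.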

Part 5: submit
---------------------
A. Submit to major SE's
It goes without saying that you need to submit your pages, but don't overdo
it.  If you submit too many times you can get on the spam list!  Check the 
server logs to see if your site is indexed.  If not, compare it with the 
search engine's help to see how long it normally takes to index a site.
Don't assume that your site will be indexed because you submitted.  Keep 
track of your submissions!
In an ideal situation you submit only the home page or the domain.  Let the 
robot find the rest, even if it takes some time.  This will give a higher 
relevance to the pages the robot found by itself.
B. Don't submit to thousands of SE's
Most of these SE's are simple lists and can't even be queried.  Some of 
them are SE's for specific topics.  You don't want a business site to be 
listed in a humor search engine, do you?
Focus on the 18 biggest search engines and you'll be amazed what that can 
do for your site.

Part 6: the results
--------------------------
Use the server logs to see
- what keywords are used by people to find your site
- what are the most popular pages (don't change these)
- which page do you need to adjust to get a higher ranking
- what search engine gives a lot of hits
- where are people coming from
- how long are people staying on the site
- what route are they following, adjust your navigation and page content to 
manipulate this
- through what page do they enter the site, find out why
- which robots visited the site
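Several of these questions can be answered with a short script over the access log. The sketch assumes Apache's combined log format and that the search engine passes the query in a `q` parameter of the referrer URL (Google does; other engines may use different names). Robot hits, detected by the names given in Part 7, are counted separately so they don't inflate the visitor figures.

```python
import re
from collections import Counter
from typing import Iterable, Tuple
from urllib.parse import parse_qs, urlsplit

# Combined log format: ... "GET /path HTTP/1.0" status size "referrer" "user agent"
LINE = re.compile(r'"(?:GET|HEAD|POST) (?P<path>\S+) [^"]*" \d{3} \S+ '
                  r'"(?P<referrer>[^"]*)" "(?P<agent>[^"]*)"')
ROBOTS = ("googlebot", "scooter")  # robot names mentioned in Part 7


def analyse(lines: Iterable[str]) -> Tuple[Counter, Counter, Counter]:
    """Return (search keywords, requested pages, robot visits) counters."""
    keywords: Counter = Counter()
    pages: Counter = Counter()
    robots: Counter = Counter()
    for line in lines:
        match = LINE.search(line)
        if match is None:
            continue  # not in combined log format
        agent = match.group("agent").lower()
        bots = [bot for bot in ROBOTS if bot in agent]
        robots.update(bots)
        if bots:
            continue  # keep robot hits out of the visitor statistics
        pages[match.group("path")] += 1
        query = parse_qs(urlsplit(match.group("referrer")).query).get("q")
        if query:
            keywords[query[0].lower()] += 1
    return keywords, pages, robots
```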

Part 7: other things you should know
-----------------------------------------------------
A. Querystrings (everything behind a ? in the URL)
Most major search engines hate querystrings.  They assume that the query 
strings are used for database access and dynamic page generation.  This can 
give them a "black hole" where they eventually index a complete 
database.  Altavista clearly states that they will index a page with 
querystrings, but won't follow any links.  Google is one of the first to 
start indexing pages with querystrings.  They are very cautious and will
only go a certain number of levels deep.
B. Page extensions
This is a good one for M$ haters.  ASP pages are not indexed by
AltaVista.  Other search engines are a bit cautious too.  The .asp extension
clearly says "server side scripting", meaning dynamic pages and possible
hell for robots.
C. Give the robots what they want: well structured HTML
Cocoon is a perfect platform for this.  Through XSLT you can create perfect 
pages each and every time.  Provided your XSL files are perfect, of
course.  When pages are created manually, chances are that some human error 
is made on some page and that the HTML doesn't render correctly.
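Hand-made pages can be checked for that kind of error automatically. The sketch below tests only well-formedness by parsing the page as XML, which suits XHTML-style output; it is not full HTML validation, and ordinary HTML such as an unclosed `<br>` will fail it even though browsers accept it.

```python
import xml.etree.ElementTree as ET
from typing import Tuple


def well_formed(page: str) -> Tuple[bool, str]:
    """Return (True, "") if page parses as XML, else (False, parser message)."""
    try:
        ET.fromstring(page)
    except ET.ParseError as err:
        return False, str(err)
    return True, ""
```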
D. Don't try to fool robots
It's very easy in Cocoon to provide different content to robots than to
regular visitors.  Robots identify themselves with their own user agent
name.  Google's robot is called "googlebot".  The robot AltaVista uses is
called "scooter".  If you provide similar but optimized content you could get
away with it, but if you use a different, more popular, content you are 
luring visitors to your site under false pretenses.  Search engines hate 
this since their users don't get what they look for.  They look for "sex" 
(the most important keyword on the net) and get a site about 
spaghetti.  Sites that use this technique will be put on a spam list and be 
banned from the search engines.  If the spam violations keep coming the IP 
address can get banned.  You can guess how happy your webmaster will be 
when he hears he has to move all his virtual hosts to another IP 
address.  You can start looking for another host right there and then.
E. Getting content
One of the major concerns in site creation is getting the content.  This is 
the strong point of Cocoon.  By separating presentation, logic and content
it's a LOT easier.  Using the right XML editor, like XMLSpy, it's even easy
for a customer to enter the texts themselves without corrupting the logic
or navigation.  If you let the client edit the navigation XML files (e.g.
the book.xml files in the cocoon documentation) the client can update the 
site on his own.
Add dynamic form creation for updating the navigation files and WYSIWYG
editing for the actual content, combine this with user authentication, and
you've got one hell of a platform that gives the client total control over
the site without any need to know the technology behind it.  This is what
I'm building BTW.
F. Browser support
Using client detection and different XSLT files it is rather easy to
create a site that can be viewed in all browsers.  Make sure your site is
viewable in
- IE 4 and up
- Netscape 4 and up
- Opera 4 and up
- Lynx (for the visually impaired a perfect test)
This way you can be sure you don't miss out on visitors simply because your 
site doesn't look right in their browser.  It's a bit of work, but always 
keep in mind that you must adjust yourself to the visitor and not the other 
way around.
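In Cocoon this choice would normally be made in the sitemap with a selector. The Python sketch below just shows the decision logic; the stylesheet names are hypothetical and the user-agent checks are deliberately simplified.

```python
from typing import List, Tuple

# Hypothetical stylesheet names. Order matters: Opera can send "MSIE" in its
# user agent, and IE sends "Mozilla/4.0", so the more specific markers come first.
STYLESHEETS: List[Tuple[str, str]] = [
    ("lynx", "text-only.xsl"),
    ("opera", "opera.xsl"),
    ("msie", "ie.xsl"),
    ("mozilla/4", "netscape4.xsl"),
]


def stylesheet_for(user_agent: str, default: str = "standard.xsl") -> str:
    """Pick an XSLT stylesheet name from a (simplified) user-agent check."""
    agent = user_agent.lower()
    for marker, sheet in STYLESHEETS:
        if marker in agent:
            return sheet
    return default
```

Only the presentation changes here; serving robots different content is exactly what section D warns against.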
G. Page content
Make sure EVERY page of your site answers these questions when the visitor
gets there:
- Where am I?
- What can I find here?
- Where can I go?
If you don't answer one of these questions the visitor will leave, and 
that's NOT what we want.


Part 8: Want to know more about SEO?  Check out these sites
----------------------------------------------------------------------------------------

Firstplace Software
         Lots of info and unique promotion- and optimisation software
         http://www.firstplacesoftware.com

Search Engine Watch
         Search engine and spider info
         http://www.searchenginewatch.com

Search Engine World
         Search engine and spider info
         http://www.searchengineworld.com

Cre8PC
         Web design, tools, tutorials and web promotion
         http://www.cre8PC.com

AIM Pro
         Internet Marketing tips, tools and services
         http://www.aim-pro.com

SmallZine
         http://www.smallzine.nl/

About.com Web design tips
         Pure web design
         http://webdesign.about.com/cs/designtips/

Webmonkey
         A compilation of information
         http://hotwired.lycos.com/webmonkey/

AnyBrowser
         Check your browser compatibility
         http://www.anybrowser.com/

Xenu link checking tool
         http://home.snafu.de/tilman/xenulink.html

Hitbox
         Web traffic analyser
         http://get.hitbox.com/cgi-bin/getit.cgi?hb&hb_intro

WordTracker
         Find the right keywords for your site
         http://www.wordtracker.com/

I-Marketeer
         Internet Marketing in België
         http://www.i-marketeer.com/

Search Engine Optimization Strategies
         All kinds of info regarding SEO
         http://strategies.topsitelistings.com/




Re: Search Engine Optimization and Cocoon

Posted by Andrew Savory <an...@luminas.co.uk>.
On Fri, 12 Apr 2002, Stefano Mazzocchi wrote:

> Andrew Savory wrote:
>
> > also be interested in knowing if anyone out there in cocoon-land has
> > looked at initiatives such as OAI?
>
> No, I haven't, but I'm curious.

Ok: the Open Archives Initiative [1] "develops and promotes
interoperability standards that aim to facilitate the efficient
dissemination of content". I'm hoping to find a way to provide support for
the OAI protocol in Cocoon ... but I'm still at the spec-reading stage at
the moment.

[1] http://www.openarchives.org/
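To give a feel for the protocol: as I understand the OAI protocol for metadata harvesting, a harvester sends plain HTTP requests carrying a `verb` argument (plus arguments such as `metadataPrefix=oai_dc` for Dublin Core records), and the repository answers with XML, which is the kind of output a Cocoon pipeline produces anyway. A minimal sketch of building such request URLs; `http://example.org/oai` is a placeholder, not a real repository.

```python
from urllib.parse import urlencode

# The six request verbs defined by the OAI protocol for metadata harvesting.
VERBS = {"Identify", "ListMetadataFormats", "ListSets",
         "ListIdentifiers", "ListRecords", "GetRecord"}


def oai_request(base_url: str, verb: str, **arguments: str) -> str:
    """Build a harvesting request URL for a (hypothetical) repository base URL."""
    if verb not in VERBS:
        raise ValueError(f"unknown OAI verb: {verb}")
    return base_url + "?" + urlencode({"verb": verb, **arguments})


print(oai_request("http://example.org/oai", "ListRecords", metadataPrefix="oai_dc"))
# http://example.org/oai?verb=ListRecords&metadataPrefix=oai_dc
```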


Andrew.

-- 
Andrew Savory                                Email: andrew@luminas.co.uk
Managing Director                              Tel:  +44 (0)870 741 6658
Luminas Internet Applications                  Fax:  +44 (0)870 28 47489
This is not an official statement or order.    Web:    www.luminas.co.uk




Re: Search Engine Optimization and Cocoon

Posted by Stefano Mazzocchi <st...@apache.org>.
Andrew Savory wrote:
> 
> On Fri, 12 Apr 2002, Bert Van Kets wrote:
> 
> > I am willing to spend some time to compile a text that explains how the
> > benefits of Cocoon can be used in SEO.  Anybody interested?
> 
> Sounds good. One of the things we're focusing on is using Cocoon to give
> better accessibility to the "dark web"

Oh, I *love* that term. I've been speaking about this issue at the last
Open Content Management conference in Zurich (see the slides on my
homepage) but I'll be using this term in the future.

The 'dark matter of the web'... very cool.

>, and making content search-visible
> is an important part of that. When we've finished building the latest site
> I hope to make our notes on this available (perhaps added to yours?). I'd
> also be interested in knowing if anyone out there in cocoon-land has
> looked at initiatives such as OAI?

No, I haven't, but I'm curious.

-- 
Stefano Mazzocchi      One must still have chaos in oneself to be
                          able to give birth to a dancing star.
<st...@apache.org>                             Friedrich Nietzsche
--------------------------------------------------------------------





Re: Search Engine Optimization and Cocoon

Posted by Andrew Savory <an...@luminas.co.uk>.
On Fri, 12 Apr 2002, Bert Van Kets wrote:

> I am willing to spend some time to compile a text that explains how the
> benefits of Cocoon can be used in SEO.  Anybody interested?

Sounds good. One of the things we're focusing on is using Cocoon to give
better accessibility to the "dark web", and making content search-visible
is an important part of that. When we've finished building the latest site
I hope to make our notes on this available (perhaps added to yours?). I'd
also be interested in knowing if anyone out there in cocoon-land has
looked at initiatives such as OAI?


Andrew.



