Posted to dev@cocoon.apache.org by Ovidiu Predescu <ov...@cup.hp.com> on 2002/04/12 19:44:24 UTC
Re: Search Engine Optimization and Cocoon
On Fri, 12 Apr 2002 12:40:10 +0200, Bert Van Kets <be...@vankets.com> wrote:
> One of my *things* is Search Engine Optimization or SEO. SEO tries to
> define the rules a site must comply with to be "found" by search engines and
> thus get a lot of relevant visitors. I found Cocoon to be a perfect
> platform to do server side programming that can be hidden entirely from the
> client. Even page optimization is possible.
> There is a rather important rule that is broken by some samples: "Some
> major robots (programs that requests and index the content of your site)
> will not index your page and thus not follow the links in it if you use a
> querystring in the URL." This rule is not important if you use forms,
> since robots will never enter data and submit a form. I hope that passing
> parameters in the URL will not be a "standard" thing in Cocoon.
> I am willing to spend some time to compile a text that explains how the
> benefits of Cocoon can be used in SEO. Anybody interested?
I am interested in this, as I was planning to use a parameter to store
continuation ids for the flow engine, instead of encoding it in the
URL.
Please let me know when you compile the text, I'm very interested in
reading this.
Thanks,
Ovidiu
---------------------------------------------------------------------
To unsubscribe, e-mail: cocoon-dev-unsubscribe@xml.apache.org
For additional commands, email: cocoon-dev-help@xml.apache.org
RE: Search Engine Optimization and Cocoon (long!)
Posted by Bert Van Kets <be...@visitronics.be>.
I see your point. It's just that, IIRC, a 301 is not always correctly
interpreted. This can be a problem.
I was just pointing out that removing a page from the server also removes
it from the index once the robot has passed, since the removal from the
index is triggered by the 404. That's all.
Bert
At 08:43 16/04/2002 +1200, you wrote:
RE: Search Engine Optimization and Cocoon (long!)
Posted by Conal Tuohy <co...@paradise.net.nz>.
Are you sure, Bert? I can see how a search engine would be suspicious of a
META REFRESH tag, because the search engine would be ranking the page that
contained the META tag (and a whole lot of keywords), but the user would be
sent to some different page. So I certainly wouldn't recommend using an HTML
META tag! But if a search bot receives a 301 code as part of the HTTP
transaction, then the old (e.g. Britney Spears) page has disappeared, and
the search engine should follow the link to find the new (spaghetti) page.
When it finds a 301, the search engine should trash the old page in its
index, just as with a 404, so there's no chance of "britney spears" turning
into "spaghetti" again. To use this technique to spam search engines you'd
have to return the "britney spears" page to the search engines and the 301
code to browsers ("cloaking"). If you're going to do that, you may as
well return the spaghetti page directly, rather than a 301 link to it. So
why should they care about the 301?
http://www.google.com/remove.html#change_url
I'm no expert on search engines really ... and you could well be right
(which search engines are averse to 301 codes?) but I stuck my oar in
because it seemed to me that it's a waste of a perfectly good hit to return
a 404 unnecessarily. With Cocoon this kind of thing can be handled so easily
that I don't think "search engine optimization" should be an excuse for
turning away your EXISTING users who ALREADY HAVE an (old) link to your
site.
Con
> -----Original Message-----
> From: Bert Van Kets [mailto:bert@visitronics.be]
> Sent: Monday, 15 April 2002 06:19
> To: cocoon-dev@xml.apache.org
> Subject: RE: Search Engine Optimization and Cocoon (long!)
RE: Search Engine Optimization and Cocoon (long!)
Posted by Bert Van Kets <be...@visitronics.be>.
I meant that a 404 is the signal for the robot to remove the file from the
index. A 301 is, wrongly, interpreted as a "meta refresh" (307).
The meta refresh is used in a spam technique: a page containing a meta
refresh is optimized for and promoted on a specific, very popular keyword,
but the visitor is redirected to completely different content. For example,
you search for "Britney Spears" but get redirected to a page about spaghetti.
Search engines want to give good, relevant results, so they hate this
technique. You can get listed as a spammer for it. Although technically
a 301 is more correct, it's not good for SEO!
Don't use it unless SEO is unimportant for the site.
Bert
At 10:46 14/04/2002 +1200, you wrote:
RE: Search Engine Optimization and Cocoon (long!)
Posted by Conal Tuohy <co...@paradise.net.nz>.
Don't use a 404 to signal that a URL has changed: use a 301 "Moved
Permanently".
http://www.w3.org/Protocols/rfc2616/rfc2616-sec10.html#sec10.3.2
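A minimal Java sketch of that advice, using the JDK's built-in com.sun.net.httpserver server rather than Cocoon's own sitemap (which is not shown here). The MOVED table and its paths are hypothetical. Moved URLs get a 301 with an absolute Location header built from the request's Host header; anything else gets a 404.

```java
import com.sun.net.httpserver.HttpServer;
import java.io.IOException;
import java.io.OutputStream;
import java.io.UncheckedIOException;
import java.net.InetSocketAddress;
import java.nio.charset.StandardCharsets;
import java.util.Map;

// Sketch only (not Cocoon code): answer moved URLs with "301 Moved Permanently"
// (RFC 2616, section 10.3.2) instead of a 404.
class PermanentRedirectServer {
    // Hypothetical table of old path -> new path.
    static final Map<String, String> MOVED = Map.of(
            "/old-page.html", "/new-page.html");

    // Starts a server on 127.0.0.1; pass 0 to let the OS pick a free port.
    static HttpServer start(int port) {
        try {
            HttpServer server = HttpServer.create(new InetSocketAddress("127.0.0.1", port), 0);
            server.createContext("/", exchange -> {
                String target = MOVED.get(exchange.getRequestURI().getPath());
                byte[] body;
                if (target != null) {
                    // RFC 2616 asks for an absolute URI in Location, so build
                    // one from the Host header when the client sent it.
                    String host = exchange.getRequestHeaders().getFirst("Host");
                    String location = host != null ? "http://" + host + target : target;
                    exchange.getResponseHeaders().set("Location", location);
                    body = ("Moved permanently to " + location).getBytes(StandardCharsets.UTF_8);
                    exchange.sendResponseHeaders(301, body.length);
                } else {
                    // Unknown or removed page: a 404 tells robots to drop it.
                    body = "Not Found".getBytes(StandardCharsets.UTF_8);
                    exchange.sendResponseHeaders(404, body.length);
                }
                try (OutputStream out = exchange.getResponseBody()) {
                    out.write(body);
                }
            });
            server.start();
            return server;
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }
}
```

Existing visitors with the old link still reach the content, while a robot that honours the 301 learns the new URL instead of simply losing the old one.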
> -----Original Message-----
> From: Bert Van Kets [mailto:bert@visitronics.be]
> Sent: Sunday, 14 April 2002 08:57
> To: cocoon-dev@xml.apache.org
> Subject: Re: Search Engine Optimization and Cocoon (long!)
Re: Search Engine Optimization and Cocoon (long!)
Posted by Bert Van Kets <be...@visitronics.be>.
If you get visits from search engines to those pages, it would be crazy to
get rid of those links. I would, however, install the new pages without the
query string and try to rank as high as possible in as many SEs as
possible. Once those new links do their work, you have an alternative and
you can get rid of the old links if you want.
Most major SEs don't like different links to the same page, so having only
one good link to a page is actually an advantage. Then again, multiple
links are only a disadvantage if you get caught, so you can keep the old
link at some risk of being flagged as spam.
Robots will revisit your site after a certain time; most have an interval
of about 1 month. If a robot gets a 404 (page not found), it will erase that
page from the index. That will get rid of the old links automatically. You
can also get rid of them manually by resubmitting them, but that's a lot of
work. If the robots find the new links, they will index the new pages. If
some of your pages are already in a search engine, its robot will follow
their links and pick up the new pages automatically.
Keeping track of the logs is *always* a good idea!
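One way to watch the logs for this migration: a hypothetical Java sketch that counts requests for URLs still carrying a query string, assuming access logs in the Common (or Combined) Log Format. When these counts stay at zero over a few robot visits, the old URLs are probably safe to retire.

```java
import java.util.List;
import java.util.Map;
import java.util.TreeMap;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Sketch: count hits on old query-string URLs in Common/Combined Log Format lines.
class OldUrlHits {
    // Matches the quoted request line, e.g. "GET /article?id=42 HTTP/1.0";
    // group 1 is the requested URL.
    private static final Pattern REQUEST =
            Pattern.compile("\"[A-Z]+ (\\S+) HTTP/\\d\\.\\d\"");

    // Returns URL -> number of requests, sorted by URL.
    static Map<String, Integer> countQueryStringHits(List<String> logLines) {
        Map<String, Integer> hits = new TreeMap<>();
        for (String line : logLines) {
            Matcher m = REQUEST.matcher(line);
            if (m.find() && m.group(1).contains("?")) {
                hits.merge(m.group(1), 1, Integer::sum);
            }
        }
        return hits;
    }
}
```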
BTW: Don't confuse search engines with directories. Search engines use a
robot to index the content of your site. Directories like Yahoo and Open
Directory (dmoz.com) have humans look at the site and rate it. Clear,
good content is all you can offer these guys.
Bert
At 06:13 13/04/2002 -0400, you wrote:
Re: Search Engine Optimization and Cocoon (long!)
Posted by Diana Shannon <te...@mac.com>.
Bert,
Thanks for the great read!
> Part 7: other things you should know
> -----------------------------------------------------
> A. Querystrings (everything behind a ? in the URL)
> Most major search engines hate querystrings. They assume that the
> query strings are used for database access and dynamic page
> generation. This can give them a "black hole" where they eventually
> index a complete database. Altavista clearly states that they will
> index a page with querystrings, but won't follow any links. Google is
> one of the first to start indexing pages with querystrings. They are
> very cautious and will only go a certain number of levels deep.
Now that we can effectively rewrite page URLs without query strings
using C2, do you think it's simply a matter of resubmitting to search
engines to remove any existing search engine links to the "old" pages?
In the meantime, I suppose we could leave up a pipeline that maps
the old URLs with query strings to the new URL without query strings
(and monitor logs to determine when/if to delete them down the road.)
Would that be your approach for updating old sites?
Diana
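The mapping Diana describes could be prototyped as a pure function before it is wired into a sitemap. A Java sketch, assuming a hypothetical old scheme /article?id=42 that moves to /article/42.html; a real site would substitute its own URL layout.

```java
import java.util.Optional;

// Sketch: map a hypothetical old query-string URL to a query-string-free one.
class QueryStringRewriter {
    // "/article?id=42" -> "/article/42.html"; empty if the URL doesn't match.
    static Optional<String> toNewUrl(String oldUrl) {
        int q = oldUrl.indexOf('?');
        if (q < 0) {
            return Optional.empty();  // already free of a query string
        }
        String path = oldUrl.substring(0, q);
        for (String param : oldUrl.substring(q + 1).split("&")) {
            if (param.startsWith("id=") && param.length() > 3) {
                String id = param.substring(3);
                // Accept only simple ids so the new URL stays a clean path segment.
                if (id.matches("[A-Za-z0-9_-]+")) {
                    return Optional.of(path + "/" + id + ".html");
                }
            }
        }
        return Optional.empty();
    }
}
```

The old URL would then answer with a 301 whose Location is the new URL, rather than serving the same page twice under two addresses.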
Re: Search Engine Optimization and Cocoon (long!)
Posted by Bert Van Kets <be...@vankets.com>.
SEO is a very vague and ever-changing science. There's a lot to know about
each and every search engine. The rules are different for every one of
them. I will only give the major, more general rules to get you going.
----------------------------------------------------------------------------------------------------------------------------------------------------------
To know how your site can be found you have to know how people look for
your site. What do you do when you surf the web? You look for textual
information! You enter a keyword or key phrase into a textbox and ask the
search engine to query its database and come up with web pages that are as
relevant as possible.
- Remember the importance of the keyword or keyphrase here. -
Now you probably get that the trick is to know two things:
1) what keywords or keyphrases do people actually use
2) what are the rules that make a page relevant to a specific keyword or
keyphrase (where do you apply the keywords)
Part 1: The keywords used by the visitor
----------------------------------------------------------
A. Try to find them yourself
Brainstorm about ALL possible keywords that can be applied to your
site. Write them all down. You might have about 200 words or
phrases. Don't just list product names or descriptions of products, but
also think about what people would search for when they don't know your
product. People who don't know what an electric drill is don't search for
a drill but for a "hole in a wall".
Also avoid keywords that are too general. If your site is about second
hand cars, you don't want people searching for new cars or car
parts. Getting relevant visitors is what we are after.
B. See what your competitors are doing
Check out your competitors' sites. See what keywords they are using
and, more importantly, which words they use most in the body of their
major pages.
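A rough way to see which words a competitor's page emphasizes is to count word frequencies in its body text. A Java sketch; the stop-word list is a tiny hypothetical placeholder, and extracting the body text from the HTML is left out.

```java
import java.util.HashMap;
import java.util.List;
import java.util.Locale;
import java.util.Map;
import java.util.Set;
import java.util.stream.Collectors;

// Sketch: the n most frequent words in a page's body text.
class TopWords {
    // Tiny hypothetical stop-word list; a real one would be much longer.
    private static final Set<String> STOP_WORDS = Set.of(
            "a", "an", "and", "the", "of", "to", "in", "for", "is", "on", "with");

    static List<String> topWords(String bodyText, int n) {
        Map<String, Integer> counts = new HashMap<>();
        // Split on anything that is not a letter or digit.
        for (String word : bodyText.toLowerCase(Locale.ROOT).split("[^\\p{L}\\p{Nd}]+")) {
            if (!word.isEmpty() && !STOP_WORDS.contains(word)) {
                counts.merge(word, 1, Integer::sum);
            }
        }
        // Highest count first; ties broken alphabetically for stable output.
        return counts.entrySet().stream()
                .sorted(Map.Entry.<String, Integer>comparingByValue().reversed()
                        .thenComparing(Map.Entry.<String, Integer>comparingByKey()))
                .limit(n)
                .map(Map.Entry::getKey)
                .collect(Collectors.toList());
    }
}
```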
C. Filter out the most relevant keywords.
Go over the list and select the keywords you think are best. Keep a list
of about 50 keyphrases, knowing that people use keyphrases more than keywords.
D. Test your keywords and keyphrases
Now go to some of the major search engines like Google, AltaVista,
Excite, MSN, AOL, Netscape, Northern Light, Lycos, etc. and see how many
times your keywords are found. This will give you an idea of how relevant
the words are across sites. Don't use this as a definitive measure of
relevance; use it only as a guide.
If you want to be absolutely sure about your keywords, you have to use a
database of keyphrases actually entered into a series of search engines.
WordTracker (http://www.wordtracker.com/) is the only service I know of
that can do this.
After you have done all this hard work you should have a list of five to
ten major keywords and ten to twenty secondary ones. It is very important
that you stick with these for a long time, so you'd better be sure you have
the right ones. SEO is a slooooow process and it can easily take six months
to see the results of a decision you made.
Part 2: Where do you use the keywords
---------------------------------------------------------
A. What can be indexed? Text, text and nothing but text. Text in graphics
will not be indexed! Why do you think images have alt tags? Alt tags are
compulsory, BTW!
So if you have a site with only a teeny weeny bit of text, how do you
suppose your site can be indexed and found? The first and most important
rule of SEO is content, content and some more content.
B. Pick your page
Pick a page you want to use to promote a keyword. When this page is
indexed the robot is supposed to think the keyword is very important. Use
one page for each keyword you want to promote. It is nearly impossible to
promote multiple keywords on one page.
C. Page title
The page title is the most important part of a page. Just as you judge a
book by its title, a robot will judge the page by its title. Most search
engines like to see your keyword only once in the title. Use it more often
and your page is regarded as spam, and that is the last thing you want.
Make sure the title tag is the first tag inside the head tag.
D. The meta keywords and meta description tags
More and more search engines ignore these tags and try to extract the
keywords from the page text themselves. Still, add your keyword and some
variations, such as plurals.
The description tag is important since the search engine will use it to
describe the page. In the list returned by the search engine you will
usually see the page title with the link on it and the description below
it. A good title and a good description can lure visitors by themselves
(provided you can get your page into the top 30 of some search engines).
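In Cocoon this head would normally be written by your XSLT stylesheet. The
plain Java sketch below only shows the target output: the title as the very
first element inside <head>, followed by the description and keywords meta
tags. The class and method names are hypothetical.

```java
import java.util.List;

public class HeadBuilder {
    // Minimal escaping for text placed in element content or double-quoted attributes.
    static String escape(String s) {
        return s.replace("&", "&amp;").replace("<", "&lt;")
                .replace(">", "&gt;").replace("\"", "&quot;");
    }

    // Builds a <head> whose FIRST child is <title>, followed by the two meta tags.
    static String buildHead(String title, String description, List<String> keywords) {
        return "<head>\n"
             + "  <title>" + escape(title) + "</title>\n"
             + "  <meta name=\"description\" content=\"" + escape(description) + "\">\n"
             + "  <meta name=\"keywords\" content=\""
             + escape(String.join(", ", keywords)) + "\">\n"
             + "</head>";
    }
}
```

For example, buildHead("Spaghetti recipes", "...", List.of("spaghetti recipe",
"spaghetti recipes")) puts the keyword and its plural in the keywords tag.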
E. Page body
Keywords are only found relevant when they appear in the body of the
page. Use your keyword in the first and the last paragraph. That's where
they will have the most relevance. Use about 400 words in the body.
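A quick way to check a page against the rules in C and E is a small audit.
This is a rough sketch, assuming the page has already been split into a
title and a list of plain-text paragraphs (no HTML parsing). The 300-word
lower limit is an arbitrary cut-off chosen for this example around the
"about 400 words" target, and all names are hypothetical.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Locale;

public class PageAudit {
    // Arbitrary lower limit for this sketch; the text suggests about 400 words.
    static final int MIN_WORDS = 300;

    // Case-insensitive, non-overlapping occurrences of keyword in text.
    static int countOccurrences(String text, String keyword) {
        if (keyword.isEmpty()) {
            return 0;
        }
        String t = text.toLowerCase(Locale.ROOT);
        String k = keyword.toLowerCase(Locale.ROOT);
        int count = 0;
        for (int i = t.indexOf(k); i >= 0; i = t.indexOf(k, i + k.length())) {
            count++;
        }
        return count;
    }

    static int wordCount(List<String> paragraphs) {
        int words = 0;
        for (String p : paragraphs) {
            String trimmed = p.trim();
            if (!trimmed.isEmpty()) {
                words += trimmed.split("\\s+").length;
            }
        }
        return words;
    }

    // Returns human-readable warnings; an empty list means the page passed.
    static List<String> audit(String title, List<String> paragraphs, String keyword) {
        List<String> warnings = new ArrayList<>();
        int inTitle = countOccurrences(title, keyword);
        if (inTitle != 1) {
            warnings.add("keyword appears " + inTitle + " times in the title (aim for once)");
        }
        if (paragraphs.isEmpty()) {
            warnings.add("page body is empty");
            return warnings;
        }
        if (countOccurrences(paragraphs.get(0), keyword) == 0) {
            warnings.add("keyword missing from the first paragraph");
        }
        if (countOccurrences(paragraphs.get(paragraphs.size() - 1), keyword) == 0) {
            warnings.add("keyword missing from the last paragraph");
        }
        int words = wordCount(paragraphs);
        if (words < MIN_WORDS) {
            warnings.add("only " + words + " words in the body (aim for about 400)");
        }
        return warnings;
    }
}
```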
F. Links
A link on the word spaghetti that points to a page called spaghetti.html in
the directory spaghetti, with a high keyword relevance for spaghetti, MUST
be about spaghetti, no? Try to put two or three links containing your
keyword on the page.
This is VERY important: use ONLY <a href> links for your
navigation. Hardly any robot interprets JavaScript, so window.location
links are not followed. Robots don't submit forms either, so don't rely on
navigation using <input> tags.
Links in Flash movies or applets are not followed either! So forget about
fancy, flashy sites; they don't get anywhere. Cocoon makes navigation easy
to maintain, so why use client-side scripts?
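To find navigation that robots probably won't follow, you can scan the
generated HTML. Below is a crude, regex-based sketch (not a real HTML parser;
it only handles double-quoted href values, and the class name is
hypothetical). It lists the plain <a href> targets and flags javascript:
URLs, window.location, onclick handlers and <input> elements.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Locale;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

public class LinkAudit {
    // href values of <a> elements (double-quoted only); crude regex, not an HTML parser.
    private static final Pattern HREF = Pattern.compile(
            "<a\\s[^>]*?href\\s*=\\s*\"([^\"]*)\"", Pattern.CASE_INSENSITIVE);
    // Navigation that robots are unlikely to follow: script-driven jumps and form controls.
    private static final Pattern SCRIPTED = Pattern.compile(
            "window\\.location|javascript:|onclick\\s*=|<input\\b", Pattern.CASE_INSENSITIVE);

    // Link targets a robot can follow: plain <a href> links that are not javascript: URLs.
    static List<String> crawlableLinks(String html) {
        List<String> links = new ArrayList<>();
        Matcher m = HREF.matcher(html);
        while (m.find()) {
            String target = m.group(1).trim();
            if (!target.toLowerCase(Locale.ROOT).startsWith("javascript:")) {
                links.add(target);
            }
        }
        return links;
    }

    // True if the page contains navigation that relies on scripts or form controls.
    static boolean hasScriptedNavigation(String html) {
        return SCRIPTED.matcher(html).find();
    }
}
```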
G. Alt tags
Alt tags are the only way to get the content of your images indexed. Use them!
If you want to reach as many visitors as possible, don't forget the
visually impaired and give them something they can perceive in their own
way. Braille readers can only show text, so if there are no alt tags on
your navigation buttons, these people won't know where they are or where
they can go. Use XSLT to make sure every image gets its alt tag.
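A minimal sketch of such a stylesheet, run through the standard
javax.xml.transform API that ships with the JDK. It is an identity transform
with one extra rule: an <img> without an alt attribute gets one, copied from
its title attribute when present and left empty otherwise (those still need
to be filled in by hand). It assumes elements without a namespace; for
namespaced XHTML you would have to declare the XHTML namespace in the
stylesheet and match on it. Class and method names are hypothetical.

```java
import java.io.StringReader;
import java.io.StringWriter;
import javax.xml.transform.Transformer;
import javax.xml.transform.TransformerException;
import javax.xml.transform.TransformerFactory;
import javax.xml.transform.stream.StreamResult;
import javax.xml.transform.stream.StreamSource;

public class AltTagFixer {
    // Identity transform plus one rule for <img> elements that lack @alt.
    static final String XSLT =
        "<xsl:stylesheet version=\"1.0\" xmlns:xsl=\"http://www.w3.org/1999/XSL/Transform\">"
      + "<xsl:output method=\"xml\" omit-xml-declaration=\"yes\"/>"
      // Copy every node and attribute unchanged...
      + "<xsl:template match=\"@*|node()\">"
      + "<xsl:copy><xsl:apply-templates select=\"@*|node()\"/></xsl:copy>"
      + "</xsl:template>"
      // ...except images without alt: copy their attributes, then add alt from @title.
      + "<xsl:template match=\"img[not(@alt)]\">"
      + "<xsl:copy>"
      + "<xsl:apply-templates select=\"@*\"/>"
      + "<xsl:attribute name=\"alt\"><xsl:value-of select=\"@title\"/></xsl:attribute>"
      + "<xsl:apply-templates select=\"node()\"/>"
      + "</xsl:copy>"
      + "</xsl:template>"
      + "</xsl:stylesheet>";

    static String addMissingAlt(String xhtml) {
        try {
            Transformer t = TransformerFactory.newInstance()
                    .newTransformer(new StreamSource(new StringReader(XSLT)));
            StringWriter out = new StringWriter();
            t.transform(new StreamSource(new StringReader(xhtml)), new StreamResult(out));
            return out.toString();
        } catch (TransformerException e) {
            throw new IllegalArgumentException("transformation failed", e);
        }
    }
}
```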
Part 3: Site structure
------------------------------
The best way to structure your site is to use one keyword for every
part of the site. Use that keyword as the name of the directory where the
files will be located. Of course, in Cocoon you don't really need to place
the XML files in that directory; a pipeline that mimics the directory is
good too. The robot only needs to think the HTML files are in a certain
directory.
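If directory names come from keywords, you need a consistent way to turn a
keyword into a path. A minimal sketch, assuming lowercase, hyphen-separated
names and a .html suffix (a naming convention chosen for this example, not
something Cocoon requires); the names are hypothetical.

```java
import java.text.Normalizer;
import java.util.Locale;

public class KeywordPath {
    // "Second Hand Cars" -> "second-hand-cars"
    static String slug(String keyword) {
        String noAccents = Normalizer.normalize(keyword, Normalizer.Form.NFD)
                .replaceAll("\\p{M}+", "");            // drop combining accent marks
        return noAccents.toLowerCase(Locale.ROOT)
                .replaceAll("[^a-z0-9]+", "-")         // any other run of characters becomes one hyphen
                .replaceAll("^-+|-+$", "");            // no leading or trailing hyphens
    }

    // One directory per section keyword, one page per page keyword.
    static String pagePath(String sectionKeyword, String pageKeyword) {
        return "/" + slug(sectionKeyword) + "/" + slug(pageKeyword) + ".html";
    }
}
```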
Links between your pages are VERY important to Google, the most important
search engine, so build a good internal link structure. Make sure you link
your pages to each other where possible. The best thing you can do is use a
regular HTML menu structure (like the Cocoon site).
The home page is regarded as the most important page of the site (like a
book cover). The links to and from that page are therefore important
too. So DON'T use splash screens! They will mess up the whole site structure.
Another reason why you shouldn't use splash screens is that robots don't
follow links more than two or three levels deep. So if you use a splash
screen you lose one level!
Use a sitemap and link to it from the home page. It's even better if you
don't call it "sitemap" and add some regular text to the page: AltaVista
won't follow or index pages containing only links. I guess you get the
picture. Cocoon can generate a sitemap automatically.
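A sketch of such a generated overview page, assuming you already have a list
of pages with a URL, title and short summary (in Cocoon that list could come
from your navigation XML, but here it is just a Java list). It uses only
plain <a href> links and puts a line of text next to each one, so the page is
not links-only. All names are hypothetical.

```java
import java.util.List;

public class SitemapPage {
    // One entry of the generated overview page.
    static final class Page {
        final String url;
        final String title;
        final String summary;

        Page(String url, String title, String summary) {
            this.url = url;
            this.title = title;
            this.summary = summary;
        }
    }

    static String escape(String s) {
        return s.replace("&", "&amp;").replace("<", "&lt;")
                .replace(">", "&gt;").replace("\"", "&quot;");
    }

    // Plain <a href> links, each followed by a line of text so the page is not links-only.
    static String render(String heading, String intro, List<Page> pages) {
        StringBuilder sb = new StringBuilder();
        sb.append("<h1>").append(escape(heading)).append("</h1>\n");
        sb.append("<p>").append(escape(intro)).append("</p>\n");
        sb.append("<ul>\n");
        for (Page p : pages) {
            sb.append("  <li><a href=\"").append(escape(p.url)).append("\">")
              .append(escape(p.title)).append("</a>: ")
              .append(escape(p.summary)).append("</li>\n");
        }
        sb.append("</ul>\n");
        return sb.toString();
    }
}
```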
Part 4: external links
------------------------------
Google was the first search engine to use links between sites as a measure
of relevance. Sites with a lot of links to them are called authorities.
Sites with a lot of links to other sites are called hubs. Both of these
types get a higher relevance. In the last few years Google has also
weighted these links by "theme": only links to and from sites with the same
theme as yours are regarded as relevant. So beware of who links to you and
whom you link to.
Check the number of links to your site in the search engines. Most of the
time you can do this by searching for link:www.yoursite.com
IBM did a study which found that only half the web is linked
together. The other half has links to or from the central part, and there
are also some islands that don't even link to that main chunk. IBM called
this the bow tie theory. To the left are the sites that link to the knot,
the knot contains sites that are linked both ways, and to the right are
sites that have links from the knot to them. If you think about the way people
surf the web, clicking from one page to another, from one site to the next,
you can see that it is very important to be in the center part. Get as
many *relevant* links to and from your site as possible.
Part 5: submit
---------------------
A. Submit to the major SEs
You do need to submit your pages, but don't overdo it. If you submit too
many times you can end up on the spam list! Check the server logs to see if
your site has been indexed. If not, check the search engine's help pages to
see how long it normally takes to index a site. Don't assume that your site
will be indexed just because you submitted it. Keep track of your
submissions!
In an ideal situation you submit only the home page or the domain. Let the
robot find the rest, even if it takes some time. This gives a higher
relevance to the pages the robot found by itself.
B. Don't submit to thousands of SEs
Most of these SEs are simple lists and can't even be queried. Some of
them are SEs for specific topics. You don't want a business site to be
listed in a humor search engine, do you?
Focus on the 18 biggest search engines and you'll be amazed what that can
do for your site.
Part 6: the results
--------------------------
Use the server logs to see
- which keywords people use to find your site
- which pages are the most popular (don't change these)
- which pages you need to adjust to get a higher ranking
- which search engines send you a lot of hits
- where people are coming from
- how long people stay on the site
- which routes they follow through the site (adjust your navigation and
page content to steer this)
- through which pages they enter the site, and why
- which robots visited the site
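Two of these questions (which keywords brought visitors, and which robots
visited) can be answered from the referrer and user-agent fields of each log
entry. A minimal sketch, assuming the search engine passes the query in a "q"
parameter (Google does; other engines may use another name) and recognizing
only the two robot names mentioned under Part 7 D; real logs contain many
more. Class and method names are hypothetical, and the Charset overload of
URLDecoder.decode needs Java 10 or later.

```java
import java.net.URI;
import java.net.URISyntaxException;
import java.net.URLDecoder;
import java.nio.charset.StandardCharsets;
import java.util.List;
import java.util.Locale;
import java.util.Optional;

public class LogInsights {
    // Only the robot names mentioned in this text; extend with the others you see in your logs.
    private static final List<String> ROBOT_NAMES = List.of("googlebot", "scooter");

    static boolean isRobot(String userAgent) {
        String ua = userAgent.toLowerCase(Locale.ROOT);
        return ROBOT_NAMES.stream().anyMatch(ua::contains);
    }

    // Extracts the search phrase from a search-engine referrer, assuming a "q" parameter.
    static Optional<String> searchPhrase(String referrer) {
        String query;
        try {
            query = new URI(referrer).getRawQuery();
        } catch (URISyntaxException e) {
            return Optional.empty();            // malformed referrer: ignore it
        }
        if (query == null) {
            return Optional.empty();            // no query string at all
        }
        for (String pair : query.split("&")) {
            String[] kv = pair.split("=", 2);
            if (kv.length == 2 && kv[0].equals("q")) {
                // Decodes '+' to a space and %XX escapes as UTF-8.
                return Optional.of(URLDecoder.decode(kv[1], StandardCharsets.UTF_8));
            }
        }
        return Optional.empty();
    }
}
```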
Part 7: other things you should know
-----------------------------------------------------
A. Querystrings (everything behind a ? in the URL)
Most major search engines hate querystrings. They assume that the query
strings are used for database access and dynamic page generation, which can
lead them into a "black hole" where they end up indexing a complete
database. AltaVista clearly states that it will index a page with
querystrings, but won't follow any of its links. Google is one of the first
to start indexing pages with querystrings, but it is very cautious and will
only go a limited number of levels deep.
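The usual way around this is to keep the parameters in the path instead of
the query string, so a URL like /show?category=drills&item=hammer-drill
becomes /drills/hammer-drill.html. In Cocoon a wildcard pattern in a sitemap
match can do this mapping; the Java sketch below only illustrates the idea
with a regular expression, and the URL scheme and parameter names are
hypothetical.

```java
import java.util.LinkedHashMap;
import java.util.Map;
import java.util.Optional;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

public class CleanUrls {
    // Hypothetical scheme: /{category}/{item}.html replaces /show?category=...&item=...
    private static final Pattern PAGE =
            Pattern.compile("^/([a-z0-9-]+)/([a-z0-9-]+)\\.html$");

    // Turns a path back into the parameters a query string would have carried.
    static Optional<Map<String, String>> parse(String path) {
        Matcher m = PAGE.matcher(path);
        if (!m.matches()) {
            return Optional.empty();
        }
        Map<String, String> params = new LinkedHashMap<>();
        params.put("category", m.group(1));
        params.put("item", m.group(2));
        return Optional.of(params);
    }

    // Builds the robot-friendly link to put in an <a href>.
    static String link(String category, String item) {
        return "/" + category + "/" + item + ".html";
    }
}
```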
B. Page extensions
This is a good one for M$ haters: ASP pages are not indexed by
AltaVista, and other search engines are a bit cautious too. The .asp
extension clearly says "server-side scripting", which means dynamic pages
and possible hell for robots.
C. Give the robots what they want: well structured HTML
Cocoon is a perfect platform for this. Through XSLT you can create perfect
pages each and every time, provided your XSL files are perfect, of
course. When pages are created manually, chances are that a human error
creeps into some page and the HTML doesn't render correctly.
D. Don't try to fool robots
It's very easy in Cocoon to serve robots different content than you serve
a regular visitor. Robots have their own user-agent name: Google's robot is
called "googlebot", and the robot AltaVista uses is called "scooter". If
you serve similar but optimized content you might get away with it, but if
you serve different, more popular content you are luring visitors to your
site under false pretenses. Search engines hate this because their users
don't get what they are looking for: they search for "sex" (the most
important keyword on the net) and get a site about spaghetti. Sites that use
this technique will be put on a spam list and banned from the search
engines. If the spam violations keep coming, the IP address itself can get
banned. You can guess how happy your webmaster will be when he hears he has
to move all his virtual hosts to another IP address. You can start looking
for another host right there and then.
E. Getting content
One of the major concerns in site creation is getting the content. This is
the strong point of Cocoon: separating presentation, logic and content
makes it a LOT easier. With the right XML editor, such as XMLSpy, it's even
easy for a customer to enter the texts themselves without corrupting the
logic or navigation. If you let the client edit the navigation XML files
(e.g. the book.xml files in the Cocoon documentation), the client can update
the site on his own.
Add dynamic form creation for updating the navigation files and WYSIWYG
editing for the actual content, combine this with user authentication, and
you've got one hell of a platform that gives the client total control over
the site without any need to know the technology behind it. This is what
I'm building BTW.
F. Browser support
Using client detection and different XSLT files, it is rather easy to
create a site that can be viewed in all browsers. Make sure your site
works in
- IE 4 and up
- Netscape 4 and up
- Opera 4 and up
- Lynx (for the visually impaired a perfect test)
This way you can be sure you don't miss out on visitors simply because your
site doesn't look right in their browser. It's a bit of work, but always
keep in mind that you must adjust yourself to the visitor and not the other
way around.
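The selection logic could look roughly like this plain Java sketch, which
maps a User-Agent string to a stylesheet name. The stylesheet names are
hypothetical, and user-agent sniffing is only approximate: Opera, for
example, often identifies itself as MSIE, so it is checked first. In Cocoon
itself you would normally make this choice in the sitemap rather than in
your own code.

```java
import java.util.Locale;

public class StylesheetChooser {
    // Picks an XSLT stylesheet name from the User-Agent header (names are hypothetical).
    static String choose(String userAgent) {
        String ua = userAgent == null ? "" : userAgent;
        String lower = ua.toLowerCase(Locale.ROOT);
        if (lower.contains("lynx")) {
            return "text-only.xsl";      // text-only browser
        }
        if (lower.contains("opera")) {
            return "opera.xsl";          // check before MSIE: Opera often claims to be MSIE
        }
        if (lower.contains("msie")) {
            return "ie.xsl";
        }
        if (ua.startsWith("Mozilla/4") && !lower.contains("compatible")) {
            return "netscape4.xsl";      // Netscape 4.x sends "Mozilla/4.x" without "compatible"
        }
        return "default.xsl";
    }
}
```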
G. Page content
Make sure EVERY page of your site answers these questions for visitors
arriving on it:
- Where am I?
- What can I find here?
- Where can I go?
If you don't answer one of these questions the visitor will leave, and
that's NOT what we want.
Part 8: Want to know more about SEO? Check out these sites
----------------------------------------------------------------------------------------
Firstplace Software
Lots of info and unique promotion- and optimisation software
http://www.firstplacesoftware.com
Search Engine Watch
Search engine and spider info
http://www.searchenginewatch.com
Search Engine World
Search engine and spider info
http://www.searchengineworld.com
Cre8PC
Web design, tools, tutorials and web promotion
http://www.cre8PC.com
AIM Pro
Internet Marketing tips, tools and services
http://www.aim-pro.com
SmallZine
http://www.smallzine.nl/
About.com Web design tips
Pure web design
http://webdesign.about.com/cs/designtips/
Webmonkey
A compilation of information
http://hotwired.lycos.com/webmonkey/
AnyBrowser
Check your browser compatibility
http://www.anybrowser.com/
Xenu link checking tool
http://home.snafu.de/tilman/xenulink.html
Hitbox
Web traffic analyser
http://get.hitbox.com/cgi-bin/getit.cgi?hb&hb_intro
WordTracker
Find the right keywords for your site
http://www.wordtracker.com/
I-Marketeer
Internet Marketing in België
http://www.i-marketeer.com/
Search Engine Optimization Strategies
All kinds of info regarding SEO
http://strategies.topsitelistings.com/