Posted to apache-bugdb@apache.org by Olly Betts <ol...@muscat.co.uk> on 1997/08/28 11:03:05 UTC

Re: mod_dir/1057: Web robots should be told not to index auto-generated index pages

In message <19...@hyperreal.org>, brian@hyperreal.org writes:
>Synopsis: Web robots should be told not to index auto-generated index pages
>
>State-Changed-From-To: open-closed
>State-Changed-By: brian
>State-Changed-When: Wed Aug 27 10:47:33 PDT 1997
>State-Changed-Why:
>We talked about it on the developers list, and don't necessarily
>agree that index pages shouldn't be indexed by robots.  If
>you want to add custom META tags to your pages, you can set
>"IndexOptions SuppressHTMLPreamble", and then put a full HTML <HEAD>
>section in HEADER.html in each directory.
>
>

However, this relies on a majority of web page authors being savvy enough to
know about the robots META tag convention, to get their admin to add the
IndexOptions line, and to remember to copy HEADER.html into every directory.
I think this is optimistic at best.

Does anyone really disagree that marking auto-index pages as
"noindex,follow" *by default* is a good idea?  That is all my
suggestion amounts to, since it could be overridden as you describe.
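
To make the default concrete: the tag in question would be
<META NAME="ROBOTS" CONTENT="NOINDEX,FOLLOW">.  Below is a minimal sketch, in
Python, of how a robot might act on it.  The names RobotsMetaParser and
robots_policy are made up for illustration and are not taken from any real
robot; the sketch also ignores the "all" and "none" shorthands.

```python
from html.parser import HTMLParser


class RobotsMetaParser(HTMLParser):
    """Collect directives from <META NAME="ROBOTS" CONTENT="..."> tags."""

    def __init__(self):
        super().__init__()
        self.directives = set()

    def handle_starttag(self, tag, attrs):
        # HTMLParser lowercases tag and attribute names before calling us.
        if tag != "meta":
            return
        attr = dict(attrs)
        if (attr.get("name") or "").lower() == "robots":
            content = attr.get("content") or ""
            # Directives are comma-separated and case-insensitive.
            self.directives.update(
                d.strip().lower() for d in content.split(",") if d.strip()
            )


def robots_policy(html):
    """Return (may_index, may_follow) for a page; both default to True."""
    parser = RobotsMetaParser()
    parser.feed(html)
    may_index = "noindex" not in parser.directives
    may_follow = "nofollow" not in parser.directives
    return may_index, may_follow
```

With "noindex,follow" the robot would skip the listing itself but still
follow its links to the files underneath, which is the behaviour I am after.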

The real problem robots have with the current situation is that (assuming
the robot author even appreciates the problem) it is hard to come up with a
reliable way to determine if a page is an auto-generated index page.
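
As a rough illustration of why this is hard, here is a sketch of the sort of
heuristic a robot author might try, again in Python.  It assumes the stock
Apache listing, whose title begins "Index of /"; the function name
looks_like_auto_index is hypothetical.

```python
import re

# Stock Apache listings carry a title of the form "Index of /some/path".
_TITLE_RE = re.compile(r"<title>\s*Index of\s+/", re.IGNORECASE)


def looks_like_auto_index(html):
    """Guess whether a page is an auto-generated directory listing.

    This is only a heuristic: it keys on the default title and nothing else.
    """
    return _TITLE_RE.search(html) is not None
```

It fails in both directions.  A listing served with SuppressHTMLPreamble and
a custom HEADER.html has no such title and is missed, while a hand-written
page that happens to be titled "Index of /docs" is wrongly flagged.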

Olly