Posted to infrastructure-issues@apache.org by "Daniel Kulp (JIRA)" <ji...@apache.org> on 2009/12/14 15:35:18 UTC

[jira] Reopened: (INFRA-1343) setup robots.txt and/or other access rules to prevent bots from crawling Continuum pages

     [ https://issues.apache.org/jira/browse/INFRA-1343?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Daniel Kulp reopened INFRA-1343:
--------------------------------



I'm going to reopen this as the current solution is extremely problematic.

The robots.txt currently only has:
Disallow: /confluence/


That means the "static" content for all the spaces is still indexable by the crawlers. For projects that copy the content to their own project sites, the same content is getting indexed both at cwiki and in the "real" spots. In many cases, the cwiki pages are showing up in Google search results instead of the real pages.

Basically, we need a way for each space to "opt out" of being indexed on cwiki.   

For the short term, can we add:

Disallow: /CXF/
Disallow: /CXF20DOC/
Disallow: /ACTIVEMQ/
Disallow: /CAMEL/
Disallow: /SM/
Disallow: /SMX3/
Disallow: /SMX4/
Disallow: /SMX4KNL/
Disallow: /SMX4NMR/
Disallow: /SMX4RUN/
Disallow: /SMXCOMP/
Disallow: /TUSCANY/
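
To illustrate the difference, here is a rough Python sketch that checks a space URL against the current rule and against the proposed list. It is only an illustration under stated assumptions: the "User-agent: *" line, the host name cwiki.apache.org, and the page path /CXF/index.html are assumed for the example and are not taken from the deployed file.

```python
from urllib.robotparser import RobotFileParser

def make_parser(disallowed_paths):
    """Build a parser from a list of Disallow paths.

    Assumption: the rules apply to all crawlers ("User-agent: *").
    """
    lines = ["User-agent: *"]
    lines += [f"Disallow: {path}" for path in disallowed_paths]
    parser = RobotFileParser()
    parser.parse(lines)
    return parser

# The single rule reported as currently in place.
CURRENT_RULES = ["/confluence/"]

# The current rule plus the short-term additions requested above.
PROPOSED_RULES = CURRENT_RULES + [
    "/CXF/", "/CXF20DOC/", "/ACTIVEMQ/", "/CAMEL/", "/SM/",
    "/SMX3/", "/SMX4/", "/SMX4KNL/", "/SMX4NMR/", "/SMX4RUN/",
    "/SMXCOMP/", "/TUSCANY/",
]

# Hypothetical URL of a statically exported page in the CXF space.
SPACE_URL = "https://cwiki.apache.org/CXF/index.html"

crawlable_now = make_parser(CURRENT_RULES).can_fetch("*", SPACE_URL)
crawlable_after = make_parser(PROPOSED_RULES).can_fetch("*", SPACE_URL)

print("crawlable with current rules: ", crawlable_now)
print("crawlable with proposed rules:", crawlable_after)
```

Matching is by path prefix and is case-sensitive, so each space key has to be listed exactly as it appears in the URL.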


There are probably a bunch of others as well. I almost want to suggest that the default be disallowed, with an "opt in" per space; I'm just not sure how to accomplish that.
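
One possible way to get a default-disallow with per-space opt-in is to generate the robots.txt from a list of opted-in spaces, emitting an "Allow:" line for each one followed by a blanket "Disallow: /". This is only a sketch: "Allow" is an extension that the major crawlers honor but not every bot does, the function name opt_in_robots and the space key EXAMPLE are made up for illustration, and where the opt-in list would come from is left open.

```python
from urllib.robotparser import RobotFileParser

def opt_in_robots(opted_in_spaces):
    """Return robots.txt text that blocks everything except the given spaces.

    Allow lines come first because some parsers (including Python's)
    apply the first matching rule rather than the most specific one.
    """
    lines = ["User-agent: *"]
    lines += [f"Allow: /{space}/" for space in sorted(opted_in_spaces)]
    lines.append("Disallow: /")
    return "\n".join(lines) + "\n"

# Hypothetical opt-in list; EXAMPLE is not a real space key.
robots_txt = opt_in_robots({"EXAMPLE"})
print(robots_txt)

# Sanity check with the standard-library parser.
parser = RobotFileParser()
parser.parse(robots_txt.splitlines())

opted_in_ok = parser.can_fetch("*", "https://cwiki.apache.org/EXAMPLE/page.html")
other_ok = parser.can_fetch("*", "https://cwiki.apache.org/CXF/page.html")
print("opted-in space crawlable:", opted_in_ok)
print("other space crawlable:   ", other_ok)
```

With this layout a new space is blocked by default, and a project that wants cwiki indexing only needs its key added to the opt-in list.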




> setup robots.txt and/or other access rules to prevent bots from crawling Continuum pages 
> -----------------------------------------------------------------------------------------
>
>                 Key: INFRA-1343
>                 URL: https://issues.apache.org/jira/browse/INFRA-1343
>             Project: Infrastructure
>          Issue Type: Task
>      Security Level: public(Regular issues) 
>          Components: Continuum
>            Reporter: Brett Porter
>
> We don't need search engines crawling the build pages (especially since it can navigate its way all the way through a working copy). It is picking up links from the mails sent out to mailing lists, presumably.

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.