Posted to infrastructure-issues@apache.org by "Daniel Kulp (JIRA)" <ji...@apache.org> on 2009/12/14 15:43:18 UTC

[jira] Created: (INFRA-2375) robots.txt for confluence needs help

robots.txt for confluence needs help
------------------------------------

                 Key: INFRA-2375
                 URL: https://issues.apache.org/jira/browse/INFRA-2375
             Project: Infrastructure
          Issue Type: Bug
      Security Level: public (Regular issues)
          Components: Confluence
            Reporter: Daniel Kulp


 The robots.txt for cwiki currently only has: 
 Disallow: /confluence/ 
 
 
 That means the "static" content for all the spaces is indexable by crawlers. For projects that copy that content to their own project sites, it gets indexed both at cwiki and in the "real" spots. In many cases, the cwiki pages show up in Google search results instead of the real pages. 
 
 Basically, we need a way for each space to "opt out" of being indexed on cwiki. 
 
 For the short term, can we add: 
 
 Disallow: /CXF/ 
 Disallow: /CXF20DOC/ 
 Disallow: /ACTIVEMQ/ 
 Disallow: /CAMEL/ 
 Disallow: /SM/ 
 Disallow: /SMX3/ 
 Disallow: /SMX4/ 
 Disallow: /SMX4KNL/ 
 Disallow: /SMX4NMR/ 
 Disallow: /SMX4RUN/ 
 Disallow: /SMXCOMP/ 
 Disallow: /TUSCANY/ 
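
 The proposed entries can be sanity-checked before deployment. The following is a minimal sketch using Python's standard urllib.robotparser; the host name cwiki.apache.org, the crawler name ExampleBot, the page paths, and the space OTHERSPACE are assumptions made for illustration, not taken from the real configuration.

```python
from urllib.robotparser import RobotFileParser

# Existing rule plus the proposed short-term additions from this issue.
PROPOSED_RULES = """\
User-agent: *
Disallow: /confluence/
Disallow: /CXF/
Disallow: /CXF20DOC/
Disallow: /ACTIVEMQ/
Disallow: /CAMEL/
Disallow: /SM/
Disallow: /SMX3/
Disallow: /SMX4/
Disallow: /SMX4KNL/
Disallow: /SMX4NMR/
Disallow: /SMX4RUN/
Disallow: /SMXCOMP/
Disallow: /TUSCANY/
"""

# Assumed base URL for the wiki; adjust to the real host.
BASE = "https://cwiki.apache.org"


def build_parser(rules: str) -> RobotFileParser:
    """Parse robots.txt text from a string instead of fetching it."""
    parser = RobotFileParser()
    parser.parse(rules.splitlines())
    return parser


def is_crawlable(parser: RobotFileParser, path: str) -> bool:
    """Return True if a generic crawler may fetch the given path."""
    return parser.can_fetch("ExampleBot", BASE + path)


parser = build_parser(PROPOSED_RULES)
for path in ("/CXF/index.html", "/SMX3/index.html", "/OTHERSPACE/index.html"):
    print(path, is_crawlable(parser, path))
```

 Rules are matched by path prefix, so the trailing slash matters: "Disallow: /SM/" does not cover /SMX3/ by itself, which is why each SMX space needs its own line.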
 
 
 Probably a bunch of others as well. I almost want to suggest default is disallowed with an "Opt In" per space, just not sure how to accomplish that. 
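
 One possible way to get "disallowed by default, opt in per space" is to generate the file from a list of opted-in spaces, emitting an Allow line for each and a final "Disallow: /". This is only a sketch under assumptions: Allow was not part of the original robots.txt convention (the major crawlers honor it), the function name opt_in_robots and the space name EXAMPLESPACE are hypothetical, and the host cwiki.apache.org is assumed.

```python
from urllib.robotparser import RobotFileParser


def opt_in_robots(opted_in_spaces):
    """Build robots.txt text that blocks everything except opted-in spaces.

    Allow lines are emitted before the catch-all Disallow so that parsers
    using first-match ordering and parsers using longest-match ordering
    reach the same result.
    """
    lines = ["User-agent: *"]
    lines += [f"Allow: /{space}/" for space in sorted(opted_in_spaces)]
    lines.append("Disallow: /")
    return "\n".join(lines) + "\n"


# Hypothetical space that has opted in to being indexed on cwiki.
rules = opt_in_robots(["EXAMPLESPACE"])
print(rules)

# Check the generated rules with the standard-library parser.
checker = RobotFileParser()
checker.parse(rules.splitlines())
base = "https://cwiki.apache.org"  # assumed host
print(checker.can_fetch("ExampleBot", base + "/EXAMPLESPACE/index.html"))
print(checker.can_fetch("ExampleBot", base + "/CXF/index.html"))
```

 With a generated file, a space would opt in by being added to the list, and any newly created space would stay unindexed until someone does so.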
 
 


-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.


[jira] Closed: (INFRA-2375) robots.txt for confluence needs help

Posted by "Gavin (JIRA)" <ji...@apache.org>.
     [ https://issues.apache.org/jira/browse/INFRA-2375?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Gavin closed INFRA-2375.
------------------------

    Resolution: Fixed

I've added those entries above.

Opt out would be better; as it is, there are far more cwikis that are not exported than those that are.
The politics of this discussion can be raised on the infra list if needed.

