Posted to dev@manifoldcf.apache.org by "Markus Schuch (JIRA)" <ji...@apache.org> on 2017/02/27 20:48:45 UTC

[jira] [Created] (CONNECTORS-1392) Add option for Web connector to ignore robots instructions in meta tags and rel attributes

Markus Schuch created CONNECTORS-1392:
-----------------------------------------

             Summary: Add option for Web connector to ignore robots instructions in meta tags and rel attributes
                 Key: CONNECTORS-1392
                 URL: https://issues.apache.org/jira/browse/CONNECTORS-1392
             Project: ManifoldCF
          Issue Type: New Feature
          Components: Web connector
            Reporter: Markus Schuch


The Web connector already has an option to ignore robots.txt.

This ticket proposes a further option that lets the connector also ignore robots instructions in {{<meta name="robots ...}} tags and {{<a ... rel="nofollow" ...}} attributes.
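To make the intended behaviour concrete, here is a minimal sketch of how such in-page instructions could be evaluated and bypassed. It is not taken from the ManifoldCF code base: the class {{RobotsHints}}, its methods and the {{ignoreHints}} flag are hypothetical names chosen for illustration. The sketch assumes that {{nofollow}}, {{noindex}} and {{none}} are the relevant meta robots values, and that {{rel}} holds a whitespace-separated token list.

```java
import java.util.Arrays;
import java.util.Locale;
import java.util.Set;
import java.util.stream.Collectors;

// Hypothetical helper, not part of ManifoldCF: decides whether in-page
// robots instructions permit following links or indexing a page.
public class RobotsHints {

    // Splits a directive list such as "noindex, nofollow" into lowercase tokens.
    public static Set<String> tokens(String value, String separatorRegex) {
        if (value == null) {
            return Set.of();
        }
        return Arrays.stream(value.toLowerCase(Locale.ROOT).split(separatorRegex))
                .map(String::trim)
                .filter(t -> !t.isEmpty())
                .collect(Collectors.toSet());
    }

    // Page-level check against the content of <meta name="robots" content="...">.
    public static boolean mayFollowPageLinks(String metaRobotsContent, boolean ignoreHints) {
        if (ignoreHints) {
            return true; // the proposed option: instructions are not consulted at all
        }
        Set<String> directives = tokens(metaRobotsContent, ",");
        return !(directives.contains("nofollow") || directives.contains("none"));
    }

    // Page-level check for indexing, driven by the same meta tag.
    public static boolean mayIndexPage(String metaRobotsContent, boolean ignoreHints) {
        if (ignoreHints) {
            return true;
        }
        Set<String> directives = tokens(metaRobotsContent, ",");
        return !(directives.contains("noindex") || directives.contains("none"));
    }

    // Link-level check against the rel attribute of a single <a> element.
    public static boolean mayFollowLink(String relAttribute, boolean ignoreHints) {
        if (ignoreHints) {
            return true;
        }
        return !tokens(relAttribute, "\\s+").contains("nofollow");
    }

    public static void main(String[] args) {
        System.out.println(mayFollowLink("external nofollow", false));
        System.out.println(mayFollowLink("external nofollow", true));
        System.out.println(mayIndexPage("NOINDEX, follow", false));
    }
}
```

With {{ignoreHints}} set to true every check returns true, which corresponds to the connector ignoring the instructions entirely.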

*First proposal*

Reuse the existing "Robots.txt usage" option on the "Robots" tab and rename its values:
# Don't look at robots.txt, meta robots tags and rel attributes
# Obey robots.txt, meta robots tags and rel attributes for data fetches only
# Obey robots.txt, meta robots tags and rel attributes _(the default)_
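The three values above could be modelled roughly as follows. This is only a sketch under stated assumptions: the enum {{RobotsUsage}}, its constants and the method {{ignoresHtmlInstructions}} are hypothetical and do not reflect the connector's actual configuration code. The ticket does not say how "data fetches only" should apply to in-page instructions, so the sketch assumes that both "obey" values honour meta robots tags and rel attributes.

```java
// Hypothetical model of the renamed option values, not ManifoldCF code.
public class RobotsUsageSketch {

    public enum RobotsUsage {
        NONE,       // option 1: don't look at robots.txt or in-page instructions
        DATA_ONLY,  // option 2: obey them for data fetches only
        ALL;        // option 3: obey them everywhere (the default)

        // The default named in the proposal.
        public static RobotsUsage defaultValue() {
            return ALL;
        }

        // Assumption: only NONE disables meta robots and rel handling.
        public boolean ignoresHtmlInstructions() {
            return this == NONE;
        }
    }

    public static void main(String[] args) {
        for (RobotsUsage usage : RobotsUsage.values()) {
            System.out.println(usage + " ignores in-page instructions: "
                    + usage.ignoresHtmlInstructions());
        }
    }
}
```

Reusing a single setting keeps the configuration UI unchanged, at the cost of not being able to ignore in-page instructions while still obeying robots.txt.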

The end-user documentation needs to be updated accordingly.

Google resources on robots instructions in HTML pages:
[0] https://support.google.com/webmasters/answer/79812?hl=en&ctx=cb&src=cb&cbid=tnnsjq5jcodt&cbrank=4
[1] https://support.google.com/webmasters/answer/96569?hl=en&ctx=cb&src=cb&cbid=-5rmggrfsp2rq&cbrank=3



--
This message was sent by Atlassian JIRA
(v6.3.15#6346)