Posted to dev@nutch.apache.org by "Markus Jelsma (JIRA)" <ji...@apache.org> on 2015/07/21 17:09:04 UTC

[jira] [Created] (NUTCH-2065) Domain URL filter to support protocols

Markus Jelsma created NUTCH-2065:
------------------------------------

             Summary: Domain URL filter to support protocols
                 Key: NUTCH-2065
                 URL: https://issues.apache.org/jira/browse/NUTCH-2065
             Project: Nutch
          Issue Type: Improvement
    Affects Versions: 1.10
            Reporter: Markus Jelsma
             Fix For: 1.11


The domain URL filter currently allows every protocol for each whitelisted domain, host or suffix. It usually makes little sense to index both the http and the https URLs of the same domain. The situation is much like the one handled by the host URL filter, which prevents indexing of duplicate hosts such as apache.org and www.apache.org.
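The Java sketch below shows one possible way a whitelist entry could also pin a protocol. It assumes a hypothetical entry syntax and is not how Nutch's domain URL filter is actually configured. An entry without a scheme (e.g. "example.com") keeps today's behaviour and allows any protocol. An entry with a scheme (e.g. "https://apache.org") allows only that protocol for the domain and its subdomains. Like a Nutch URL filter, `filter` returns the URL to keep it and null to reject it. The class name and entry format are illustrative assumptions.

```java
import java.net.URI;
import java.net.URISyntaxException;
import java.util.HashMap;
import java.util.HashSet;
import java.util.Locale;
import java.util.Map;
import java.util.Set;

/**
 * Hypothetical protocol-aware domain whitelist (illustration only).
 * "example.com"        -> any protocol allowed for example.com and subdomains
 * "https://apache.org" -> only https allowed for apache.org and subdomains
 */
class ProtocolAwareDomainFilter {
    // Domains/hosts/suffixes whitelisted for every protocol.
    private final Set<String> anyProtocol = new HashSet<>();
    // Domains/hosts/suffixes whitelisted only for the listed protocols.
    private final Map<String, Set<String>> restricted = new HashMap<>();

    ProtocolAwareDomainFilter(String... entries) {
        for (String entry : entries) {
            String e = entry.trim().toLowerCase(Locale.ROOT);
            int sep = e.indexOf("://");
            if (sep < 0) {
                anyProtocol.add(e);
            } else {
                String scheme = e.substring(0, sep);
                String domain = e.substring(sep + 3);
                restricted.computeIfAbsent(domain, k -> new HashSet<>()).add(scheme);
            }
        }
    }

    /** Returns the URL if it is accepted, or null if it is rejected. */
    String filter(String url) {
        URI uri;
        try {
            uri = new URI(url);
        } catch (URISyntaxException e) {
            return null; // unparsable URLs are rejected
        }
        if (uri.getScheme() == null || uri.getHost() == null) {
            return null;
        }
        String scheme = uri.getScheme().toLowerCase(Locale.ROOT);
        String candidate = uri.getHost().toLowerCase(Locale.ROOT);
        // Walk from the full host towards the suffix: nutch.apache.org,
        // apache.org, org. The most specific matching entry decides.
        while (true) {
            if (anyProtocol.contains(candidate)) {
                return url;
            }
            Set<String> allowed = restricted.get(candidate);
            if (allowed != null) {
                return allowed.contains(scheme) ? url : null;
            }
            int dot = candidate.indexOf('.');
            if (dot < 0) {
                return null; // no whitelist entry matched
            }
            candidate = candidate.substring(dot + 1);
        }
    }
}
```

With `new ProtocolAwareDomainFilter("https://apache.org", "example.com")`, the filter keeps https://nutch.apache.org/ and rejects http://nutch.apache.org/, while both http and https URLs under example.com still pass. Only one protocol variant of apache.org would therefore end up in the index.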





--
This message was sent by Atlassian JIRA
(v6.3.4#6332)