Posted to dev@nutch.apache.org by "Markus Jelsma (JIRA)" <ji...@apache.org> on 2015/07/21 17:09:04 UTC
[jira] [Created] (NUTCH-2065) Domain URL filter to support protocols
Markus Jelsma created NUTCH-2065:
------------------------------------
Summary: Domain URL filter to support protocols
Key: NUTCH-2065
URL: https://issues.apache.org/jira/browse/NUTCH-2065
Project: Nutch
Issue Type: Improvement
Affects Versions: 1.10
Reporter: Markus Jelsma
Fix For: 1.11
The domain URL filter allows all protocols for all whitelisted domains, hosts or suffixes, but it usually makes little sense to index both the http and https URLs of the same domain. This is not unlike the host URL filter, which prevents indexing of duplicate hosts such as apache.org and www.apache.org.
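The proposed improvement might look roughly like the sketch below. It is a hypothetical Java illustration, not Nutch's implementation. The class name `ProtocolDomainFilter` and the rule syntax are assumptions: a bare entry such as `apache.org` keeps today's allow-any-protocol behavior, while an entry such as `https://example.com` allows only that protocol. Domain matching is simplified to "host equals the entry or is a subdomain of it" rather than Nutch's separate domain/host/suffix handling. As with Nutch's URL filters, `filter` returns the URL to accept it and `null` to reject it.

```java
import java.net.URI;
import java.net.URISyntaxException;
import java.util.ArrayList;
import java.util.List;
import java.util.Locale;

/**
 * Hypothetical sketch of a protocol-aware domain whitelist (not Nutch code).
 * Entries are either "example.org" (any protocol) or "https://example.org"
 * (that protocol only). A URL is accepted if its host equals an entry's
 * domain or is a subdomain of it (a simplification of Nutch's matching).
 */
public class ProtocolDomainFilter {

    /** One whitelist rule; protocol == null means "any protocol". */
    private record Rule(String protocol, String domain) {}

    private final List<Rule> rules = new ArrayList<>();

    public ProtocolDomainFilter(List<String> entries) {
        for (String raw : entries) {
            String entry = raw.trim().toLowerCase(Locale.ROOT);
            if (entry.isEmpty() || entry.startsWith("#")) {
                continue; // skip blank lines and comments
            }
            int sep = entry.indexOf("://");
            if (sep >= 0) {
                // "https://example.com" -> protocol "https", domain "example.com"
                rules.add(new Rule(entry.substring(0, sep), entry.substring(sep + 3)));
            } else {
                rules.add(new Rule(null, entry));
            }
        }
    }

    /** Returns the URL if it is accepted, or null to reject it. */
    public String filter(String url) {
        final URI uri;
        try {
            uri = new URI(url);
        } catch (URISyntaxException e) {
            return null; // reject URLs that cannot be parsed
        }
        String scheme = uri.getScheme();
        String host = uri.getHost();
        if (scheme == null || host == null) {
            return null;
        }
        scheme = scheme.toLowerCase(Locale.ROOT);
        host = host.toLowerCase(Locale.ROOT);
        for (Rule rule : rules) {
            boolean domainMatches =
                    host.equals(rule.domain()) || host.endsWith("." + rule.domain());
            boolean protocolMatches =
                    rule.protocol() == null || rule.protocol().equals(scheme);
            if (domainMatches && protocolMatches) {
                return url;
            }
        }
        return null; // no whitelist rule matched
    }

    public static void main(String[] args) {
        ProtocolDomainFilter filter = new ProtocolDomainFilter(List.of(
                "apache.org",             // any protocol
                "https://example.com"));  // https only
        System.out.println(filter.filter("http://www.apache.org/")); // accepted
        System.out.println(filter.filter("https://example.com/a"));  // accepted
        System.out.println(filter.filter("http://example.com/a"));   // prints null
    }
}
```

With a protocol-qualified entry, only one protocol variant of a site passes the filter, which avoids indexing both the http and https copies; entries without a protocol keep the current behavior, so existing whitelists would not need to change.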
--
This message was sent by Atlassian JIRA
(v6.3.4#6332)