Posted to dev@nutch.apache.org by "Sebastian Nagel (Jira)" <ji...@apache.org> on 2021/04/06 14:54:00 UTC

[jira] [Resolved] (NUTCH-2859) urlnormalizer-protocol: allow to normalize domains

     [ https://issues.apache.org/jira/browse/NUTCH-2859?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Sebastian Nagel resolved NUTCH-2859.
------------------------------------
    Resolution: Implemented

Thanks for the review, [~markus17]!

> urlnormalizer-protocol: allow to normalize domains
> --------------------------------------------------
>
>                 Key: NUTCH-2859
>                 URL: https://issues.apache.org/jira/browse/NUTCH-2859
>             Project: Nutch
>          Issue Type: Improvement
>          Components: plugin, urlnormalizer
>    Affects Versions: 1.18
>            Reporter: Sebastian Nagel
>            Assignee: Sebastian Nagel
>            Priority: Major
>             Fix For: 1.19
>
>
> The plugin urlnormalizer-protocol normalizes the URL protocol/scheme for a given list of hosts to the desired "normal" protocol (usually one of http or https). It would be handy to allow domain names to be specified as well, so that all hosts and subdomains of a given domain are normalized.
> To stay backward-compatible, this could be done by treating {{*.example.org}} as a pattern that matches all hosts or subdomains of the domain {{example.org}}.
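
The matching described above can be sketched roughly as follows. This is an illustrative Python sketch, not the plugin's Java code: the names `lookup_scheme` and `normalize_scheme` and the `rules` dictionary are hypothetical. It also makes two assumptions the issue text does not settle: exact host entries take precedence over `*.` patterns, and `*.example.org` also covers the bare host `example.org`.

```python
from urllib.parse import urlsplit, urlunsplit


def lookup_scheme(host, rules):
    """Return the scheme configured for host, or None if no rule applies."""
    host = host.lower()
    # Exact host entries keep working as before (backward compatibility).
    if host in rules:
        return rules[host]
    # Try domain patterns from most to least specific:
    # a.b.example.org -> *.a.b.example.org, *.b.example.org, *.example.org, *.org
    # Starting at index 0 means "*.example.org" also matches "example.org"
    # itself; that is an assumption of this sketch.
    labels = host.split(".")
    for i in range(len(labels)):
        pattern = "*." + ".".join(labels[i:])
        if pattern in rules:
            return rules[pattern]
    return None


def normalize_scheme(url, rules):
    """Rewrite the scheme of an http(s) URL if a host or domain rule applies."""
    parts = urlsplit(url)
    if parts.scheme not in ("http", "https") or not parts.hostname:
        return url
    scheme = lookup_scheme(parts.hostname, rules)
    if scheme is None or scheme == parts.scheme:
        return url
    return urlunsplit(parts._replace(scheme=scheme))


# Hypothetical rule set: one exact host and one domain pattern.
rules = {"www.example.com": "https", "*.example.org": "https"}
print(normalize_scheme("http://sub.example.org/path?q=1", rules))
print(normalize_scheme("http://example.com/", rules))
```

With these rules, any host under `example.org` is rewritten to https, while `example.com` is left alone because only the exact host `www.example.com` is listed.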



--
This message was sent by Atlassian Jira
(v8.3.4#803005)