Posted to dev@nutch.apache.org by "Sebastian Nagel (Jira)" <ji...@apache.org> on 2021/04/06 14:53:00 UTC
[jira] [Resolved] (NUTCH-2858) urlnormalizer-protocol: URL port is lost during normalization
[ https://issues.apache.org/jira/browse/NUTCH-2858?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]
Sebastian Nagel resolved NUTCH-2858.
------------------------------------
Resolution: Fixed
> urlnormalizer-protocol: URL port is lost during normalization
> -------------------------------------------------------------
>
> Key: NUTCH-2858
> URL: https://issues.apache.org/jira/browse/NUTCH-2858
> Project: Nutch
> Issue Type: Bug
> Components: plugin, urlnormalizer
> Affects Versions: 1.18
> Reporter: Sebastian Nagel
> Assignee: Sebastian Nagel
> Priority: Minor
> Fix For: 1.19
>
>
> If a URL includes a port, e.g. {{http://example.com:8080/}} or {{https://example.com:8443/}}, the port is removed during normalization by the urlnormalizer-protocol plugin.
> Instead, if the port is set,
> - the port should be kept as is and
> - the protocol should be left unchanged
> -* keeping the port while changing the protocol might result in a connection failure
> -* unlike the default port mapping (80 (http) <> 443 (https)), non-default port mappings (e.g. 8080 <> 8443) are risky and unlikely to work on every server setup.
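
The rule described above could be sketched roughly as follows. This is only an illustration and not the actual plugin code: the class name PortAwareProtocolNormalizer, the method normalize and its targetProtocol parameter are hypothetical, and the sketch assumes the caller already knows which protocol a host should be switched to. User info in the URL is not handled.

```java
import java.net.MalformedURLException;
import java.net.URL;

// Hypothetical helper, not part of Nutch: illustrates "keep port and protocol if a port is set".
final class PortAwareProtocolNormalizer {

  private PortAwareProtocolNormalizer() {}

  /**
   * Returns urlString switched to targetProtocol, unless the URL carries an
   * explicit port, in which case it is returned unchanged.
   */
  static String normalize(String urlString, String targetProtocol) {
    URL url;
    try {
      url = new URL(urlString);
    } catch (MalformedURLException e) {
      // Assumption of this sketch: leave unparsable input untouched.
      return urlString;
    }

    // URL.getPort() returns -1 when no port is given explicitly.
    if (url.getPort() != -1) {
      // Explicit port: keep both the port and the protocol as they are.
      return urlString;
    }

    if (url.getProtocol().equals(targetProtocol)) {
      return urlString; // nothing to change
    }

    // No explicit port: changing the protocol implicitly changes the default port.
    String file = url.getFile(); // path plus query, if any
    if (url.getRef() != null) {
      file = file + "#" + url.getRef();
    }
    try {
      return new URL(targetProtocol, url.getHost(), file).toString();
    } catch (MalformedURLException e) {
      // E.g. an unknown target protocol: fall back to the input.
      return urlString;
    }
  }
}
```

With this sketch, normalize("http://example.com:8080/", "https") returns the input unchanged, while normalize("http://example.com/path?q=1", "https") returns "https://example.com/path?q=1".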