Posted to dev@nutch.apache.org by "ASF GitHub Bot (Jira)" <ji...@apache.org> on 2021/04/06 14:53:00 UTC

[jira] [Commented] (NUTCH-2858) urlnormalizer-protocol: URL port is lost during normalization

    [ https://issues.apache.org/jira/browse/NUTCH-2858?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17315631#comment-17315631 ] 

ASF GitHub Bot commented on NUTCH-2858:
---------------------------------------

sebastian-nagel merged pull request #575:
URL: https://github.com/apache/nutch/pull/575




> urlnormalizer-protocol: URL port is lost during normalization
> -------------------------------------------------------------
>
>                 Key: NUTCH-2858
>                 URL: https://issues.apache.org/jira/browse/NUTCH-2858
>             Project: Nutch
>          Issue Type: Bug
>          Components: plugin, urlnormalizer
>    Affects Versions: 1.18
>            Reporter: Sebastian Nagel
>            Assignee: Sebastian Nagel
>            Priority: Minor
>             Fix For: 1.19
>
>
> If a URL includes a port, e.g. {{http://example.com:8080/}} or {{https://example.com:8443/}}, the port is removed when the URL is normalized by urlnormalizer-protocol.
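> Instead, if a port is set,
> - the port should be kept as is, and
> - the protocol should be left unchanged:
>    -* keeping the port while changing the protocol may cause a connection failure (e.g. rewriting {{http://example.com:8080/}} to {{https://example.com:8080/}} attempts a TLS connection to a port that likely serves plain HTTP);
>    -* unlike the default ports (80 for http, 443 for https), non-default port pairs such as 8080 and 8443 follow no standard, so mapping one to the other cannot be expected to work on every server setup.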
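
A minimal Java sketch of the rule described in the issue, assuming a hypothetical host-to-protocol Map in place of the plugin's real configuration. The class and method names are illustrative only and are not part of the Nutch API; the point is the order of checks: an explicit port short-circuits normalization before any protocol rewrite happens.

```java
import java.net.URI;
import java.net.URISyntaxException;
import java.util.Map;

// Hypothetical sketch of the rule requested in NUTCH-2858; this is NOT the
// actual urlnormalizer-protocol source. The host-to-protocol Map is an
// assumption standing in for the plugin's real configuration.
final class ProtocolNormalizerSketch {

  /** Rewrites the scheme per hostToProtocol unless the URL has an explicit port. */
  static String normalize(String urlString, Map<String, String> hostToProtocol) {
    URI uri;
    try {
      uri = new URI(urlString);
    } catch (URISyntaxException e) {
      return urlString; // leave unparsable URLs untouched
    }
    if (uri.getScheme() == null || uri.getHost() == null) {
      return urlString; // no scheme or host to work with
    }
    // An explicit port (e.g. :8080 or :8443) signals a non-default setup:
    // keep both the port and the protocol exactly as they are.
    if (uri.getPort() != -1) {
      return urlString;
    }
    String protocol = hostToProtocol.get(uri.getHost());
    if (protocol == null || protocol.equalsIgnoreCase(uri.getScheme())) {
      return urlString; // no rule for this host, or already normalized
    }
    // Swap only the scheme; everything from the first ':' on is kept verbatim.
    return protocol + urlString.substring(urlString.indexOf(':'));
  }

  public static void main(String[] args) {
    Map<String, String> rules = Map.of("example.com", "https");
    System.out.println(normalize("http://example.com/", rules));       // https://example.com/
    System.out.println(normalize("http://example.com:8080/", rules));  // http://example.com:8080/
    System.out.println(normalize("https://example.com:8443/", rules)); // https://example.com:8443/
  }
}
```

Returning the URL untouched whenever a port is present is the simplest way to satisfy both requirements at once. This sketch treats explicitly written default ports (:80, :443) the same as any other port; whether those deserve special handling is left open.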



--
This message was sent by Atlassian Jira
(v8.3.4#803005)