Posted to dev@nutch.apache.org by GitBox <gi...@apache.org> on 2021/03/27 11:20:54 UTC

[GitHub] [nutch] sebastian-nagel opened a new pull request #575: NUTCH-2858 urlnormalizer-protocol: URL port is lost during normalization

sebastian-nagel opened a new pull request #575:
URL: https://github.com/apache/nutch/pull/575


   - if the URL includes a port, the protocol is not normalized, so the port is no longer lost
   
   Note that
   - urlnormalizer-basic removes default ports: `https://example.com:443/` is normalized to `https://example.com/`. When normalizers are chained, there is no need to handle default ports in urlnormalizer-protocol.
   - non-default ports can always be mapped by urlnormalizer-regex. There shouldn't be many such URLs, so the price of more complex rules and slower execution is acceptable.
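
   The behavior described above can be sketched roughly as follows. This is a minimal illustration, not the code from this pull request: the class `ProtocolNormalizerSketch`, the `HOST_PROTOCOLS` map and the `normalize` method are hypothetical names, and the host-to-protocol mapping is hard-coded here as an assumption rather than read from the plugin's configuration.

```java
import java.net.URI;
import java.util.Map;

// Hypothetical sketch, not the actual urlnormalizer-protocol implementation.
final class ProtocolNormalizerSketch {

  // Assumed mapping of host name to preferred protocol (illustrative only).
  static final Map<String, String> HOST_PROTOCOLS = Map.of("example.com", "https");

  static String normalize(String urlString) {
    URI uri = URI.create(urlString);
    String scheme = uri.getScheme();
    String host = uri.getHost();
    if (scheme == null || host == null) {
      return urlString; // not an absolute URL with a host: leave unchanged
    }
    // Any explicit port (default or not) means the protocol is left alone,
    // so the port can never be dropped by this step.
    if (uri.getPort() != -1) {
      return urlString;
    }
    String target = HOST_PROTOCOLS.get(host);
    if (target == null || target.equals(scheme)) {
      return urlString; // no rule for this host, or already normalized
    }
    // Swap only the scheme; the rest of the URL is kept verbatim.
    return target + urlString.substring(scheme.length());
  }

  public static void main(String[] args) {
    System.out.println(normalize("http://example.com/path"));
    System.out.println(normalize("http://example.com:8080/path"));
  }
}
```

   In this sketch a URL such as `http://example.com:80/` is also returned unchanged; removing the default port is left to an earlier normalizer in the chain, as noted in the first bullet.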


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



[GitHub] [nutch] sebastian-nagel merged pull request #575: NUTCH-2858 urlnormalizer-protocol: URL port is lost during normalization

Posted by GitBox <gi...@apache.org>.
sebastian-nagel merged pull request #575:
URL: https://github.com/apache/nutch/pull/575


   

