Posted to dev@nutch.apache.org by "Sebastian Nagel (Jira)" <ji...@apache.org> on 2019/09/23 08:48:00 UTC

[jira] [Updated] (NUTCH-2705) urlfilter-validator rejects IPv6 URLs

     [ https://issues.apache.org/jira/browse/NUTCH-2705?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Sebastian Nagel updated NUTCH-2705:
-----------------------------------
    Fix Version/s:     (was: 1.16)
                   1.17

> urlfilter-validator rejects IPv6 URLs
> -------------------------------------
>
>                 Key: NUTCH-2705
>                 URL: https://issues.apache.org/jira/browse/NUTCH-2705
>             Project: Nutch
>          Issue Type: Bug
>          Components: plugin
>    Affects Versions: 1.15
>            Reporter: Sebastian Nagel
>            Priority: Minor
>             Fix For: 1.17
>
>
> The plugin urlfilter-validator rejects URLs with an IPv6 address as hostname/authority (given in brackets according to [RFC 2732|https://tools.ietf.org/html/rfc2732]):
> {noformat}
> % echo "http://[2010:836B:4179::836B:4179]/" \
>     | bin/nutch filterchecker -filterName urlfilter-validator -stdin
> Checking combination of these URLFilters: UrlValidator 
> -http://[2010:836B:4179::836B:4179]/
> {noformat}
> We should also consider using the class [UrlValidator|https://commons.apache.org/proper/commons-validator/apidocs/org/apache/commons/validator/routines/UrlValidator.html] from commons-validator directly instead of a modified copy. This would make it easy to pick up updates and improvements - IPv6 is already supported, see the [class implementation|https://commons.apache.org/proper/commons-validator/apidocs/src-html/org/apache/commons/validator/routines/UrlValidator.html#line.380].
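
For reference, a minimal sketch showing that the URL from the report is syntactically valid and carries a bracketed IPv6 literal as its host. It uses only the JDK class java.net.URI. It is neither the Nutch plugin code nor the commons-validator UrlValidator, and the class name Ipv6UrlCheck and the method ipv6LiteralHost are hypothetical names chosen for illustration.

```java
import java.net.URI;
import java.net.URISyntaxException;

// Hypothetical helper, not part of Nutch or commons-validator.
class Ipv6UrlCheck {

    /**
     * Returns the host component if the URL parses and its host is a
     * bracketed IPv6 literal (RFC 2732 style), otherwise null.
     */
    static String ipv6LiteralHost(String url) {
        try {
            URI uri = new URI(url);
            String host = uri.getHost();
            // java.net.URI keeps the square brackets around IPv6 hosts.
            if (host != null && host.startsWith("[") && host.endsWith("]")) {
                return host;
            }
            return null;
        } catch (URISyntaxException e) {
            // Not parseable as a URI at all.
            return null;
        }
    }

    public static void main(String[] args) {
        // The URL rejected by urlfilter-validator in the report above.
        System.out.println(ipv6LiteralHost("http://[2010:836B:4179::836B:4179]/"));
        // A URL with a regular hostname is not an IPv6 literal.
        System.out.println(ipv6LiteralHost("http://example.org/"));
    }
}
```

A check of this kind could serve as a quick cross-check when testing a fix, under the assumption that accepting whatever java.net.URI accepts is sufficient for the purpose. A URL validator typically applies further rules (schemes, ports, path characters) that this sketch does not cover.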


