Posted to dev@nutch.apache.org by "Julien Nioche (Jira)" <ji...@apache.org> on 2023/11/06 13:13:00 UTC

[jira] [Created] (NUTCH-3025) urlfilter-fast to filter based on the length of the URL

Julien Nioche created NUTCH-3025:
------------------------------------

             Summary: urlfilter-fast to filter based on the length of the URL
                 Key: NUTCH-3025
                 URL: https://issues.apache.org/jira/browse/NUTCH-3025
             Project: Nutch
          Issue Type: Improvement
    Affects Versions: 1.19
            Reporter: Julien Nioche
             Fix For: 1.20


There is currently no filter implementation that removes URLs based on their length or the length of their path / query.
Doing so with the regex filter would be inefficient; instead, we could implement it in _urlfilter-fast_.
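
To make the idea concrete, here is a minimal standalone sketch of length-based filtering. It is not the urlfilter-fast implementation and does not use any Nutch classes: the class name UrlLengthFilter, its constructor parameters, and the convention that a negative limit disables a check are all assumptions made for illustration. It follows the usual filter contract of returning the URL when it is accepted and null when it is rejected, and it assumes that unparseable URLs should be rejected.

```java
import java.net.URI;
import java.net.URISyntaxException;

// Hypothetical illustration only: not part of Nutch or urlfilter-fast.
public class UrlLengthFilter {
    // A negative limit means "no limit" (an assumption of this sketch).
    private final int maxUrlLength;
    private final int maxPathLength;
    private final int maxQueryLength;

    public UrlLengthFilter(int maxUrlLength, int maxPathLength, int maxQueryLength) {
        this.maxUrlLength = maxUrlLength;
        this.maxPathLength = maxPathLength;
        this.maxQueryLength = maxQueryLength;
    }

    // Returns the URL unchanged if it passes all checks, or null if it is rejected.
    public String filter(String url) {
        if (url == null) {
            return null;
        }
        // Cheapest check first: total length needs no parsing at all.
        if (exceeds(url.length(), maxUrlLength)) {
            return null;
        }
        // Skip parsing entirely when neither path nor query is limited.
        if (maxPathLength < 0 && maxQueryLength < 0) {
            return url;
        }
        try {
            URI uri = new URI(url);
            String path = uri.getRawPath();
            String query = uri.getRawQuery();
            if (path != null && exceeds(path.length(), maxPathLength)) {
                return null;
            }
            if (query != null && exceeds(query.length(), maxQueryLength)) {
                return null;
            }
        } catch (URISyntaxException e) {
            // Assumption: a URL that cannot be parsed is rejected.
            return null;
        }
        return url;
    }

    private static boolean exceeds(int length, int limit) {
        return limit >= 0 && length > limit;
    }
}
```

Each check is a plain integer comparison, and the URL is parsed at most once, which is the reason a dedicated check should be cheaper than running a regular expression against every URL.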



--
This message was sent by Atlassian Jira
(v8.20.10#820010)