Posted to dev@nutch.apache.org by "Rod Taylor (JIRA)" <ji...@apache.org> on 2005/12/31 21:09:01 UTC

[jira] Commented: (NUTCH-160) Use standard Java Regex library rather than org.apache.oro.text.regex

    [ http://issues.apache.org/jira/browse/NUTCH-160?page=comments#action_12361472 ] 

Rod Taylor commented on NUTCH-160:
----------------------------------

This patch also appears to fix the issue reported to the mailing list on November 18th under the subject "Urlfilter bug (doesn't return on long URLs)", in which abnormally long URLs caused a timeout in the URLFilter.
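
For illustration only, here is a minimal sketch of what a URL filter built on java.util.regex might look like. It is not the code in regex.patch: the class name, the filter() method, and both rules are hypothetical and were chosen only to show the standard library calls (Pattern.compile and Matcher.find) applied to a very long URL.

```java
import java.util.regex.Pattern;

// Hypothetical sketch, not the actual Nutch URLFilter or the attached patch.
public class RegexUrlFilterSketch {
    // Assumed rule: reject URLs containing characters typical of query strings.
    private static final Pattern REJECT = Pattern.compile("[?*!@=]");
    // Assumed rule: accept anything else that starts with http:// or https://.
    private static final Pattern ACCEPT = Pattern.compile("^https?://");

    // Returns the URL if it passes the rules, or null if it is filtered out.
    public static String filter(String url) {
        if (REJECT.matcher(url).find()) {
            return null;
        }
        if (ACCEPT.matcher(url).find()) {
            return url;
        }
        return null;
    }

    public static void main(String[] args) {
        // Build an abnormally long URL (about 20,000 characters).
        StringBuilder sb = new StringBuilder("http://example.com/");
        for (int i = 0; i < 10000; i++) {
            sb.append("a/");
        }
        String longUrl = sb.toString();

        System.out.println(filter(longUrl) != null);                    // long URL is accepted
        System.out.println(filter("http://example.com/?id=1") == null); // query URL is rejected
    }
}
```

Both patterns are compiled once and reused, since Pattern objects are immutable and safe to share between threads. How the same rules behave under org.apache.oro.text.regex is not shown here.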

> Use standard Java Regex library rather than org.apache.oro.text.regex
> ---------------------------------------------------------------------
>
>          Key: NUTCH-160
>          URL: http://issues.apache.org/jira/browse/NUTCH-160
>      Project: Nutch
>         Type: Improvement
>     Versions: 0.8-dev
>     Reporter: Rod Taylor
>  Attachments: regex.patch
>
> org.apache.oro.text.regex is based on Perl 5.003, which has some corner cases that perform poorly. The standard regular expression library for Java (1.4 and later) does not seem to have these issues.
