Posted to dev@nutch.apache.org by "Doug Cutting (JIRA)" <ji...@apache.org> on 2006/01/06 20:38:15 UTC

[jira] Commented: (NUTCH-160) Use standard Java Regex library rather than org.apache.oro.text.regex

    [ http://issues.apache.org/jira/browse/NUTCH-160?page=comments#action_12361999 ] 

Doug Cutting commented on NUTCH-160:
------------------------------------

+1

I like this patch.  I don't see a need for us to use oro anywhere, since Java now has good built-in regex support.  And Java's regexes are faster in many cases, not just this one:

http://tbray.org/ongoing/When/200x/2004/08/22/PJre

There are a few places in which Java's regexes are incompatible with Perl 5 regexes, documented in the "Comparison to Perl 5" section of:

http://java.sun.com/j2se/1.4.2/docs/api/java/util/regex/Pattern.html

So this change is not completely back-compatible.
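
To illustrate the kind of incompatibility meant here, below is a minimal sketch (not part of regex.patch). The class name RegexCompat and its compiles helper are hypothetical, made up for this example. It only checks whether java.util.regex accepts an expression, using the Perl 5 embedded comment syntax, which the Pattern documentation lists as unsupported, as the sample case:

```java
import java.util.regex.Pattern;
import java.util.regex.PatternSyntaxException;

// Hypothetical helper, not part of regex.patch: checks whether a
// Perl-style expression is accepted by java.util.regex.
class RegexCompat {

    // Returns true if java.util.regex can compile the expression,
    // false if it rejects it with a PatternSyntaxException.
    static boolean compiles(String regex) {
        try {
            Pattern.compile(regex);
            return true;
        } catch (PatternSyntaxException e) {
            return false;
        }
    }

    public static void main(String[] args) {
        String[] samples = {
            "^(http|https)://.*",  // common constructs behave the same in both
            "a(?#comment)b",       // Perl 5 embedded comment, listed as unsupported
        };
        for (String s : samples) {
            System.out.println(s + " -> " + (compiles(s) ? "accepted" : "rejected"));
        }
    }
}
```

A check along these lines could be run over existing filter expressions before switching libraries, so that any pattern relying on an unsupported Perl 5 construct is found up front rather than at crawl time.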

Any objections?

> Use standard Java Regex library rather than org.apache.oro.text.regex
> ---------------------------------------------------------------------
>
>          Key: NUTCH-160
>          URL: http://issues.apache.org/jira/browse/NUTCH-160
>      Project: Nutch
>         Type: Improvement
>     Versions: 0.8-dev
>     Reporter: Rod Taylor
>  Attachments: regex.patch
>
> org.apache.oro.text.regex is based on Perl 5.003, which has some corner cases that perform poorly. The standard regular expression libraries for Java (1.4 and later) do not seem to have these issues.
