You are viewing a plain text version of this content. The canonical link for it is here.
Posted to commits@harmony.apache.org by "Tim Ellison (JIRA)" <ji...@apache.org> on 2006/06/08 12:07:31 UTC

[jira] Created: (HARMONY-580) [classlib][regex][perf] Lookaround matching is slow

[classlib][regex][perf] Lookaround matching is slow
---------------------------------------------------

         Key: HARMONY-580
         URL: http://issues.apache.org/jira/browse/HARMONY-580
     Project: Harmony
        Type: Bug

  Components: Classlib  
 Environment: WinXP, 2GHz Pentium M
    Reporter: Tim Ellison
 Attachments: MatchingStuff.java

Running tests on the Harmony impl of regex and comparing it with another impl shows that Harmony is much slower at lookaround matching.  Here is the output from the testcase:

-- on Sun jdk1.5.0_06 --

Search space length = 28522
Pattern = (http://|ftp://|news://|https://|callto://|gopher://|mailto:|im:|www.)([\d\w;/\?:@=&$\-+!*'~#%\{\}\|]|[,.()"][^ .!?\s$])*
Took 10 Match not found
Took 10 Match not found
Took 10 Match not found
Took 10 Match not found
Took 10 Match not found
Look pattern = ((?=^)|(?<=\s))(http://|ftp://|news://|https://|callto://|gopher://|mailto:|im:|www.)([\d\w;/\?:@=&$\-+!*'~#%\{\}\|]|[,.()"][^ .!?\s$])*
Took 10 Match not found
Took 10 Match not found
Took 10 Match not found
Took 10 Match not found
Took 10 Match not found
DONE


-- on Harmony r412711 --

Search space length = 28522
Pattern = (http://|ftp://|news://|https://|callto://|gopher://|mailto:|im:|www.)([\d\w;/\?:@=&$\-+!*'~#%\{\}\|]|[,.()"][^ .!?\s$])*
Took 50 Match not found
Took 20 Match not found
Took 10 Match not found
Took 10 Match not found
Took 10 Match not found
Look pattern = ((?=^)|(?<=\s))(http://|ftp://|news://|https://|callto://|gopher://|mailto:|im:|www.)([\d\w;/\?:@=&$\-+!*'~#%\{\}\|]|[,.()"][^ .!?\s$])*
Took 8753 Match not found
Took 8502 Match not found
Took 8463 Match not found
Took 8703 Match not found
Took 8492 Match not found
DONE


-- 
This message is automatically generated by JIRA.
-
If you think it was sent incorrectly contact one of the administrators:
   http://issues.apache.org/jira/secure/Administrators.jspa
-
For more information on JIRA, see:
   http://www.atlassian.com/software/jira


[jira] Updated: (HARMONY-580) [classlib][regex][perf] Lookaround matching is slow

Posted by "Tim Ellison (JIRA)" <ji...@apache.org>.
     [ http://issues.apache.org/jira/browse/HARMONY-580?page=all ]

Tim Ellison updated HARMONY-580:
--------------------------------

    Attachment: MatchingStuff.java

> [classlib][regex][perf] Lookaround matching is slow
> ---------------------------------------------------
>
>          Key: HARMONY-580
>          URL: http://issues.apache.org/jira/browse/HARMONY-580
>      Project: Harmony
>         Type: Bug

>   Components: Classlib
>  Environment: WinXP, 2GHz Pentium M
>     Reporter: Tim Ellison
>  Attachments: MatchingStuff.java
>
> Running tests on the Harmony impl of regex and comparing it with another impl shows that Harmony is much slower at lookaround matching.  Here is the output from the testcase:
> -- on Sun jdk1.5.0_06 --
> Search space length = 28522
> Pattern = (http://|ftp://|news://|https://|callto://|gopher://|mailto:|im:|www.)([\d\w;/\?:@=&$\-+!*'~#%\{\}\|]|[,.()"][^ .!?\s$])*
> Took 10 Match not found
> Took 10 Match not found
> Took 10 Match not found
> Took 10 Match not found
> Took 10 Match not found
> Look pattern = ((?=^)|(?<=\s))(http://|ftp://|news://|https://|callto://|gopher://|mailto:|im:|www.)([\d\w;/\?:@=&$\-+!*'~#%\{\}\|]|[,.()"][^ .!?\s$])*
> Took 10 Match not found
> Took 10 Match not found
> Took 10 Match not found
> Took 10 Match not found
> Took 10 Match not found
> DONE
> -- on Harmony r412711 --
> Search space length = 28522
> Pattern = (http://|ftp://|news://|https://|callto://|gopher://|mailto:|im:|www.)([\d\w;/\?:@=&$\-+!*'~#%\{\}\|]|[,.()"][^ .!?\s$])*
> Took 50 Match not found
> Took 20 Match not found
> Took 10 Match not found
> Took 10 Match not found
> Took 10 Match not found
> Look pattern = ((?=^)|(?<=\s))(http://|ftp://|news://|https://|callto://|gopher://|mailto:|im:|www.)([\d\w;/\?:@=&$\-+!*'~#%\{\}\|]|[,.()"][^ .!?\s$])*
> Took 8753 Match not found
> Took 8502 Match not found
> Took 8463 Match not found
> Took 8703 Match not found
> Took 8492 Match not found
> DONE

-- 
This message is automatically generated by JIRA.
-
If you think it was sent incorrectly contact one of the administrators:
   http://issues.apache.org/jira/secure/Administrators.jspa
-
For more information on JIRA, see:
   http://www.atlassian.com/software/jira


[jira] Updated: (HARMONY-580) [classlib][regex][perf] Lookaround matching is slow

Posted by "Nikolay Kuznetsov (JIRA)" <ji...@apache.org>.
     [ http://issues.apache.org/jira/browse/HARMONY-580?page=all ]

Nikolay Kuznetsov updated HARMONY-580:
--------------------------------------

    Attachment: perf.patch

Attached patch is a partial (since we are not faster in this test) relieve to the problem mentioned. The problem here is behaviour of lookbehind construct, wich tries to find specified token startting from every symbol(during find) and till the begining of the string prior to actual match.

In the suggested patch I've put look behind check after actual match. I believe that in most of the cases this is fastest order of checks. 

Tim: do you have other perfomance data about harmony regex, if yes, could you please share tests. As an author of this implementation I personally interested in perfomance improvement :). 

My current results on 1.7 GHz Pentium M are:

-- on jrockit-jdk1.5.0 --
Search space length = 28522
Pattern = (http://|ftp://|news://|https://|callto://|gopher://|mailto:|im:|www.)([\d\w;/\?:@=&$\-+!*'~#%\{\}\|]|[,.()"][^ .!?\s$])*
Took 20 Match not found
Took 10 Match not found
Took 10 Match not found
Took 20 Match not found
Took 10 Match not found
Look pattern = ((?=^)|(?<=\s))(http://|ftp://|news://|https://|callto://|gopher://|mailto:|im:|www.)([\d\w;/\?:@=&$\-+!*'~#%\{\}\|]|[,.()"][^ .!?\s$])*
Took 10 Match not found
Took 10 Match not found
Took 10 Match not found
Took 10 Match not found
Took 10 Match not found
DONE

-- on Harmony (working copy) --

Search space length = 28522
Pattern = (http://|ftp://|news://|https://|callto://|gopher://|mailto:|im:|www.)([\d\w;/\?:@=&$\-+!*'~#%\{\}\|]|[,.()"][^ .!?\s$])*
Took 30 Match not found
Took 20 Match not found
Took 20 Match not found
Took 20 Match not found
Took 20 Match not found
Look pattern = ((?=^)|(?<=\s))(http://|ftp://|news://|https://|callto://|gopher://|mailto:|im:|www.)([\d\w;/\?:@=&$\-+!*'~#%\{\}\|]|[,.()"][^ .!?\s$])*
Took 30 Match not found
Took 30 Match not found
Took 30 Match not found
Took 30 Match not found
Took 30 Match not found
DONE  

> [classlib][regex][perf] Lookaround matching is slow
> ---------------------------------------------------
>
>          Key: HARMONY-580
>          URL: http://issues.apache.org/jira/browse/HARMONY-580
>      Project: Harmony
>         Type: Bug

>   Components: Classlib
>  Environment: WinXP, 2GHz Pentium M
>     Reporter: Tim Ellison
>  Attachments: MatchingStuff.java, perf.patch
>
> Running tests on the Harmony impl of regex and comparing it with another impl shows that Harmony is much slower at lookaround matching.  Here is the output from the testcase:
> -- on Sun jdk1.5.0_06 --
> Search space length = 28522
> Pattern = (http://|ftp://|news://|https://|callto://|gopher://|mailto:|im:|www.)([\d\w;/\?:@=&$\-+!*'~#%\{\}\|]|[,.()"][^ .!?\s$])*
> Took 10 Match not found
> Took 10 Match not found
> Took 10 Match not found
> Took 10 Match not found
> Took 10 Match not found
> Look pattern = ((?=^)|(?<=\s))(http://|ftp://|news://|https://|callto://|gopher://|mailto:|im:|www.)([\d\w;/\?:@=&$\-+!*'~#%\{\}\|]|[,.()"][^ .!?\s$])*
> Took 10 Match not found
> Took 10 Match not found
> Took 10 Match not found
> Took 10 Match not found
> Took 10 Match not found
> DONE
> -- on Harmony r412711 --
> Search space length = 28522
> Pattern = (http://|ftp://|news://|https://|callto://|gopher://|mailto:|im:|www.)([\d\w;/\?:@=&$\-+!*'~#%\{\}\|]|[,.()"][^ .!?\s$])*
> Took 50 Match not found
> Took 20 Match not found
> Took 10 Match not found
> Took 10 Match not found
> Took 10 Match not found
> Look pattern = ((?=^)|(?<=\s))(http://|ftp://|news://|https://|callto://|gopher://|mailto:|im:|www.)([\d\w;/\?:@=&$\-+!*'~#%\{\}\|]|[,.()"][^ .!?\s$])*
> Took 8753 Match not found
> Took 8502 Match not found
> Took 8463 Match not found
> Took 8703 Match not found
> Took 8492 Match not found
> DONE

-- 
This message is automatically generated by JIRA.
-
If you think it was sent incorrectly contact one of the administrators:
   http://issues.apache.org/jira/secure/Administrators.jspa
-
For more information on JIRA, see:
   http://www.atlassian.com/software/jira


[jira] Assigned: (HARMONY-580) [classlib][regex][perf] Lookaround matching is slow

Posted by "Tim Ellison (JIRA)" <ji...@apache.org>.
     [ http://issues.apache.org/jira/browse/HARMONY-580?page=all ]

Tim Ellison reassigned HARMONY-580:
-----------------------------------

    Assign To: Tim Ellison

> [classlib][regex][perf] Lookaround matching is slow
> ---------------------------------------------------
>
>          Key: HARMONY-580
>          URL: http://issues.apache.org/jira/browse/HARMONY-580
>      Project: Harmony
>         Type: Bug

>   Components: Classlib
>  Environment: WinXP, 2GHz Pentium M
>     Reporter: Tim Ellison
>     Assignee: Tim Ellison
>  Attachments: MatchingStuff.java, perf.patch
>
> Running tests on the Harmony impl of regex and comparing it with another impl shows that Harmony is much slower at lookaround matching.  Here is the output from the testcase:
> -- on Sun jdk1.5.0_06 --
> Search space length = 28522
> Pattern = (http://|ftp://|news://|https://|callto://|gopher://|mailto:|im:|www.)([\d\w;/\?:@=&$\-+!*'~#%\{\}\|]|[,.()"][^ .!?\s$])*
> Took 10 Match not found
> Took 10 Match not found
> Took 10 Match not found
> Took 10 Match not found
> Took 10 Match not found
> Look pattern = ((?=^)|(?<=\s))(http://|ftp://|news://|https://|callto://|gopher://|mailto:|im:|www.)([\d\w;/\?:@=&$\-+!*'~#%\{\}\|]|[,.()"][^ .!?\s$])*
> Took 10 Match not found
> Took 10 Match not found
> Took 10 Match not found
> Took 10 Match not found
> Took 10 Match not found
> DONE
> -- on Harmony r412711 --
> Search space length = 28522
> Pattern = (http://|ftp://|news://|https://|callto://|gopher://|mailto:|im:|www.)([\d\w;/\?:@=&$\-+!*'~#%\{\}\|]|[,.()"][^ .!?\s$])*
> Took 50 Match not found
> Took 20 Match not found
> Took 10 Match not found
> Took 10 Match not found
> Took 10 Match not found
> Look pattern = ((?=^)|(?<=\s))(http://|ftp://|news://|https://|callto://|gopher://|mailto:|im:|www.)([\d\w;/\?:@=&$\-+!*'~#%\{\}\|]|[,.()"][^ .!?\s$])*
> Took 8753 Match not found
> Took 8502 Match not found
> Took 8463 Match not found
> Took 8703 Match not found
> Took 8492 Match not found
> DONE

-- 
This message is automatically generated by JIRA.
-
If you think it was sent incorrectly contact one of the administrators:
   http://issues.apache.org/jira/secure/Administrators.jspa
-
For more information on JIRA, see:
   http://www.atlassian.com/software/jira


[jira] Commented: (HARMONY-580) [classlib][regex][perf] Lookaround matching is slow

Posted by "Nikolay Kuznetsov (JIRA)" <ji...@apache.org>.
    [ http://issues.apache.org/jira/browse/HARMONY-580?page=comments#action_12417445 ] 

Nikolay Kuznetsov commented on HARMONY-580:
-------------------------------------------

Thanks Tim, the patch was applied as expected.

> [classlib][regex][perf] Lookaround matching is slow
> ---------------------------------------------------
>
>          Key: HARMONY-580
>          URL: http://issues.apache.org/jira/browse/HARMONY-580
>      Project: Harmony
>         Type: Bug

>   Components: Classlib
>  Environment: WinXP, 2GHz Pentium M
>     Reporter: Tim Ellison
>     Assignee: Tim Ellison
>  Attachments: MatchingStuff.java, perf.patch
>
> Running tests on the Harmony impl of regex and comparing it with another impl shows that Harmony is much slower at lookaround matching.  Here is the output from the testcase:
> -- on Sun jdk1.5.0_06 --
> Search space length = 28522
> Pattern = (http://|ftp://|news://|https://|callto://|gopher://|mailto:|im:|www.)([\d\w;/\?:@=&$\-+!*'~#%\{\}\|]|[,.()"][^ .!?\s$])*
> Took 10 Match not found
> Took 10 Match not found
> Took 10 Match not found
> Took 10 Match not found
> Took 10 Match not found
> Look pattern = ((?=^)|(?<=\s))(http://|ftp://|news://|https://|callto://|gopher://|mailto:|im:|www.)([\d\w;/\?:@=&$\-+!*'~#%\{\}\|]|[,.()"][^ .!?\s$])*
> Took 10 Match not found
> Took 10 Match not found
> Took 10 Match not found
> Took 10 Match not found
> Took 10 Match not found
> DONE
> -- on Harmony r412711 --
> Search space length = 28522
> Pattern = (http://|ftp://|news://|https://|callto://|gopher://|mailto:|im:|www.)([\d\w;/\?:@=&$\-+!*'~#%\{\}\|]|[,.()"][^ .!?\s$])*
> Took 50 Match not found
> Took 20 Match not found
> Took 10 Match not found
> Took 10 Match not found
> Took 10 Match not found
> Look pattern = ((?=^)|(?<=\s))(http://|ftp://|news://|https://|callto://|gopher://|mailto:|im:|www.)([\d\w;/\?:@=&$\-+!*'~#%\{\}\|]|[,.()"][^ .!?\s$])*
> Took 8753 Match not found
> Took 8502 Match not found
> Took 8463 Match not found
> Took 8703 Match not found
> Took 8492 Match not found
> DONE

-- 
This message is automatically generated by JIRA.
-
If you think it was sent incorrectly contact one of the administrators:
   http://issues.apache.org/jira/secure/Administrators.jspa
-
For more information on JIRA, see:
   http://www.atlassian.com/software/jira


[jira] Closed: (HARMONY-580) [classlib][regex][perf] Lookaround matching is slow

Posted by "Tim Ellison (JIRA)" <ji...@apache.org>.
     [ http://issues.apache.org/jira/browse/HARMONY-580?page=all ]
     
Tim Ellison closed HARMONY-580:
-------------------------------


I verify that the performance problem has been resolved.


> [classlib][regex][perf] Lookaround matching is slow
> ---------------------------------------------------
>
>          Key: HARMONY-580
>          URL: http://issues.apache.org/jira/browse/HARMONY-580
>      Project: Harmony
>         Type: Bug

>   Components: Classlib
>  Environment: WinXP, 2GHz Pentium M
>     Reporter: Tim Ellison
>     Assignee: Tim Ellison
>  Attachments: MatchingStuff.java, perf.patch
>
> Running tests on the Harmony impl of regex and comparing it with another impl shows that Harmony is much slower at lookaround matching.  Here is the output from the testcase:
> -- on Sun jdk1.5.0_06 --
> Search space length = 28522
> Pattern = (http://|ftp://|news://|https://|callto://|gopher://|mailto:|im:|www.)([\d\w;/\?:@=&$\-+!*'~#%\{\}\|]|[,.()"][^ .!?\s$])*
> Took 10 Match not found
> Took 10 Match not found
> Took 10 Match not found
> Took 10 Match not found
> Took 10 Match not found
> Look pattern = ((?=^)|(?<=\s))(http://|ftp://|news://|https://|callto://|gopher://|mailto:|im:|www.)([\d\w;/\?:@=&$\-+!*'~#%\{\}\|]|[,.()"][^ .!?\s$])*
> Took 10 Match not found
> Took 10 Match not found
> Took 10 Match not found
> Took 10 Match not found
> Took 10 Match not found
> DONE
> -- on Harmony r412711 --
> Search space length = 28522
> Pattern = (http://|ftp://|news://|https://|callto://|gopher://|mailto:|im:|www.)([\d\w;/\?:@=&$\-+!*'~#%\{\}\|]|[,.()"][^ .!?\s$])*
> Took 50 Match not found
> Took 20 Match not found
> Took 10 Match not found
> Took 10 Match not found
> Took 10 Match not found
> Look pattern = ((?=^)|(?<=\s))(http://|ftp://|news://|https://|callto://|gopher://|mailto:|im:|www.)([\d\w;/\?:@=&$\-+!*'~#%\{\}\|]|[,.()"][^ .!?\s$])*
> Took 8753 Match not found
> Took 8502 Match not found
> Took 8463 Match not found
> Took 8703 Match not found
> Took 8492 Match not found
> DONE

-- 
This message is automatically generated by JIRA.
-
If you think it was sent incorrectly contact one of the administrators:
   http://issues.apache.org/jira/secure/Administrators.jspa
-
For more information on JIRA, see:
   http://www.atlassian.com/software/jira


[jira] Resolved: (HARMONY-580) [classlib][regex][perf] Lookaround matching is slow

Posted by "Tim Ellison (JIRA)" <ji...@apache.org>.
     [ http://issues.apache.org/jira/browse/HARMONY-580?page=all ]
     
Tim Ellison resolved HARMONY-580:
---------------------------------

    Resolution: Fixed

That's a remarkable improvement, thanks Nikolay!

Patch applied to REGEX module at repo revision r416455.

Please check that the patch was applied as you expected.


> [classlib][regex][perf] Lookaround matching is slow
> ---------------------------------------------------
>
>          Key: HARMONY-580
>          URL: http://issues.apache.org/jira/browse/HARMONY-580
>      Project: Harmony
>         Type: Bug

>   Components: Classlib
>  Environment: WinXP, 2GHz Pentium M
>     Reporter: Tim Ellison
>     Assignee: Tim Ellison
>  Attachments: MatchingStuff.java, perf.patch
>
> Running tests on the Harmony impl of regex and comparing it with another impl shows that Harmony is much slower at lookaround matching.  Here is the output from the testcase:
> -- on Sun jdk1.5.0_06 --
> Search space length = 28522
> Pattern = (http://|ftp://|news://|https://|callto://|gopher://|mailto:|im:|www.)([\d\w;/\?:@=&$\-+!*'~#%\{\}\|]|[,.()"][^ .!?\s$])*
> Took 10 Match not found
> Took 10 Match not found
> Took 10 Match not found
> Took 10 Match not found
> Took 10 Match not found
> Look pattern = ((?=^)|(?<=\s))(http://|ftp://|news://|https://|callto://|gopher://|mailto:|im:|www.)([\d\w;/\?:@=&$\-+!*'~#%\{\}\|]|[,.()"][^ .!?\s$])*
> Took 10 Match not found
> Took 10 Match not found
> Took 10 Match not found
> Took 10 Match not found
> Took 10 Match not found
> DONE
> -- on Harmony r412711 --
> Search space length = 28522
> Pattern = (http://|ftp://|news://|https://|callto://|gopher://|mailto:|im:|www.)([\d\w;/\?:@=&$\-+!*'~#%\{\}\|]|[,.()"][^ .!?\s$])*
> Took 50 Match not found
> Took 20 Match not found
> Took 10 Match not found
> Took 10 Match not found
> Took 10 Match not found
> Look pattern = ((?=^)|(?<=\s))(http://|ftp://|news://|https://|callto://|gopher://|mailto:|im:|www.)([\d\w;/\?:@=&$\-+!*'~#%\{\}\|]|[,.()"][^ .!?\s$])*
> Took 8753 Match not found
> Took 8502 Match not found
> Took 8463 Match not found
> Took 8703 Match not found
> Took 8492 Match not found
> DONE

-- 
This message is automatically generated by JIRA.
-
If you think it was sent incorrectly contact one of the administrators:
   http://issues.apache.org/jira/secure/Administrators.jspa
-
For more information on JIRA, see:
   http://www.atlassian.com/software/jira