Posted to regexp-user@jakarta.apache.org by Keith Kyzivat <kk...@iconverse.com> on 2000/08/08 21:00:54 UTC

Array index out of bounds on RE creation...

Hello there...

Jakarta Regexp (v1.1) has worked quite well up till now...  I have a
complicated multi-part regular expression that matches numbers with
comma-separated triplets.  It works fine when I test it with Vim (after
the appropriate syntactical changes):

	Jakarta regexp format:
(^\s*(\d{1,3}(,\d\d\d)*)??\.\d+\s*$|^\s*\d{1,3}(,\d\d\d)*(\.\d+)??\s*$)
	Vim format:
\(^\s*\(\d{1,3}\(,\d\d\d\)*\)\=\.\d+\s*$\|^\s*\d{1,3}\(,\d\d\d\)*\(\.\d+\)\=\s*$\)
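
For reference, here is a minimal sketch of what the expression is meant to accept and reject. It assumes java.util.regex as a stand-in engine, not Jakarta Regexp, so it says nothing about the RECompiler failure itself. The class and method names are hypothetical and chosen only for illustration.

```java
import java.util.regex.Pattern;

// Hypothetical helper; uses java.util.regex, NOT org.apache.regexp.
class TripletNumberCheck {
    // The Jakarta-format expression from the message, as a Java string literal.
    static final Pattern NUMBER = Pattern.compile(
        "(^\\s*(\\d{1,3}(,\\d\\d\\d)*)??\\.\\d+\\s*$"
        + "|^\\s*\\d{1,3}(,\\d\\d\\d)*(\\.\\d+)??\\s*$)");

    // True if the input is a number whose integer part, when present,
    // is written as comma-separated groups of three digits.
    static boolean isNumber(String s) {
        return NUMBER.matcher(s).find();
    }

    public static void main(String[] args) {
        String[] samples = {"1,234,567.89", ".5", "12", "1234", "1,23"};
        for (String s : samples) {
            System.out.println(s + " -> " + isNumber(s));
        }
    }
}
```

Under these assumptions the first three samples are accepted and the last two are rejected, because "1234" lacks the comma and "1,23" has an incomplete triplet.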

But when I pass the expression to the constructor of an RE, it comes back
with an ArrayIndexOutOfBoundsException at index 65809 (as seen below).  I
traced it a bit, but couldn't quite follow it.  Attached is a test that
should show it (it's extremely easy to reproduce).


65809
java.lang.ArrayIndexOutOfBoundsException: 65809
        at org.apache.regexp.RECompiler.setNextOfEnd(RECompiler.java:207)
        at org.apache.regexp.RECompiler.branch(RECompiler.java:1160)
        at org.apache.regexp.RECompiler.expr(RECompiler.java:1217)
        at org.apache.regexp.RECompiler.terminal(RECompiler.java:866)
        at org.apache.regexp.RECompiler.closure(RECompiler.java:942)
        at org.apache.regexp.RECompiler.branch(RECompiler.java:1151)
        at org.apache.regexp.RECompiler.expr(RECompiler.java:1203)
        at org.apache.regexp.RECompiler.compile(RECompiler.java:1281)
        at org.apache.regexp.RE.<init>(RE.java:490)
        at org.apache.regexp.RE.<init>(RE.java:475)
        at Foo.main(Foo.java:13)



 

RE: Array index out of bounds on RE creation...

Posted by Keith Kyzivat <kk...@iconverse.com>.
More info on the problem:

I put the expression into the REDemo tester app, and played with it for a
while.

I concluded that the ArrayIndexOutOfBoundsException happens whenever the
atom following a "??" (0 or 1 match) is a complex atom (not an individual
character), or an individual character with a closure of its own
(e.g. a* or a?? or a{3,4}, etc).

This behavior is definitely incorrect.

Here's a simpler regular expression that shows the problem:
a??b*
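
As a sanity check that expressions of this shape are well-formed, the sketch below compiles a few of them with java.util.regex. That is a different engine from Jakarta Regexp, so this only illustrates that the syntax itself is ordinary; it does not reproduce or explain the RECompiler exception. The class name and the list of expressions are assumptions made for illustration.

```java
import java.util.regex.Pattern;
import java.util.regex.PatternSyntaxException;

// Hypothetical helper; uses java.util.regex, NOT org.apache.regexp.
class OptionalThenClosureCheck {
    // "??" followed by a character with its own closure, or by a group.
    static final String[] EXPRESSIONS = {
        "a??b*", "a??b??", "a??b{3,4}", "a??(bc)"
    };

    // True if the stand-in engine accepts the expression.
    static boolean compiles(String expr) {
        try {
            Pattern.compile(expr);
            return true;
        } catch (PatternSyntaxException e) {
            return false;
        }
    }

    public static void main(String[] args) {
        for (String expr : EXPRESSIONS) {
            System.out.println(expr + " compiles: " + compiles(expr));
        }
    }
}
```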

