Posted to regexp-dev@jakarta.apache.org by bu...@apache.org on 2003/08/28 22:08:16 UTC

DO NOT REPLY [Bug 22804] New: - java.lang.ArrayIndexOutOfBoundsException on negated classes

DO NOT REPLY TO THIS EMAIL, BUT PLEASE POST YOUR BUG 
RELATED COMMENTS THROUGH THE WEB INTERFACE AVAILABLE AT
<http://nagoya.apache.org/bugzilla/show_bug.cgi?id=22804>.
ANY REPLY MADE TO THIS MESSAGE WILL NOT BE COLLECTED AND 
INSERTED IN THE BUG DATABASE.

http://nagoya.apache.org/bugzilla/show_bug.cgi?id=22804

           Summary: java.lang.ArrayIndexOutOfBoundsException on negated
                    classes
           Product: Regexp
           Version: unspecified
          Platform: PC
        OS/Version: All
            Status: NEW
          Severity: Major
          Priority: Other
         Component: Other
        AssignedTo: regexp-dev@jakarta.apache.org
        ReportedBy: fernando@mecon.gov.ar


I use this code as a "sanitizer" (i.e., it filters bad input from users) on JDK 1.3.1:

String allowed= "a-zA-Z0-9_@.: ñÑáéíóúÁÉÍÓÚ\r\n\\-";
RE r= new RE("[^"+allowed+"]");
output= r.subst(input, "_", RE.REPLACE_ALL);

When running:

sanitize("aé$.JOla^|-+_")

I get:

java.lang.ArrayIndexOutOfBoundsException: 16
        at org.apache.regexp.RECompiler$RERange.delete(Unknown Source)
        at org.apache.regexp.RECompiler$RERange.remove(Unknown Source)
        at org.apache.regexp.RECompiler$RERange.include(Unknown Source)
        at org.apache.regexp.RECompiler$RERange.include(Unknown Source)
        at org.apache.regexp.RECompiler.characterClass(Unknown Source)
        at org.apache.regexp.RECompiler.terminal(Unknown Source)
        at org.apache.regexp.RECompiler.closure(Unknown Source)
        at org.apache.regexp.RECompiler.branch(Unknown Source)
        at org.apache.regexp.RECompiler.expr(Unknown Source)
        at org.apache.regexp.RECompiler.compile(Unknown Source)
        at org.apache.regexp.RE.<init>(Unknown Source)
        at org.apache.regexp.RE.<init>(Unknown Source)

This happens with both Regexp 1.2 and 1.3-dev (CVS as of 28/Aug/2003).

Everything works if I use:

String allowed= "a-zA-Z0-9_@.: ñÑáéíóúÁÉÍÓÚ\r\\-"; // removed \n

The same happens with other characters inside the [^].
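
For reference, a minimal sketch of the result the sanitizer is expected to produce. It is written against java.util.regex rather than org.apache.regexp, so it assumes JDK 1.4 or later (not the JDK 1.3.1 mentioned above). The class name SanitizerSketch is hypothetical. This is not a fix for the bug, only an illustration of the expected output for the same allowed set and input.

```java
import java.util.regex.Pattern;

// Hypothetical illustration class; uses java.util.regex (JDK 1.4+), not org.apache.regexp.
final class SanitizerSketch {
    // Same allowed set as in the report; "\\-" reaches the regex engine as an escaped hyphen.
    private static final String ALLOWED = "a-zA-Z0-9_@.: ñÑáéíóúÁÉÍÓÚ\r\n\\-";

    // Negated class: matches any single character that is NOT in the allowed set.
    private static final Pattern DISALLOWED = Pattern.compile("[^" + ALLOWED + "]");

    // Replaces every disallowed character with an underscore.
    static String sanitize(String input) {
        return DISALLOWED.matcher(input).replaceAll("_");
    }

    public static void main(String[] args) {
        System.out.println(sanitize("aé$.JOla^|-+_"));
    }
}
```

With this sketch, sanitize("aé$.JOla^|-+_") returns "aé_.JOla__-__": only $, ^, | and + are replaced, which is the behaviour expected from the RE-based version as well.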

Is there's any other info needed, please let me know.