Posted to oro-dev@jakarta.apache.org by bu...@apache.org on 2001/09/19 23:26:28 UTC
DO NOT REPLY [Bug 3730] New: - Perl5Matcher sometimes confuses the begin/end offsets on similar sub patterns in a regular expression
DO NOT REPLY TO THIS EMAIL, BUT PLEASE POST YOUR BUG
RELATED COMMENTS THROUGH THE WEB INTERFACE AVAILABLE AT
<http://nagoya.apache.org/bugzilla/show_bug.cgi?id=3730>.
ANY REPLY MADE TO THIS MESSAGE WILL NOT BE COLLECTED AND
INSERTED IN THE BUG DATABASE.
Summary: Perl5Matcher sometimes confuses the begin/end offsets on
similar sub patterns in a regular expression
Product: ORO
Version: 2.0
Platform: Other
OS/Version: Other
Status: NEW
Severity: Normal
Priority: Other
Component: Main
AssignedTo: oro-dev@jakarta.apache.org
ReportedBy: jamesv@screamingmedia.com
Here is the test program:
import com.oroinc.text.regex.*;
import java.io.*;

public class bug_report
{
    public static void main(String[] args) throws Exception
    {
        // Each GAME/TEAM record begins with a backspace character (\010).
        String regex =
            "\010[(]GAME +GID:([^;]+); +GDATE:([^;]*); +GSTART:([^;]*);"
            + " +GSITE:([^;]*); +GNEUTRAL:([^;]*); +GSTAT:([^;]*);"
            + " +GPERIOD:([^;]*);[^\r\n]*[\r\n]+"
            + "("
            + "(\010[(]TEAM +TNAME:([^;]*);( +[^:]+:[^;]*;){3}"
            + " +THOME: *([Yy][Ee][Ss]); +TSCORE:([^;]*); +TSTAT:([^;]*)[^\r\n]*[\r\n]+)"
            + "|"
            + "(\010[(]TEAM +TNAME:([^;]*);( +[^:]+:[^;]*;){3}"
            + " +THOME: *([Nn][Oo]); +TSCORE:([^;]*); +TSTAT:([^;]*)[^\r\n]*[\r\n]+)"
            + "){2}";

        // The leading \010 on each record is not visible in the archived text;
        // it is assumed here because the reported offsets imply it
        // (the GID value starts at offset 11, not 10).
        String input =
            "\010(GAME GID:13805; GDATE:11/01/2000; GSTART:19:30;"
            + " GSITE:Charlotte Coliseum; GNEUTRAL:NO; GSTAT:Final; GPERIOD:4; \n"
            + "\010(TEAM TNAME:Hornets; TLOCALE:Charlotte; TCONF:Eastern;"
            + " TDIV:Central; THOME:YES; TSCORE:77; TSTAT:LOST; TID:9;)\n"
            + "\010(TEAM TNAME:Wizards; TLOCALE:Washington; TCONF:Eastern;"
            + " TDIV:Atlantic; THOME:NO; TSCORE:95; TSTAT:WON; TID:7;))\n";

        String input2 =
            "\010(GAME GID:13789; GDATE:10/31/2000; GSTART:19:30;"
            + " GSITE:TD Waterhouse Centre; GNEUTRAL:NO; GSTAT:Final; GPERIOD:4; \n"
            + "\010(TEAM TNAME:Magic; TLOCALE:Orlando; TCONF:Eastern;"
            + " TDIV:Atlantic; THOME:YES; TSCORE:97; TSTAT:WON; TID:5;)\n"
            + "\010(TEAM TNAME:Wizards; TLOCALE:Washington; TCONF:Eastern;"
            + " TDIV:Atlantic; THOME:NO; TSCORE:86; TSTAT:LOST; TID:7;))\n";

        Perl5Compiler p5compiler = new Perl5Compiler();
        Perl5Pattern p5pattern = null;
        Perl5Matcher p5matcher = new Perl5Matcher();
        PatternMatcherInput p5input = new PatternMatcherInput(input2);

        try {
            p5pattern = (Perl5Pattern) p5compiler.compile(regex,
                            Perl5Compiler.SINGLELINE_MASK |
                            Perl5Compiler.READ_ONLY_MASK );
        } catch(MalformedPatternException e) {
            System.out.println("Error: Bad Perl5 pattern.");
            System.out.println(e.getMessage());
            return; // nothing to match against without a compiled pattern
        }

        boolean result = p5matcher.matchesPrefix(p5input, p5pattern);
        if( result )
        {
            MatchResult mr = p5matcher.getMatch();
            int groups = mr.groups();
            int start = -1;
            int end = -1;
            String matchStr = null;

            // Print the begin/end offsets of every group and flag any
            // group whose begin offset lies past its end offset.
            for( int x = 0; x < groups; x++ )
            {
                start = mr.beginOffset(x);
                end = mr.endOffset(x);
                //matchStr = mr.group(x);
                //System.out.print("Pos: "+x+"\tStart: "+start+"\tEnd: "+end+"\tMatch: "+matchStr);
                System.out.print("Pos: "+x+"\tStart: "+start+"\tEnd: "+end);
                if( start > end )
                    System.out.println( " -- ERROR" );
                else
                    System.out.println();
            }
        }
        else
        {
            System.out.println("No Match");
        }
        System.out.println("Program terminating");
    }
}
And here is the output:
Pos: 0 Start: 0 End: 338
Pos: 1 Start: 11 End: 16
Pos: 2 Start: 24 End: 34
Pos: 3 Start: 43 End: 48
Pos: 4 Start: 56 End: 76
Pos: 5 Start: 87 End: 89
Pos: 6 Start: 97 End: 102
Pos: 7 Start: 112 End: 113
Pos: 8 Start: 224 End: 338
Pos: 9 Start: 224 End: 224
Pos: 10 Start: 237 End: 237
Pos: 11 Start: 280 End: 295
Pos: 12 Start: 302 End: 192 -- ERROR
Pos: 13 Start: 201 End: 203
Pos: 14 Start: 211 End: 214
Pos: 15 Start: 224 End: 338
Pos: 16 Start: 237 End: 244
Pos: 17 Start: 280 End: 295
Pos: 18 Start: 302 End: 304
Pos: 19 Start: 313 End: 315
Pos: 20 Start: 323 End: 327
Program terminating
Notice that Pos 12 and Pos 18 share the same Start value (302), which leaves
Pos 12 with a Start greater than its End (192). In the regex they sit at the
same position in two nearly identical sub patterns. Granted, there are many
similar sub patterns; as a matter of fact, the two TEAM alternatives of the
pattern are almost exactly the same except for [Yy][Ee][Ss] and [Nn][Oo]...
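
For comparison, here is a minimal sketch that runs the same pattern through
java.util.regex instead of ORO and checks that no group has start > end. This
is not part of the original report. The class name OffsetCheck, the helper
groupOffsets, and the TEAM_HEAD/TEAM_TAIL constants are made up for
illustration, the input assumes each record starts with a \010 character (the
reported offsets suggest this), and Pattern.DOTALL and lookingAt() are only
approximate stand-ins for SINGLELINE_MASK and matchesPrefix().

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Reference check using java.util.regex (NOT ORO): every group should
// report start <= end, or -1/-1 if it took no part in the match.
class OffsetCheck {
    // Shared pieces of the two TEAM alternatives (hypothetical names).
    static final String TEAM_HEAD =
        "\010[(]TEAM +TNAME:([^;]*);( +[^:]+:[^;]*;){3} +THOME: *";
    static final String TEAM_TAIL =
        "; +TSCORE:([^;]*); +TSTAT:([^;]*)[^\r\n]*[\r\n]+";

    // Same pattern as in the report, assembled from the pieces above.
    static final String REGEX =
        "\010[(]GAME +GID:([^;]+); +GDATE:([^;]*); +GSTART:([^;]*);"
        + " +GSITE:([^;]*); +GNEUTRAL:([^;]*); +GSTAT:([^;]*);"
        + " +GPERIOD:([^;]*);[^\r\n]*[\r\n]+"
        + "("
        + "(" + TEAM_HEAD + "([Yy][Ee][Ss])" + TEAM_TAIL + ")"
        + "|"
        + "(" + TEAM_HEAD + "([Nn][Oo])" + TEAM_TAIL + ")"
        + "){2}";

    // input2 from the report; the leading \010 per record is an assumption.
    static final String INPUT =
        "\010(GAME GID:13789; GDATE:10/31/2000; GSTART:19:30;"
        + " GSITE:TD Waterhouse Centre; GNEUTRAL:NO; GSTAT:Final; GPERIOD:4; \n"
        + "\010(TEAM TNAME:Magic; TLOCALE:Orlando; TCONF:Eastern;"
        + " TDIV:Atlantic; THOME:YES; TSCORE:97; TSTAT:WON; TID:5;)\n"
        + "\010(TEAM TNAME:Wizards; TLOCALE:Washington; TCONF:Eastern;"
        + " TDIV:Atlantic; THOME:NO; TSCORE:86; TSTAT:LOST; TID:7;))\n";

    // Returns {start, end} for groups 0..groupCount(), or an empty list
    // if the pattern does not match at the beginning of the input.
    static List<int[]> groupOffsets(String regex, String input) {
        Matcher m = Pattern.compile(regex, Pattern.DOTALL).matcher(input);
        List<int[]> offsets = new ArrayList<>();
        if (m.lookingAt()) {
            for (int g = 0; g <= m.groupCount(); g++) {
                offsets.add(new int[] { m.start(g), m.end(g) });
            }
        }
        return offsets;
    }

    public static void main(String[] args) {
        List<int[]> offsets = groupOffsets(REGEX, INPUT);
        if (offsets.isEmpty()) {
            System.out.println("No Match");
        }
        for (int g = 0; g < offsets.size(); g++) {
            int[] o = offsets.get(g);
            System.out.println("Pos: " + g + "\tStart: " + o[0] + "\tEnd: " + o[1]
                               + (o[0] > o[1] ? " -- ERROR" : ""));
        }
    }
}
```

With this engine, group 12 should keep the offsets of "YES" from the first
TEAM record (Start 189, End 192) rather than picking up the Start of group 18.
That is the result I would expect from Perl5Matcher as well.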