Posted to regexp-dev@jakarta.apache.org by bu...@apache.org on 2002/05/13 16:16:21 UTC

DO NOT REPLY [Bug 9035] New: - big Latitude Longitude RE causes IndexOutOfBoundsException

DO NOT REPLY TO THIS EMAIL, BUT PLEASE POST YOUR BUG 
RELATED COMMENTS THROUGH THE WEB INTERFACE AVAILABLE AT
<http://nagoya.apache.org/bugzilla/show_bug.cgi?id=9035>.
ANY REPLY MADE TO THIS MESSAGE WILL NOT BE COLLECTED AND 
INSERTED IN THE BUG DATABASE.

           Summary: big Latitude Longitude RE causes
                    IndexOutOfBoundsException
           Product: Regexp
           Version: unspecified
          Platform: All
        OS/Version: Linux
            Status: NEW
          Severity: Major
          Priority: Other
         Component: Other
        AssignedTo: regexp-dev@jakarta.apache.org
        ReportedBy: mnewcomb@tacintel.com


I have two fairly big REs dealing with latitude and longitude.  When I use them 
separately, there are no problems.  However, when I combine the two REs so that 
I can pass one latitude-longitude string to the result, matching bombs out with 
an exception (detailed below).

Here is the test program.  Refer to the example run for usage:

import java.io.*;
import java.util.*;
import org.apache.regexp.*;

public class LatLonREBug
{
  private static final String LATITUDE_RE_STRING =
    
"-?(([0-8]?[0-9]((\\.[0-9]+)|((([0-5][0-9])|60)((([0-5][0-9])|60))?))?)|90)[nNsS]";
  private static final String LONGITUDE_RE_STRING =
    
"-?(((([0-9]?[0-9])|(1[0-7][0-9]))((\\.[0-9]+)|((([0-5][0-9])|60)(([0-5][0-9])|60)?))?)|180)[eEwW]";

  public static final String LATITUDE_LONGITUDE_RE_STRING =
    "^" + LATITUDE_RE_STRING + LONGITUDE_RE_STRING + "$";

  public static void main(String[] args)
    throws Throwable
  {
    RE latlonRE = new RE(LATITUDE_LONGITUDE_RE_STRING);
    System.out.println("LATITUDE_LONGITUDE_RE_STRING: " +
                       LATITUDE_LONGITUDE_RE_STRING);

    RE latRE = new RE("^" + LATITUDE_RE_STRING + "$");
    System.out.println("LATITUDE_RE_STRING: " + LATITUDE_RE_STRING);

    RE lonRE = new RE("^" + LONGITUDE_RE_STRING + "$");
    System.out.println("LONGITUDE_RE_STRING: " + LONGITUDE_RE_STRING);

    BufferedReader br = new BufferedReader(new InputStreamReader(System.in));
    String line = br.readLine();
    while (line != null && !line.equals("quit") && !line.equals("exit"))
    {
      StringTokenizer st = new StringTokenizer(line);
      int tokens = st.countTokens();

      if (tokens > 1)
      {
        String command = st.nextToken();

        if (command.equalsIgnoreCase("lat"))
        {
          String lat = st.nextToken();
          // match() returns a boolean; report the actual result
          System.out.println(lat + (latRE.match(lat) ? " is" : " is not") +
                             " a properly formatted latitude");
        }
        else if (command.equalsIgnoreCase("lon"))
        {
          String lon = st.nextToken();
          // match() returns a boolean; report the actual result
          System.out.println(lon + (lonRE.match(lon) ? " is" : " is not") +
                             " a properly formatted longitude");
        }
        else if (command.equalsIgnoreCase("latlon"))
        {
          String latlon = st.nextToken();
          // match() returns a boolean; report the actual result
          System.out.println(latlon + (latlonRE.match(latlon) ? " is" : " is not") +
                             " a properly formatted lat-lon");
        }
        else
        {
          System.out.println("unknown command: " + command);
        }
      }
      else
      {
        System.out.println("invalid line: " + line);
      }

      line = br.readLine();
    }
  }
}

Here is an example run of the test-case.  As you will see, when just doing 
latitude or longitude, the REs match as expected.  But, when I do a 'latlon' 
string, it pukes...

[mnewcomb@localhost sandbox]$ java -classpath /usr/local/regexp/jakarta-regexp-1.2.jar:. LatLonREBug
LATITUDE_LONGITUDE_RE_STRING: ^-?(([0-8]?[0-9]((\.[0-9]+)|((([0-5][0-9])|60)((([0-5][0-9])|60))?))?)|90)[nNsS]-?(((([0-9]?[0-9])|(1[0-7][0-9]))((\.[0-9]+)|((([0-5][0-9])|60)(([0-5][0-9])|60)?))?)|180)[eEwW]$
LATITUDE_RE_STRING: -?(([0-8]?[0-9]((\.[0-9]+)|((([0-5][0-9])|60)((([0-5][0-9])|60))?))?)|90)[nNsS]
LONGITUDE_RE_STRING: -?(((([0-9]?[0-9])|(1[0-7][0-9]))((\.[0-9]+)|((([0-5][0-9])|60)(([0-5][0-9])|60)?))?)|180)[eEwW]
lat 55N
55N is a properly formatted latitude
lat 55.454N
55.454N is a properly formatted latitude
lat 5545N
5545N is a properly formatted latitude
lon 123E
123E is a properly formatted longitude
lon 5E
5E is a properly formatted longitude
lon 123.444E
123.444E is a properly formatted longitude
lon 1784532W
1784532W is a properly formatted longitude
latlon 55N44E
55N44E is a properly formatted lat-lon
latlon 55N44.33E
Exception in thread "main" java.lang.ArrayIndexOutOfBoundsException
	at org.apache.regexp.RE.getParenEnd(RE.java:724)
	at org.apache.regexp.RE.matchNodes(RE.java:942)
	at org.apache.regexp.RE.matchNodes(RE.java:933)
	at org.apache.regexp.RE.matchNodes(RE.java:1376)
	at org.apache.regexp.RE.matchNodes(RE.java:1376)
	at org.apache.regexp.RE.matchNodes(RE.java:910)
	at org.apache.regexp.RE.matchNodes(RE.java:1376)
	at org.apache.regexp.RE.matchNodes(RE.java:910)
	at org.apache.regexp.RE.matchNodes(RE.java:1376)
	at org.apache.regexp.RE.matchNodes(RE.java:933)
	at org.apache.regexp.RE.matchNodes(RE.java:933)
	at org.apache.regexp.RE.matchNodes(RE.java:1376)
	at org.apache.regexp.RE.matchNodes(RE.java:910)
	at org.apache.regexp.RE.matchNodes(RE.java:1376)
	at org.apache.regexp.RE.matchNodes(RE.java:910)
	at org.apache.regexp.RE.matchNodes(RE.java:910)
	at org.apache.regexp.RE.matchNodes(RE.java:1376)
	at org.apache.regexp.RE.matchNodes(RE.java:910)
	at org.apache.regexp.RE.matchNodes(RE.java:1376)
	at org.apache.regexp.RE.matchNodes(RE.java:933)
	at org.apache.regexp.RE.matchNodes(RE.java:933)
	at org.apache.regexp.RE.matchNodes(RE.java:1376)
	at org.apache.regexp.RE.matchNodes(RE.java:1376)
	at org.apache.regexp.RE.matchNodes(RE.java:910)
	at org.apache.regexp.RE.matchNodes(RE.java:1376)
	at org.apache.regexp.RE.matchNodes(RE.java:910)
	at org.apache.regexp.RE.matchNodes(RE.java:1376)
	at org.apache.regexp.RE.matchAt(RE.java:1448)
	at org.apache.regexp.RE.match(RE.java:1498)
	at org.apache.regexp.RE.match(RE.java:1468)
	at org.apache.regexp.RE.match(RE.java:1561)
	at LatLonREBug.main(LatLonREBug.java:54)
[mnewcomb@localhost sandbox]$ 


Any help will be greatly appreciated.

Thanks,
Michael

--
To unsubscribe, e-mail:   <ma...@jakarta.apache.org>
For additional commands, e-mail: <ma...@jakarta.apache.org>


Re: - IndexOutOfBoundsException: clarification

Posted by Holger Stratmann <Ho...@cheerful.com>.
Actually, you can write much simpler REs to reproduce this :-))

I had wanted to file a bug report about it (along with a few others):

RegExp does not "support" more than 16 parenthesized sub-expressions.
As soon as you have more than 16 '(...)' groups, you get ArrayIndexOutOfBoundsExceptions :-(
(Actually, I noticed this while taking a look at the sources and then confirmed the problem by trying it ;)

That's why your two expressions work separately, but not combined.
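
To make the numbers concrete, here is a small sketch that counts the parenthesized groups in Michael's two expressions and in the combination. It uses java.util.regex from the JDK purely as a counting tool (an assumption made for convenience; it is not Jakarta Regexp and has no such limit), and the class and method names are made up for illustration. The limit of 16 itself is taken from the observation above, not from this code.

```java
import java.util.regex.Pattern;

// Hypothetical helper, not part of Jakarta Regexp.
final class GroupCount {
    // Patterns copied from the test program in the bug report.
    static final String LATITUDE_RE_STRING =
        "-?(([0-8]?[0-9]((\\.[0-9]+)|((([0-5][0-9])|60)((([0-5][0-9])|60))?))?)|90)[nNsS]";
    static final String LONGITUDE_RE_STRING =
        "-?(((([0-9]?[0-9])|(1[0-7][0-9]))((\\.[0-9]+)|((([0-5][0-9])|60)(([0-5][0-9])|60)?))?)|180)[eEwW]";

    // Number of capturing groups, as reported by java.util.regex.
    static int countGroups(String regex) {
        return Pattern.compile(regex).matcher("").groupCount();
    }

    public static void main(String[] args) {
        String combined = "^" + LATITUDE_RE_STRING + LONGITUDE_RE_STRING + "$";
        System.out.println("latitude groups:  " + countGroups(LATITUDE_RE_STRING));
        System.out.println("longitude groups: " + countGroups(LONGITUDE_RE_STRING));
        System.out.println("combined groups:  " + countGroups(combined));
    }
}
```

Each expression on its own stays below 16 groups; only the concatenation goes over.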

I guess I'll write a fix for that, but considering I didn't even have time to file a bug report...

A "workaround" in this case (just as temporary help for Michael):
Your RE has two clearly defined parts. You can probably use one more general expression to find potential matches and then check the two parts separately. Not nice...
Fixing the problem may actually be faster :-))
I had estimated 1-3 hours for fixing the code, but I'd first need to find out something about the process [of submitting code], and that would probably take longer...
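
A rough sketch of that workaround, for what it is worth. To keep it self-contained it is written against java.util.regex from the JDK rather than Jakarta Regexp (an assumption made for this example); with Jakarta Regexp the same idea would use the separate latRE and lonRE objects from the test program. The SPLIT expression and the names SplitLatLon and isLatLon are invented for illustration.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical helper: validate "lat + lon" without one huge expression.
final class SplitLatLon {
    // Patterns copied from the test program in the bug report.
    static final String LATITUDE_RE_STRING =
        "-?(([0-8]?[0-9]((\\.[0-9]+)|((([0-5][0-9])|60)((([0-5][0-9])|60))?))?)|90)[nNsS]";
    static final String LONGITUDE_RE_STRING =
        "-?(((([0-9]?[0-9])|(1[0-7][0-9]))((\\.[0-9]+)|((([0-5][0-9])|60)(([0-5][0-9])|60)?))?)|180)[eEwW]";

    static final Pattern LAT = Pattern.compile("^" + LATITUDE_RE_STRING + "$");
    static final Pattern LON = Pattern.compile("^" + LONGITUDE_RE_STRING + "$");

    // General expression with only two groups: everything up to and
    // including the first hemisphere letter, then the rest of the string.
    static final Pattern SPLIT = Pattern.compile("^([^nNsS]*[nNsS])(.*)$");

    static boolean isLatLon(String s) {
        Matcher m = SPLIT.matcher(s);
        if (!m.matches()) {
            return false; // no n/N/s/S at all, so no latitude part
        }
        // Check each part with its own, smaller expression.
        return LAT.matcher(m.group(1)).matches()
            && LON.matcher(m.group(2)).matches();
    }

    public static void main(String[] args) {
        System.out.println("55N44.33E -> " + isLatLon("55N44.33E"));
        System.out.println("95N44E    -> " + isLatLon("95N44E"));
    }
}
```

Splitting at the first hemisphere letter is safe here because the longitude part cannot contain n, N, s or S.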

Cheers,
           Holger



