Posted to regexp-dev@jakarta.apache.org by bu...@apache.org on 2001/08/26 08:38:56 UTC

[DO NOT REPLY: Bug 3273] New: CharacterArrayCharacterIterator substring function returns incorrect results

http://nagoya.apache.org/bugzilla/show_bug.cgi?id=3273

*** shadow/3273	Sat Aug 25 23:38:56 2001
--- shadow/3273.tmp.12103	Sat Aug 25 23:38:56 2001
***************
*** 0 ****
--- 1,57 ----
+ +============================================================================+
+ | CharacterArrayCharacterIterator substring function returns incorrect resul |
+ +----------------------------------------------------------------------------+
+ |        Bug #: 3273                        Product: Regexp                  |
+ |       Status: NEW                         Version: unspecified             |
+ |   Resolution:                            Platform: All                     |
+ |     Severity: Normal                   OS/Version: All                     |
+ |     Priority: Other                     Component: Other                   |
+ +----------------------------------------------------------------------------+
+ |  Assigned To: regexp-dev@jakarta.apache.org                                |
+ |  Reported By: tony_robertson@yahoo.com                                     |
+ |      CC list: Cc:                                                          |
+ +----------------------------------------------------------------------------+
+ |          URL: .../api/org/apache/regexp/CharacterArrayCharacterIterator.ht |
+ +============================================================================+
+ |                              DESCRIPTION                                   |
+ Calling RE.match(CharacterIterator, int) with a
+ CharacterArrayCharacterIterator and then calling getParen(int) often
+ returns a string of the wrong length, or throws an exception.
+ 
+ This is caused by the implementation of substring(int, int) in the
+ CharacterArrayCharacterIterator class and/or by incorrect documentation
+ of the CharacterIterator.substring interface.
+ 
+ The confusion is over whether the second argument to substring is the
+ endIndex or the length. The API docs say it is the length, but the RE
+ implementation and the StringCharacterIterator implementation both
+ treat it as the endIndex.
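The effect of this mismatch can be sketched in a few lines of self-contained Java. The helpers byLength and byEndIndex are hypothetical stand-ins, not Regexp code: byLength assumes, as the report implies, that CharacterArrayCharacterIterator currently reads the second argument as a length, while byEndIndex follows the endIndex convention that RE uses. When a caller passes an end index to a length-based substring, the result is either too long or the read runs off the array and throws, which matches the two symptoms reported for getParen(int).

```java
public class SubstringMismatch {
    // Hypothetical stand-in: treats the second argument as a LENGTH,
    // as the API docs describe.
    static String byLength(char[] src, int off, int beginIndex, int length) {
        return new String(src, off + beginIndex, length);
    }

    // Hypothetical stand-in: treats the second argument as an exclusive
    // END INDEX, the convention RE and StringCharacterIterator use.
    static String byEndIndex(char[] src, int off, int beginIndex, int endIndex) {
        return new String(src, off + beginIndex, endIndex - beginIndex);
    }

    public static void main(String[] args) {
        char[] src = "xxhello worldxx".toCharArray();
        int off = 2; // the sequence "hello world" starts at src[2]

        // A caller asking for range [6, 7) expects "w".
        System.out.println(byEndIndex(src, off, 6, 7)); // w
        System.out.println(byLength(src, off, 6, 7));   // worldxx (wrong length)

        // A caller asking for range [6, 11) expects "world".
        System.out.println(byEndIndex(src, off, 6, 11)); // world
        try {
            byLength(src, off, 6, 11); // 11 chars starting at src[8] overrun the array
        } catch (IndexOutOfBoundsException e) {
            System.out.println("exception thrown");
        }
    }
}
```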
+ [Note: the standard Java String class has
+ java.lang.String.substring(int beginIndex, int endIndex),
+ but its constructor is java.lang.String(char[] src, int off, int len).]
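The two java.lang.String conventions can be bridged by passing endIndex - beginIndex as the count. A minimal sketch; the helper fromRange is a hypothetical name used only for illustration:

```java
public class StringConventions {
    // Hypothetical helper: converts a (beginIndex, endIndex) request into
    // the (offset, count) form that String(char[], int, int) expects.
    static String fromRange(char[] chars, int beginIndex, int endIndex) {
        return new String(chars, beginIndex, endIndex - beginIndex);
    }

    public static void main(String[] args) {
        String s = "regexp";
        // String.substring takes an exclusive end index...
        System.out.println(s.substring(2, 5));                // gex
        // ...while the constructor takes a count.
        System.out.println(fromRange(s.toCharArray(), 2, 5)); // gex
    }
}
```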
+ 
+ Secondly, there is no check that the requested substring stays within the
+ bounds of the sequence length specified at construction time.
+ An IndexOutOfBoundsException should be thrown in that case.
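A minimal sketch of why the missing check matters, assuming the iterator wraps a window src[off, off + len) of a possibly larger array (the helper substringUnchecked is hypothetical). The String constructor only checks against src.length, so a request past the end of the window silently returns characters from the surrounding array instead of throwing:

```java
public class UncheckedWindow {
    // Hypothetical substring over the window src[off, off + len) that never
    // compares against len; only String's own check against src.length applies.
    static String substringUnchecked(char[] src, int off, int len,
                                     int beginIndex, int endIndex) {
        return new String(src, off + beginIndex, endIndex - beginIndex);
    }

    public static void main(String[] args) {
        char[] src = "abcdefgh".toCharArray();
        int off = 2, len = 3; // the logical sequence is "cde"
        // endIndex 5 is past the window, but src is long enough, so no
        // exception is thrown and "fg" leaks in from outside the window.
        System.out.println(substringUnchecked(src, off, len, 0, 5)); // cdefg
    }
}
```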
+ 
+ I think the best solution is first to update the API docs to state that
+ the arguments are in fact (beginIndex, endIndex), and then to update the
+ CharacterArrayCharacterIterator.substring methods to something like this:
+ 
+  public String substring(int beginIndex, int endIndex)
+  {
+    // endIndex is exclusive and relative to the start of the sequence (off)
+    if (endIndex > len)
+      throw new IndexOutOfBoundsException("endIndex=" + endIndex +
+          "; sequence size=" + len);
+    if (beginIndex < 0)
+      throw new IndexOutOfBoundsException("beginIndex=" + beginIndex);
+    return new String(src, off + beginIndex, endIndex - beginIndex);
+  }
+ 
+  public String substring(int beginIndex)
+  {
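+    // reject negative indices too, so off + beginIndex cannot reach
+    // characters before the start of the sequence
+    if (beginIndex < 0 || beginIndex > len)
+      throw new IndexOutOfBoundsException("beginIndex=" + beginIndex +
+          "; sequence size=" + len);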
+    return new String(src, off + beginIndex, len - beginIndex);
+  }
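For experimentation, the proposal can be packaged as a self-contained class. CharArrayWindow is a hypothetical name and this is a sketch, not the real org.apache.regexp.CharacterArrayCharacterIterator. It also rejects beginIndex > endIndex explicitly, which goes slightly beyond the checks proposed above, and routes substring(int) through substring(int, int) so the bounds logic lives in one place.

```java
// Hypothetical class illustrating the proposed (beginIndex, endIndex) contract;
// NOT the real org.apache.regexp.CharacterArrayCharacterIterator.
public class CharArrayWindow {
    private final char[] src; // backing array (may be larger than the sequence)
    private final int off;    // start of the sequence within src
    private final int len;    // number of chars in the sequence

    public CharArrayWindow(char[] src, int off, int len) {
        this.src = src;
        this.off = off;
        this.len = len;
    }

    /**
     * Returns the chars from beginIndex (inclusive) to endIndex (exclusive),
     * both relative to the start of the sequence.
     * @throws IndexOutOfBoundsException if the range is not within [0, len]
     *         or beginIndex > endIndex
     */
    public String substring(int beginIndex, int endIndex) {
        if (beginIndex < 0 || endIndex > len || beginIndex > endIndex) {
            throw new IndexOutOfBoundsException("beginIndex=" + beginIndex
                + ", endIndex=" + endIndex + "; sequence size=" + len);
        }
        return new String(src, off + beginIndex, endIndex - beginIndex);
    }

    /** Returns the chars from beginIndex (inclusive) to the end of the sequence. */
    public String substring(int beginIndex) {
        return substring(beginIndex, len);
    }

    public static void main(String[] args) {
        char[] src = "xxhello worldxx".toCharArray();
        CharArrayWindow w = new CharArrayWindow(src, 2, 11); // sequence "hello world"
        System.out.println(w.substring(6, 11)); // world
        System.out.println(w.substring(6));     // world
        try {
            w.substring(6, 13); // inside src, but past the end of the sequence
        } catch (IndexOutOfBoundsException e) {
            System.out.println("rejected: " + e.getMessage());
        }
    }
}
```

Running main prints "world" twice, then a rejection message for the range [6, 13), which lies inside the backing array but past the 11-character sequence.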