Posted to regexp-dev@jakarta.apache.org by bu...@apache.org on 2001/06/21 23:29:55 UTC

[Bug 2121] - '.' or '-' in bracket expression gives unexpected results

http://nagoya.apache.org/bugzilla/show_bug.cgi?id=2121

*** shadow/2121	Mon Jun 11 12:26:15 2001
--- shadow/2121.tmp.13736	Thu Jun 21 14:29:55 2001
***************
*** 4,10 ****
  |        Bug #: 2121                        Product: Regexp                  |
  |       Status: NEW                         Version: unspecified             |
  |   Resolution:                            Platform: Other                   |
! |     Severity: Normal                   OS/Version: Windows NT/2K           |
  |     Priority: Other                     Component: Other                   |
  +----------------------------------------------------------------------------+
  |  Assigned To: regexp-dev@jakarta.apache.org                                |
--- 4,10 ----
  |        Bug #: 2121                        Product: Regexp                  |
  |       Status: NEW                         Version: unspecified             |
  |   Resolution:                            Platform: Other                   |
! |     Severity: Normal                   OS/Version: All                     |
  |     Priority: Other                     Component: Other                   |
  +----------------------------------------------------------------------------+
  |  Assigned To: regexp-dev@jakarta.apache.org                                |
***************
*** 69,71 ****
--- 69,150 ----
  reTest( s, "([a-z0-9.\\-]+)" );
  reTest( s, "([a-z0-9\\.-]+)" );
  %>
+ 
+ ------- Additional Comments From edwin@bitstorm.nl  2001-06-21 14:29 -------
+ Here's a contribution to general@jakarta.apache.org,
+ subject "What are we doing in regards to JDK 1.4?".
+ 
+ It contains untested fixes.
+ 
+ At 09:42 21-6-2001 -0700, Jon wrote:
+ Edwin,
+ 
+ on 6/21/01 7:16 AM, "Edwin Martin" <ed...@bitstorm.nl> wrote:
+ 
+ - > org.apache.regexp 1.2 is pretty much broken. It has some
+ - > major flaws since 1.0 and they are still not addressed.
+ - > 
+ - > See http://nagoya.betaversion.org/bugzilla/buglist.cgi?product=Regexp
+ - > for a list of bugs (BTW none of them is assigned).
+ - 
+ - Sending in bug reports doesn't get the problems fixed. This is a community
+ - of VOLUNTEERS. You can't just magically put in a bug report and then someone
+ - is going to jump up and fix it...you have to submit patches or try to nicely
+ - motivate people to fix it for you.
+ - 
+ - <http://jakarta.apache.org/site/understandingopensource.html>
+ -
+ - "With the opensource system, if you find any deficiency in the project, the
+ - onus is on you to redress that deficiency."
+ 
+ I thought submitting bug reports is also an important
+ way to support Open Source.
+ 
+ Well, I looked at the regexp-code and saw one of the bugs:
+ 
+ RECompiler.java, line 664:
+ 
+                    // Premature end of range. define up to Character.MAX_VALUE
+                     if ((idx + 1) < len && pattern.charAt(++idx) == ']')
+                     {
+                         simpleChar = Character.MAX_VALUE;
+                         break;
+                     }
+ 
+ The code makes any minus a range.
+ 
+ The RE "[a-]" becomes "the character a and anything after it".
+ 
+ A minus at the beginning or the end should be just a minus.
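+ 
+ For comparison, here is a small check of that expected behaviour. It uses
+ java.util.regex from JDK 1.4, not org.apache.regexp, so it only illustrates
+ the conventional semantics; the class name TrailingMinusDemo and the helper
+ matches() are made up for this example.

```java
import java.util.regex.Pattern;

public class TrailingMinusDemo {
    // Hypothetical helper: true when the whole input matches the expression.
    static boolean matches(String regex, String input) {
        return Pattern.matches(regex, input);
    }

    public static void main(String[] args) {
        // A trailing minus is a literal minus, not an open-ended range.
        System.out.println(matches("[a-]", "a"));  // true
        System.out.println(matches("[a-]", "-"));  // true
        System.out.println(matches("[a-]", "b"));  // false

        // A leading minus, also after '^', is a literal minus as well.
        System.out.println(matches("[-a]", "-"));  // true
        System.out.println(matches("[^-a]", "-")); // false
    }
}
```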
+ 
+ The code should be something like this:
+ 
+                     // Premature end of range. define up to Character.MAX_VALUE
+                     if ((idx + 1) < len && pattern.charAt(++idx) == ']')
+                     {
+                         definingRange = false;
+                         break;
+                     }
+ 
+ Furthermore, RECompiler.java, line 697:
+ 
+                 if ((idx + 1) >= len || pattern.charAt(idx + 1) != '-')
+ 
+ Should become something like:
+ 
+                 if ((idx + 1) >= len || !(pattern.charAt(idx + 1) == '-' &&
+                     !((idx + 2) < len && pattern.charAt(idx + 2) == ']')))
+ 
+ Which means: Do not include a char when followed by a minus, but DO include the
+ char when the minus is followed by a ']'.
+ 
+ The code still does not address the possibility of a charclass which starts with a
+ minus, like "[-a]" or "[^-a]", but that shouldn't be too difficult to implement.
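+ 
+ The intended rule, including the leading-minus case, can be sketched on its
+ own. This is not RECompiler code: the class BracketSketch and the method
+ expand() are hypothetical, it takes only the text between '[' and ']', and
+ it ignores escapes and negation. A minus starts a range only when there is
+ a character on both sides of it.

```java
import java.util.Set;
import java.util.TreeSet;

public class BracketSketch {
    // Hypothetical helper: expands the body of a bracket expression
    // (the text between '[' and ']') into the set of characters it contains.
    static Set<Character> expand(String body) {
        Set<Character> members = new TreeSet<>();
        int len = body.length();
        for (int idx = 0; idx < len; idx++) {
            char c = body.charAt(idx);
            // A range needs a '-' right after c AND one more character after
            // the '-'. A '-' that is first or last never satisfies this.
            boolean startsRange = (idx + 2) < len && body.charAt(idx + 1) == '-';
            if (startsRange) {
                char end = body.charAt(idx + 2);
                if (end < c) {
                    throw new IllegalArgumentException("Bad range: " + c + "-" + end);
                }
                // Use an int counter so the loop ends even at Character.MAX_VALUE.
                for (int x = c; x <= end; x++) {
                    members.add((char) x);
                }
                idx += 2; // skip the '-' and the end character
            } else {
                members.add(c); // plain character, including a literal '-'
            }
        }
        return members;
    }

    public static void main(String[] args) {
        System.out.println(expand("a-"));  // [-, a]
        System.out.println(expand("-a"));  // [-, a]
        System.out.println(expand("a-c")); // [a, b, c]
    }
}
```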
+ 
+ It isn't really that hard to fix these bugs; I just wonder if there's anybody
+ responsible for the regexp package.
+ 
+ And by the way, you don't have to shout.
+ 
+ Bye,
+ Edwin Martin.
\ No newline at end of file