Posted to regexp-dev@jakarta.apache.org by bu...@apache.org on 2001/06/21 23:29:55 UTC
[Bug 2121] - '.' or '-' in bracket expression gives unexpected results
http://nagoya.apache.org/bugzilla/show_bug.cgi?id=2121
*** shadow/2121 Mon Jun 11 12:26:15 2001
--- shadow/2121.tmp.13736 Thu Jun 21 14:29:55 2001
***************
*** 4,10 ****
| Bug #: 2121 Product: Regexp |
| Status: NEW Version: unspecified |
| Resolution: Platform: Other |
! | Severity: Normal OS/Version: Windows NT/2K |
| Priority: Other Component: Other |
+----------------------------------------------------------------------------+
| Assigned To: regexp-dev@jakarta.apache.org |
--- 4,10 ----
| Bug #: 2121 Product: Regexp |
| Status: NEW Version: unspecified |
| Resolution: Platform: Other |
! | Severity: Normal OS/Version: All |
| Priority: Other Component: Other |
+----------------------------------------------------------------------------+
| Assigned To: regexp-dev@jakarta.apache.org |
***************
*** 69,71 ****
--- 69,150 ----
reTest( s, "([a-z0-9.\\-]+)" );
reTest( s, "([a-z0-9\\.-]+)" );
%>
+
+ ------- Additional Comments From edwin@bitstorm.nl 2001-06-21 14:29 -------
+ Here's a contribution to general@jakarta.apache.org,
+ subject "What are we doing in regards to JDK 1.4?".
+
+ It contains untested fixes.
+
+ At 09:42 21-6-2001 -0700, Jon wrote:
+ Edwin,
+
+ on 6/21/01 7:16 AM, "Edwin Martin" <ed...@bitstorm.nl> wrote:
+
+ - > org.apache.regexp 1.2 is pretty much broken. It has some
+ - > major flaws since 1.0 and they are still not addressed.
+ - >
+ - > See http://nagoya.betaversion.org/bugzilla/buglist.cgi?product=Regexp
+ - > for a list of bugs (BTW none of them is assigned).
+ -
+ - Sending in bug reports doesn't get the problems fixed. This is a community
+ - of VOLUNTEERS. You can't just magically put in a bug report and then someone
+ - is going to jump up and fix it...you have to submit patches or try to nicely
+ - motivate people to fix it for you.
+ -
+ - <http://jakarta.apache.org/site/understandingopensource.html>
+ -
+ - "With the opensource system, if you find any deficiency in the project, the
+ - onus is on you to redress that deficiency."
+
+ I thought submitting bug reports is also an important
+ way to support Open Source.
+
+ Well, I looked at the regexp-code and saw one of the bugs:
+
+ RECompiler.java, line 664:
+
+ // Premature end of range. define up to Character.MAX_VALUE
+ if ((idx + 1) < len && pattern.charAt(++idx) == ']')
+ {
+ simpleChar = Character.MAX_VALUE;
+ break;
+ }
+
+ The code treats every minus as the start of a range.
+ 
+ The RE "[a-]" becomes "the character a and every character after it, up to Character.MAX_VALUE".
+
+ A minus at the beginning or the end should be just a minus.
+
+ The code should be something like this:
+
+ // Premature end of range. define up to Character.MAX_VALUE
+ if ((idx + 1) < len && pattern.charAt(++idx) == ']')
+ {
+ definingRange = false;
+ break;
+ }
+
+ Furthermore, RECompiler.java, line 697:
+
+ if ((idx + 1) >= len || pattern.charAt(idx + 1) != '-')
+
+ Should become something like:
+
+     if ((idx + 1) >= len || !(pattern.charAt(idx + 1) == '-' &&
+         !((idx + 2) < len && pattern.charAt(idx + 2) == ']')))
+
+ Which means: Do not include a char when followed by a minus, but DO include the
+ char when the minus is followed by a ']'.
+
+ The code still does not address the possibility of a charclass which starts with a
+ minus, like "[-a]" or "[^-a]", but that shouldn't be too difficult to implement.
+
+ It isn't really that hard to fix these bugs, I just wonder if there's anybody
+ responsible for the regexp package.
+
+ And by the way, you don't have to shout.
+
+ Bye,
+ Edwin Martin.
\ No newline at end of file
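
The rule described in the comment above (a minus is a range operator only when a character follows it inside the brackets, and is a literal minus otherwise) can be sketched in isolation. This is a minimal illustration, not the RECompiler code and not a tested patch for it: the class BracketSketch and the method parseClassBody are hypothetical names, the input is assumed to be the text between '[' and ']', and a leading '^' (negation) is deliberately not handled.

```java
import java.util.BitSet;

// Hypothetical stand-alone sketch; none of these names exist in org.apache.regexp.
class BracketSketch {

    // Parses the body of a bracket expression (the text between '[' and ']')
    // into the set of characters it should match.
    static BitSet parseClassBody(String body) {
        BitSet set = new BitSet();
        int len = body.length();
        int i = 0;
        while (i < len) {
            // Read one character, skipping a backslash that escapes it.
            if (body.charAt(i) == '\\' && i + 1 < len) {
                i++;
            }
            char start = body.charAt(i++);

            // A '-' opens a range only if another character follows it.
            // At the very end of the body it falls through and is read
            // as an ordinary character on the next iteration.
            boolean isRange = i + 1 < len && body.charAt(i) == '-';
            if (!isRange) {
                set.set(start);
                continue;
            }

            i++; // skip the '-'
            if (body.charAt(i) == '\\' && i + 1 < len) {
                i++;
            }
            char end = body.charAt(i++);
            if (end < start) {
                throw new IllegalArgumentException("Bad range: " + start + "-" + end);
            }
            set.set(start, end + 1); // toIndex is exclusive
        }
        return set;
    }

    public static void main(String[] args) {
        // "[a-]" : just 'a' and '-', not everything from 'a' upward.
        System.out.println(parseClassBody("a-"));
        // The two class bodies from the test case in this report.
        System.out.println(parseClassBody("a-z0-9.\\-")
                .equals(parseClassBody("a-z0-9\\.-")));
    }
}
```

Because a leading minus is also just "a character not preceded by a range start", the same loop accepts "[-a]" without a special case. Handling "[^-a]" would additionally require stripping the '^' before calling the method.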