Posted to regexp-user@jakarta.apache.org by Jim K <ji...@gmail.com> on 2006/10/10 21:26:38 UTC

Bad character class exception on pattern that works for JDK's Pattern class

I'd like to use an email validation pattern that I came across on a Perl users
board.  It works just fine with Java's regex.Pattern class but fails to compile
when using Apache Regexp.  The RECompiler class throws a "Bad character class"
exception.

The pattern I'm trying to use is:

^([0-9a-zA-Z]([-.\\w]*[0-9a-zA-Z])*@([0-9a-zA-Z][-\\w]*[0-9a-zA-Z]\\.)+[a-zA-Z]{2,9})$

If I strip out all the groups then RE will accept the pattern.  I know that RE
supports groups, but maybe not nested groups?
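
For reference, here is a minimal sketch of how the same pattern behaves in
java.util.regex, which does accept nested groups.  It assumes the doubled
backslashes above are Java string-literal escapes.  The class name
NestedGroupsDemo and the sample address are made up for illustration, and
nothing here exercises Apache Regexp itself.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

class NestedGroupsDemo {
    // The pattern from the post, written as a Java string literal
    // (each "\\" in source is a single backslash in the regex).
    static final Pattern EMAIL = Pattern.compile(
        "^([0-9a-zA-Z]([-.\\w]*[0-9a-zA-Z])*@([0-9a-zA-Z][-\\w]*[0-9a-zA-Z]\\.)+[a-zA-Z]{2,9})$");

    // Returns a matcher positioned on a full match, or null if the input does not match.
    static Matcher match(String input) {
        Matcher m = EMAIL.matcher(input);
        return m.matches() ? m : null;
    }

    public static void main(String[] args) {
        Matcher m = match("jim@example.com"); // hypothetical sample address
        if (m == null) {
            System.out.println("no match");
            return;
        }
        // Group 1 is the outer group; groups 2 and 3 are nested inside it.
        for (int i = 1; i <= m.groupCount(); i++) {
            System.out.println("group " + i + ": " + m.group(i));
        }
    }
}
```

If the nested groups were the problem, one would expect the compile error to
go away when only the inner parentheses are removed and the character classes
are left alone.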



---------------------------------------------------------------------
To unsubscribe, e-mail: regexp-user-unsubscribe@jakarta.apache.org
For additional commands, e-mail: regexp-user-help@jakarta.apache.org


Re: Bad character class exception on pattern that works for JDK's Pattern class

Posted by Jim K <ji...@gmail.com>.
Hi Reuben-

Thanks for the continued support.  With your help I've been able to complete the
remaining changes to the regexp to get it to work just like the ORO and Java
implementations:

^([0-9a-zA-Z]([\-\.\w]?[0-9a-zA-Z])*@([0-9a-zA-Z][\-\w]*[0-9a-zA-Z]\.)+[a-zA-Z]{2,9})$


At this point all my positive and negative unit test cases run identically
across all three libraries.
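
A rough sketch of the kind of positive and negative checks described above,
shown against java.util.regex only.  The class name EmailPatternCheck, the
method isValid, and every sample address except username@host@host.com are
assumptions made for illustration; the actual unit tests are not part of this
thread.

```java
import java.util.regex.Pattern;

class EmailPatternCheck {
    // The final pattern from this message, written as a Java string literal.
    static final Pattern EMAIL = Pattern.compile(
        "^([0-9a-zA-Z]([\\-\\.\\w]?[0-9a-zA-Z])*@([0-9a-zA-Z][\\-\\w]*[0-9a-zA-Z]\\.)+[a-zA-Z]{2,9})$");

    // True when the whole address matches the pattern.
    static boolean isValid(String address) {
        return EMAIL.matcher(address).matches();
    }

    public static void main(String[] args) {
        String[] samples = {
            "jim@example.com",             // expected to match
            "first.last@mail.example.org", // expected to match
            "username@host@host.com",      // the case from this thread: must not match
            "user@example",                // no top-level domain: must not match
            "@example.com"                 // empty local part: must not match
        };
        for (String s : samples) {
            System.out.println(s + " -> " + isValid(s));
        }
    }
}
```

Running the same sample list through each library and comparing the results
is one way to confirm that the three implementations agree.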

Jim




RE: Bad character class exception on pattern that works for JDK's Pattern class

Posted by Reuben Sivan <li...@reubensivan.com>.
Jim,

Apparently the workaround I suggested the other day wasn't the best one, as
it allowed you to compile but then failed at match-time.

This new fix seems to work OK in both java.util.regex and org.apache.regexp:

^([0-9a-zA-Z]([-.\w]*[0-9a-zA-Z])*@([0-9a-zA-Z][\-\w]*[0-9a-zA-Z]\.)+[a-zA-Z]{2,9})$

My guess is that [-\w] and [\w-] are either not well defined or mishandled
by regexp; in either case escaping the hyphen takes care of (or avoids) the
problem.

Interestingly regexp did accept [-.\w] ...
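
As a point of comparison, the small sketch below checks that java.util.regex
treats the three spellings of this class identically.  It says nothing about
how org.apache.regexp parses them; the class name HyphenClassDemo and the
helper accepts are made up for illustration.

```java
import java.util.regex.Pattern;

class HyphenClassDemo {
    // Leading hyphen, trailing hyphen, and escaped hyphen, as Java string literals.
    static final String[] SPELLINGS = { "[-\\w]", "[\\w-]", "[\\-\\w]" };

    // True when the single-character string ch is accepted by the character class.
    static boolean accepts(String charClass, String ch) {
        return Pattern.matches(charClass, ch);
    }

    public static void main(String[] args) {
        String[] probes = { "-", "a", "_", "." };
        for (String spelling : SPELLINGS) {
            for (String probe : probes) {
                System.out.println(spelling + " accepts '" + probe + "': "
                        + accepts(spelling, probe));
            }
        }
    }
}
```

In java.util.regex all three spellings accept the hyphen and word characters
and reject the dot, so the escaped form is a safe choice for a pattern that
has to work in both libraries.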

 Reuben


-----Original Message-----
From: Jim K [mailto:jimkski@gmail.com] 
Sent: Thursday, October 12, 2006 6:12 PM
To: regexp-user@jakarta.apache.org
Subject: Re: Bad character class exception on pattern that works for JDK's
Pattern class

I ran through a bunch of unit tests to validate that the RE compiled pattern
operated the same as when compiled with either ORO's Perl5Compiler or Java's
Pattern.  I found that in one instance it did not:

username@host@host.com

I'd expect this address to fail the match and it does for ORO and Java.  But it
passes when run through Apache regexp.  The pattern as amended is:

^([0-9a-zA-Z]([-.\w]*[0-9a-zA-Z])*@([0-9a-zA-Z][\w-]*[0-9a-zA-Z]\.)+[a-zA-Z]{2,9})$







Re: Bad character class exception on pattern that works for JDK's Pattern class

Posted by Jim K <ji...@gmail.com>.
I ran through a bunch of unit tests to validate that the RE compiled pattern
operated the same as when compiled with either ORO's Perl5Compiler or Java's
Pattern.  I found that in one instance it did not:

username@host@host.com

I'd expect this address to fail the match and it does for ORO and Java.  But it
passes when run through Apache regexp.  The pattern as amended is:

^([0-9a-zA-Z]([-.\w]*[0-9a-zA-Z])*@([0-9a-zA-Z][\w-]*[0-9a-zA-Z]\.)+[a-zA-Z]{2,9})$




Re: Bad character class exception on pattern that works for JDK's Pattern class

Posted by Jim K <ji...@gmail.com>.
Hi Reuben-

Thanks for the tip.  That definitely worked.




RE: Bad character class exception on pattern that works for JDK's Pattern class

Posted by Reuben Sivan <li...@reubensivan.com>.
Jim,

It appears the following character class is rejected by RE:
 [-\\w]

I believe this is a bug in RE. You can get around the problem by reversing
the expression to:
 [\\w-]

 Reuben

-----Original Message-----
From: Jim K [mailto:jimkski@gmail.com] 
Sent: Tuesday, October 10, 2006 3:27 PM
To: regexp-user@jakarta.apache.org
Subject: Bad character class exception on pattern that works for JDK's
Pattern class

I'd like to use an email validation pattern that I came across on a Perl users
board.  It works just fine with Java's regex.Pattern class but fails to compile
when using Apache Regexp.  The RECompiler class throws a "Bad character class"
exception.

The pattern I'm trying to use is:

^([0-9a-zA-Z]([-.\\w]*[0-9a-zA-Z])*@([0-9a-zA-Z][-\\w]*[0-9a-zA-Z]\\.)+[a-zA-Z]{2,9})$

If I strip out all the groups then RE will accept the pattern.  I know that RE
supports groups, but maybe not nested groups?


