Posted to regexp-user@jakarta.apache.org by Edwin Martin <ed...@bitstorm.nl> on 2001/06/05 01:57:55 UTC

Regexp 1.2 weirdness

Hello,

I stumbled upon a problem with regexp 1.2 that I can't reconcile
with any regex documentation, old or new.

In short: "[a-z0-9-]" doesn't behave as a class of lowercase letters, digits and '-'.
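For reference, here is a small sketch of what that class is conventionally expected to match: a '-' placed last in a bracket expression is a literal hyphen, not the start of a range. It uses java.util.regex (JDK 1.4 and later), not Jakarta Regexp, so it shows the usual interpretation rather than what regexp 1.2 actually does. The class and method names are made up for this example.

```java
import java.util.regex.Pattern;

public class ClassMembership {
    // Hypothetical helper: returns every ASCII character (0-127) that the
    // given character class matches under java.util.regex semantics,
    // in code-point order.
    static String matchingAscii(String charClass) {
        Pattern p = Pattern.compile(charClass);
        StringBuilder sb = new StringBuilder();
        for (char c = 0; c < 128; c++) {
            // matches() requires the whole one-character string to match
            if (p.matcher(String.valueOf(c)).matches()) {
                sb.append(c);
            }
        }
        return sb.toString();
    }

    public static void main(String[] args) {
        // A trailing '-' is literal: only '-', digits and a-z should match.
        System.out.println(matchingAscii("[a-z0-9-]"));
        // prints -0123456789abcdefghijklmnopqrstuvwxyz
    }
}
```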

Here's a JSP test page I made:

---------------- begin retest.jsp ----------------

<%@ page import="org.apache.regexp.*" %>

<h2>RE test</h2>

<%

String s = "&lt;john.doe-001.002@my.com&gt;";

out.print(s);

out.print("<p>1<br>");
RE emailRE1 = new RE("([a-z0-9]+)@");
if ( emailRE1.match( s ) )
         out.print( emailRE1.getParen(1) );

out.print("<p>2<br>");
RE emailRE2 = new RE("([a-z0-9.]+)@");
if ( emailRE2.match( s ) )
         out.print( emailRE2.getParen(1) );

out.print("<p>3<br>");
RE emailRE3 = new RE("([a-z0-9.-]+)@");
if ( emailRE3.match( s ) )
         out.print( emailRE3.getParen(1) );

out.print("<p>4<br>");
RE emailRE4 = new RE("([a-z0-9-]+)@");
if ( emailRE4.match( s ) )
         out.print( emailRE4.getParen(1) );

s = "&lt;john.doe.001-002@my.com&gt;";

out.print("<hr>");
out.print(s);

out.print("<p>5<br>");
RE emailRE5 = new RE("([a-z0-9-]+)@");
if ( emailRE5.match( s ) )
         out.print( emailRE5.getParen(1) );

out.print("<p>6<br>");
RE emailRE6 = new RE("([a-z0-9.-]+)@");
if ( emailRE6.match( s ) )
         out.print( emailRE6.getParen(1) );

out.print("<p>7<br>");
RE emailRE7 = new RE("([a-z0-9.]+)@");
if ( emailRE7.match( s ) )
         out.print( emailRE7.getParen(1) );
%>

---------------- end retest.jsp ----------------

This is the output:

---------------- begin output  ----------------
RE test
<jo...@my.com>
1
002
2
001.002
3
001.002
4
<john.doe-001.002
-------------------------------------------------
<jo...@my.com>
5
<john.doe.001-002
6
002
7
002
---------------- end output ----------------

Points 1 and 2 are as expected.

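Point 3 should match "john.doe-001.002", but the '-' is not matched.

Point 4 (the class from point 3 without the dot) matches everything
before the '@', including the "&lt;" and the dots!

Points 5, 6 and 7 run the same patterns against an address in which
the dot and minus are swapped. Point 5 matches everything, like
point 4, and point 6 again skips the '-', like point 3. The same
strange behavior :-(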
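To compare with the conventional reading of these patterns, the sketch below runs the same seven cases through java.util.regex (JDK 1.4 and later) instead of Jakarta Regexp. Matcher.find() searches anywhere in the input; the JSP output above (point 1 yielding "002") suggests RE.match() also searches rather than anchoring at the start. The class and method names are invented for illustration.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

public class ReTestComparison {
    // Hypothetical helper: returns group 1 of the leftmost match of regex
    // in s, or null when there is no match.
    static String firstGroup(String regex, String s) {
        Matcher m = Pattern.compile(regex).matcher(s);
        return m.find() ? m.group(1) : null;
    }

    public static void main(String[] args) {
        // Same test strings as the JSP page, HTML entities included.
        String a = "&lt;john.doe-001.002@my.com&gt;";
        String b = "&lt;john.doe.001-002@my.com&gt;";
        String[][] cases = {
            {"1", "([a-z0-9]+)@", a},
            {"2", "([a-z0-9.]+)@", a},
            {"3", "([a-z0-9.-]+)@", a},
            {"4", "([a-z0-9-]+)@", a},
            {"5", "([a-z0-9-]+)@", b},
            {"6", "([a-z0-9.-]+)@", b},
            {"7", "([a-z0-9.]+)@", b},
        };
        for (String[] c : cases) {
            System.out.println(c[0] + ": " + firstGroup(c[1], c[2]));
        }
        // Expected under java.util.regex:
        // 1: 002
        // 2: 001.002
        // 3: john.doe-001.002
        // 4: 002
        // 5: 001-002
        // 6: john.doe.001-002
        // 7: 002
    }
}
```

Under these semantics points 3 and 6 capture the whole local part, while points 4 and 5 stop at the first '.' that is not in the class.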

Am I overlooking something?

Bye,
Edwin Martin.