Posted to regexp-user@jakarta.apache.org by Jann VanOver <Ja...@loudeye.com> on 2001/06/29 01:26:11 UTC

RE: possible bug?

By default, perl expressions are all "greedy" -- they will match as much as
they can.

It looks like org.apache.regexp is not "greedy" -- it matches the first
thing that it can.  There is usually an option you can add to a regexp to
make it greedy -- I can't remember it off hand, though.

-----Original Message-----
From: Edward Q. Bridges [mailto:ebridges@argotec.de]
Sent: Thursday, May 31, 2001 4:48 AM
To: regexp-user@jakarta.apache.org
Subject: possible bug?


When i compile this regexp:
	([\w\d\-\/]+?)\s+([\w\d\.\-]+) 
using org.apache.regexp.recompile, and
use it to match this string:
	X11; I; Linux 2.0.32 i586

i get the following for the output representing the first
and second parenthesis:
OS: Linux 
Version: 2

When i do this in a perl script (see below) i
get the following (which is the desired result):
OS: Linux
Version: 2.0.32

Any ideas?  am i wrong in assuming that the regexp syntax
is compatible with perl's syntax?

thanks
--e--


below is the compiled regexp, and beneath that, how i use it.

/** 
 * Extracts the name and version number of the operating 
 * system when there's a space between them.
 *
 * <p><code> ([\w\d\-\/]+?)\s+([\w\d\.\-]+) </code></p>
 */
final static char[] extractOS_osversion_spaces =
{
    0x007c, 0x0000, 0x0052, 0x0028, 0x0001, 0x0003, 0x007c,
    0x0000, 0x0018, 0x003d, 0x0000, 0x0015, 0x005b, 0x0006,
    0x000f, 0x0061, 0x007a, 0x0041, 0x005a, 0x005f, 0x005f,
    0x0030, 0x0039, 0x002d, 0x002d, 0x002f, 0x002f, 0x0045,
    0x0000, 0x0000, 0x0029, 0x0001, 0x0003, 0x005c, 0x0073,
    0x0003, 0x007c, 0x0000, 0x0006, 0x0047, 0x0000, 0xfffa,
    0x007c, 0x0000, 0x0003, 0x004e, 0x0000, 0x0003, 0x0028,
    0x0002, 0x0003, 0x007c, 0x0000, 0x001c, 0x005b, 0x0005,
    0x000d, 0x0061, 0x007a, 0x0041, 0x005a, 0x005f, 0x005f,
    0x0030, 0x0039, 0x002d, 0x002d, 0x007c, 0x0000, 0x0006,
    0x0047, 0x0000, 0xfff0, 0x007c, 0x0000, 0x0003, 0x004e,
    0x0000, 0x0003, 0x0029, 0x0002, 0x0003, 0x0045, 0x0000,
    0x0000,
};

.
.
.
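// Wrap the precompiled program in an RE and read back the two paren groups.
pattern = new RE( new REProgram(extractOS_osversion_spaces) );
if ( pattern.match(s) ) {
    os = pattern.getParen(1);          // first group: OS name
    os_version = pattern.getParen(2);  // second group: version
    System.out.println("OS: " + os + "\nVersion: " + os_version);
}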
.
.
.


Here's the perl script:

$ua = 'X11; I; Linux 2.0.32 i586';
$ua =~ m#([\w\d\-\/]+?)\s+([\w\d\.\-]+)#;

print "OS: " . $1 . "\n";
print "Version: " . $2 . "\n";





--------------------------------------------
<argo_tec gmbh>
     ed.q.bridges
     tel. 089-368179.xx / fax 089-368179.79
     osterwaldstrasse 10 / 80805 muenchen
</argo_tec gmbh>
--------------------------------------------  


Re: possible bug?

Posted by "--==c.g.==--" <cg...@gmx.de>.
Hi !

I have never heard of an option to make an expression greedy. As you said,
it's the default.
If you want an expression that is not greedy, you mark it with a
trailing question mark after the quantifier.

Example: .*?
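
To make the difference concrete, here is a minimal sketch. It uses
java.util.regex from the JDK rather than org.apache.regexp, purely so that
it runs without extra jars; that substitution, the class name
GreedyVsReluctant and the helper firstMatch are assumptions for
illustration and do not come from this thread.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

public class GreedyVsReluctant {
    /** Returns the first substring of input matched by regex, or null if there is none. */
    public static String firstMatch(String regex, String input) {
        Matcher m = Pattern.compile(regex).matcher(input);
        return m.find() ? m.group() : null;
    }

    public static void main(String[] args) {
        String input = "<a><b>";
        // Greedy: .* takes as much as it can while still letting the final '>' match.
        System.out.println(firstMatch("<.*>", input));   // prints <a><b>
        // Reluctant: .*? takes as little as it can, so it stops at the first '>'.
        System.out.println(firstMatch("<.*?>", input));  // prints <a>
    }
}
```

The same trailing question mark works on the other quantifiers as well
(+? and ??), at least in Perl and java.util.regex.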


Christoph Gaffga


----- Original Message -----
From: "Jann VanOver" <Ja...@loudeye.com>
To: <re...@jakarta.apache.org>
Sent: Friday, June 29, 2001 1:26 AM
Subject: RE: possible bug?


> By default, perl expressions are all "greedy" -- they will match as much as
> they can.
>
> It looks like org.apache.regexp is not "greedy" -- it matches the first
> thing that it can.  There is usually an option you can add to a regexp to
> make it greedy -- I can't remember it off hand, though.


RE: possible bug?

Posted by "Edward Q. Bridges" <eb...@argotec.de>.
from the javadoc at
http://jakarta.apache.org/regexp/apidocs/org/apache/regexp/RE.html

>
>    A+                   Matches A 1 or more times (greedy)
>

it would seem that by default org.apache.regexp *is* greedy.

as well, that same page explains the syntax for reluctant closures -- so that
one can make a pattern non-greedy.
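
For comparison, here is a small sketch that runs the pattern from the
original post against the same user-agent string. It is only an
approximation of the setup above: it uses java.util.regex from the JDK
instead of org.apache.regexp (so it says nothing about what the
precompiled REProgram does), and the class name UserAgentOs, the constants
WITH_DOT and WITHOUT_DOT and the method extract are hypothetical names
chosen for the example.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

public class UserAgentOs {
    // Pattern from the original post; backslashes are doubled for a Java string literal.
    public static final String WITH_DOT = "([\\w\\d\\-\\/]+?)\\s+([\\w\\d\\.\\-]+)";
    // Hypothetical variant for comparison: the same pattern with '.' left out of the second class.
    public static final String WITHOUT_DOT = "([\\w\\d\\-\\/]+?)\\s+([\\w\\d\\-]+)";

    /** Returns {os, version} from the first match of regex in ua, or null if nothing matches. */
    public static String[] extract(String regex, String ua) {
        Matcher m = Pattern.compile(regex).matcher(ua);
        if (!m.find()) {
            return null;
        }
        return new String[] { m.group(1), m.group(2) };
    }

    public static void main(String[] args) {
        String ua = "X11; I; Linux 2.0.32 i586";
        String[] a = extract(WITH_DOT, ua);
        System.out.println("OS: " + a[0] + "\nVersion: " + a[1]);  // OS: Linux / Version: 2.0.32
        String[] b = extract(WITHOUT_DOT, ua);
        System.out.println("OS: " + b[0] + "\nVersion: " + b[1]);  // OS: Linux / Version: 2
    }
}
```

With the full pattern this engine gives the same result as the perl
script. The second run shows that "Version: 2" is what a greedy second
group produces when '.' is not part of its character class, so it may be
worth checking how the escaped dot inside the brackets was compiled rather
than looking at greediness.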

thanks anyway
--e--



On Thu, 28 Jun 2001 16:26:11 -0700, Jann VanOver wrote:

>By default, perl expressions are all "greedy" -- they will match as much as
>they can.
>
>It looks like org.apache.regexp is not "greedy" -- it matches the first
>thing that it can.  There is usually an option you can add to a regexp to
>make it greedy -- I can't remember it off hand, though.

--------------------------------------------
<argo_tec gmbh>
     ed.q.bridges
     tel. 089-368179.552
     fax 089-368179.79
     osterwaldstraße 10 
     (haus F eingang 21)
     80805 münchen
</argo_tec gmbh>
--------------------------------------------