Posted to oro-user@jakarta.apache.org by "Daniel F. Savarese" <df...@savarese.org> on 2002/05/31 18:02:31 UTC

Re: Subgroup wrong when matching (.)(?=(.)) against "XY"?

In message <OE...@hotmail.com>, "Adrian Boyko" writes:
>But the slightly simpler case, below, doesn't seem to match the 2nd subgroup
>correctly:
>   Perl5 Expression: (.)(?=(.))
>   Search Input: XY
>   Match 1: X
>      Subgroups:
>         1: X
>         2:
>
>Shouldn't the second subgroup in the second case be "Y" instead of blank?

Perl does fill $2 with Y.  And so does Perl5Matcher.  If you look at
the group offsets, you'll find the matching is performed correctly.  In
other words, two groups are found and the begin and end offsets of the
second group are 1 and 2.  However, because the second group was matched
inside a zero-width lookahead assertion, the Y character is not consumed
and is not considered part of the full match.  So the full match is just
'X'.  Since the full match stored in the MatchResult is 'X', group offsets
that exceed the length of the match result in empty strings.  To show that
the full match is just one character in Perl, look at group 0 here:

~> perl -e '"xy" =~ /(.)(?=(.))/; print "0: $& 1: $1 2: $2 3: $3\n";'
0: x 1: x 2: y 3: 

Now, the problem we're faced with is one that I'm not sure how to deal
with.  Is the capturing of the lookahead assertion an undefined
side-effect, much as some situations involving the capturing of
repetitions used to be before Perl 5.6?  Or is it intended for
groups that match outside of the full match to be saved?  It's
actually quite tricky to implement this without either maintaining
a reference to a copy of the entire original input (undesirable) or
screwing up a lot of other cases.  I'd like to mull this one over
for a while.  In the meantime, the workaround is to use the group
offset information to extract the appropriate substring from the
input.
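
As a rough sketch of that workaround: the snippet below uses
java.util.regex rather than ORO, purely so that it runs without extra
jars, and groupFromOffsets is a hypothetical helper name.  The idea is
the same either way: keep the original input around and cut the group
out of it using the group's begin and end offsets.  With ORO, the
analogous calls should be MatchResult.beginOffset(2) and
MatchResult.endOffset(2) on the result of Perl5Matcher.getMatch().

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

public class LookaheadGroupDemo {
    // Hypothetical helper: rebuild a group's text from its offsets into the
    // original input instead of asking the match object for the group string.
    static String groupFromOffsets(String input, Matcher m, int group) {
        int begin = m.start(group);
        int end = m.end(group);
        // A group that did not participate in the match reports -1.
        if (begin < 0 || end < 0) {
            return null;
        }
        return input.substring(begin, end);
    }

    public static void main(String[] args) {
        String input = "XY";
        Matcher m = Pattern.compile("(.)(?=(.))").matcher(input);
        if (m.find()) {
            // The full match covers only the consumed character.
            System.out.println("full match: [" + m.start() + "," + m.end() + ") "
                    + m.group());
            // Group 2 was captured inside the lookahead, so its offsets lie
            // beyond the end of the full match.
            System.out.println("group 2:    [" + m.start(2) + "," + m.end(2) + ") "
                    + groupFromOffsets(input, m, 2));
        }
    }
}
```

For the input "XY" the full match spans offsets 0 to 1 and group 2
spans offsets 1 to 2, so the substring taken from the input is "Y".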

daniel
