Posted to regexp-dev@jakarta.apache.org by Ian Swett <is...@ispheres.com> on 2001/02/24 00:16:55 UTC

Bugs I've found

I've found two bugs recently in regexp.  I'm new to the list, so I
apologize if these are known issues.

I wanted to notify the list of the problems I found, ensure they're
actually problems, and make sure I'm going about solving them in the
correct manner.

1) RECompiler dies when compiling regular expressions that contain the
'*?(' sequence of characters.  Sometimes the next offset of a node has
not been set to zero, so when computing next = node + instruction[node +
offsetNext], next is very large, and you get an
ArrayIndexOutOfBoundsException.  I added a check to make sure there was
no array-out-of-bounds case, and returned -1 in that case.  It appears
to work, but there may be a more correct way to fix this bug.

2) The other problem is with reluctant closures.  Because reluctant
closures are not recursive, cases like the following fail:
b(aaa|aaaaa)*?b does not accept baaaaaaaaaab (10 a's, i.e. two
repetitions of aaaaa), when it should.  I have tried to rework
reluctant closures so they're implemented more like greedy ones (with
recursive ORs), but I don't have it working yet.
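For illustration, here is a minimal, standalone sketch of what a recursive (backtracking) reluctant closure does on this example.  It does not use RE or RECompiler or their node representation; it only handles the restricted shape prefix (alt1|alt2|...)*? suffix with literal strings, and the class and method names are hypothetical.  The point is that after trying "stop here" first, each extra iteration recurses, so a dead end (like aaa,aaa,aaa leaving one 'a') can back up and pick aaaaa instead.

```java
import java.util.Arrays;
import java.util.List;

public class ReluctantClosureSketch {
    // Matches the whole input against:  prefix (alt1|alt2|...)*? suffix
    // where prefix, suffix and every alternative are plain literal strings.
    static boolean matches(String prefix, List<String> alts, String suffix, String input) {
        return input.startsWith(prefix) && star(alts, suffix, input, prefix.length());
    }

    // Reluctant closure at position pos: first try to stop iterating and match
    // the rest of the pattern; only if that fails, take one more iteration
    // through each alternative in turn. The recursion is what lets a failed
    // continuation backtrack into a different alternative on an earlier pass.
    static boolean star(List<String> alts, String suffix, String input, int pos) {
        if (input.startsWith(suffix, pos) && pos + suffix.length() == input.length()) {
            return true; // zero more iterations is enough
        }
        for (String alt : alts) {
            // Empty alternatives are skipped so every iteration consumes input;
            // otherwise the closure could loop forever without making progress.
            if (!alt.isEmpty() && input.startsWith(alt, pos)
                    && star(alts, suffix, input, pos + alt.length())) {
                return true;
            }
        }
        return false;
    }

    public static void main(String[] args) {
        List<String> alts = Arrays.asList("aaa", "aaaaa");
        System.out.println(matches("b", alts, "b", "baaaaaaaaaab")); // true: 5 + 5 a's
        System.out.println(matches("b", alts, "b", "baaaab"));       // false: 4 a's
        System.out.println(matches("b", alts, "b", "bb"));           // true: zero iterations
    }
}
```

Skipping empty alternatives loses no matches here, because an iteration that consumes nothing leaves the position unchanged and "stop here" has already been tried at that position.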

Ian Swett


Re: Bugs I've found

Posted by Michael McCallum <mi...@spinsoftware.com>.
On 26 Feb 2001, at 10:28, Ian Swett wrote:

> For #1, your fix seems good.
> 
> For #2, I have not been able to get it working.  I think it may be related
> to the fact that (|a) does not work either (it throws a RESyntaxException).
> I don't believe that putting a NOTHING node as the first node in an OR was
> ever intended.  Unfortunately, it may be more difficult than just changing
> RECompiler.
No, when I looked at doing it, it required changes to both RE and RECompiler.
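For comparison only, the JDK's java.util.regex package (added in Java 1.4, after this thread, and a separate engine from this one) accepts an empty first alternative.  A quick check of what (|a) would be expected to match, with a hypothetical class name:

```java
import java.util.regex.Pattern;

public class EmptyAlternativeCheck {
    public static void main(String[] args) {
        // "(|a)" has an empty first alternative, so it should match either "" or "a".
        System.out.println(Pattern.matches("(|a)", ""));  // true
        System.out.println(Pattern.matches("(|a)", "a")); // true: "" fails at end, backtracks to "a"
        System.out.println(Pattern.matches("(|a)", "b")); // false
    }
}
```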
> 
> I can forward the code snippets I used to you, if you'd like to look at
> them.  
Yep, it will be good to see another approach.

> 
> You said you had a solution (though it may be somewhat of a hack) for
> reluctant closures.  Does it work in all/most cases?  Is it checked into
> CVS now?
I've had a nightmare five days, so I haven't checked it in yet.
I will do it tonight, at about 9pm NZDT (8am UTC).
It works in nearly all cases, but send me all your test cases and I will add
them to the test suite.

Michael

Re: Bugs I've found

Posted by Michael McCallum <gh...@xtra.co.nz>.
On Friday 23 February 2001 23:16, you wrote:
} I've found two bugs recently in regexp.  I'm new to the list, so I
} apologize if these are known issues.
New solutions are always good for comparison, especially when untainted
by the previous ones (like the Prime Directive :)

}
} 1) RECompiler dies when compiling regular expressions with '*?('
} sequence of characters in the regexp.  Sometimes the next offset of a node
} has not been set to zero, so when next = node + instruction[node +
} offsetNext], next is very large, and you get an ArrayIndexOutOfBounds
} exception.  I added a check to make sure there was no array out of bounds
} case, and returned -1 in that case.  It appears to work, but there may be
} a more correct way to fix this bug.
I fixed this by making sure that nextOfEnd did not go past the end of the list of currently defined nodes.
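A rough sketch of the difference between the two checks, assuming a made-up node layout (NODE_SIZE slots per node, with a relative next offset at OFFSET_NEXT, 0 meaning end of chain) and treating lenInstruction as the number of slots actually in use.  These names and the layout are assumptions for illustration, not the actual patch.  Bounding the walk by the array length only stops offsets that leave the array; bounding it by lenInstruction also stops a stale offset that lands in allocated but undefined space.

```java
public class NextOfEndSketch {
    // Hypothetical node layout: each node is NODE_SIZE slots and the slot at
    // OFFSET_NEXT holds a relative offset to the next node (0 = end of chain).
    static final int OFFSET_NEXT = 2;
    static final int NODE_SIZE = 3;

    // Follows next offsets from `node` and returns the last node of the chain.
    // An offset whose target would not fit a whole node below `limit` is treated
    // as the end of the chain instead of being followed.
    // Assumes `node` is a defined node and the chain has no cycles.
    static int endOfChain(int[] instruction, int limit, int node) {
        while (true) {
            int offset = instruction[node + OFFSET_NEXT];
            int next = node + offset;
            if (offset == 0 || next < 0 || next + NODE_SIZE > limit) {
                return node;
            }
            node = next;
        }
    }

    public static void main(String[] args) {
        // The array has room for three nodes, but only the first two are defined;
        // node 3's next offset is stale and points at the undefined third slot.
        int[] instruction = {'A', 0, 3, 'B', 0, 3, 0, 0, 0};
        int lenInstruction = 2 * NODE_SIZE;
        // Bounded by the defined nodes: the stale offset is not followed.
        System.out.println(endOfChain(instruction, lenInstruction, 0));     // 3
        // Bounded only by the array length: the walk ends on an undefined slot.
        System.out.println(endOfChain(instruction, instruction.length, 0)); // 6
    }
}
```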

}
} 2) The other problem is with reluctant closures.  Because reluctant
} closures are not recursive, cases like the following fail: b(aaa|aaaaa)*?b
} does not accept baaaaaaaaaab (10 a's), when it should.  I have tried to
} change around reluctant closures so they're implemented more similarly to
} greedy ones (with recursive ORs), but I don't have it working yet.
I noticed when looking at this that the greedy and non-greedy closures were implemented differently.
I wasn't sure why. Do you think you can get the recursive ORs working?
Because of the current implementation of the non-greedy closures, infinite loops get generated.
I stopped this by not allowing the loop to be created, but if you've fixed the non-greedy closures then
I can get rid of that hack.

Send a patch for the fixes you came up with.
I'll put them in.

Michael