Posted to dev@lucene.apache.org by Erik Hatcher <er...@ehatchersolutions.com> on 2005/12/04 17:45:22 UTC
Re: Test code for regex queries
Following up on the (Span)RegexQuery topic, I've started working on
moving this code to contrib/regex so that it can leverage various
regex implementations. I'm making a generic interface that currently
(though subject to change) has these methods:
    void compile(String pattern);
    boolean match(String string);
    int prefixLength();
I'm going to start with implementations for both Jakarta Regexp and
java.util.regex, and probably Jakarta ORO as well. I've been able to
extract the prefix length using Jakarta Regexp, but I don't believe
this is possible with java.util.regex. I haven't looked into Jakarta
ORO deeply enough yet to see whether it makes this available.
(Span)RegexQuery will have a setter for specifying which
implementation to use, probably defaulting to java.util.regex so that
it runs without any additional dependencies.
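
To make the shape of this concrete, here is a minimal sketch of the
interface with a java.util.regex-backed implementation. The type names
RegexCapabilities and JavaUtilRegexCapabilities are hypothetical, the
use of lookingAt() is only one possible choice, and returning 0 from
prefixLength() to mean "no known literal prefix" is an assumption. None
of this is settled.

```java
import java.util.regex.Pattern;

// Hypothetical name; only the three method signatures come from the message.
interface RegexCapabilities {
    void compile(String pattern);
    boolean match(String string);
    int prefixLength();
}

// Sketch of a dependency-free implementation backed by java.util.regex.
class JavaUtilRegexCapabilities implements RegexCapabilities {
    private Pattern pattern;

    @Override
    public void compile(String pattern) {
        this.pattern = Pattern.compile(pattern);
    }

    @Override
    public boolean match(String string) {
        // Assumption: lookingAt() anchors at the start of the term only,
        // so "a.c" accepts "abcd". matches() or find() would behave differently.
        return pattern.matcher(string).lookingAt();
    }

    @Override
    public int prefixLength() {
        // Assumption: 0 signals that no literal prefix could be determined.
        return 0;
    }
}
```

A query would call compile() once and then match() for each candidate
term.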
An interesting thing to note...
    Jakarta Regexp: "a.c" matches "abcd"
    java.util.regex: "a.c" does not match "abcd" using
    Matcher.matches(), but it does match using Matcher.lookingAt()
In other words, if you want "a.*" to match only terms that begin with
"a", the regex logically must be specified as "^a.*". This is not
really a concern of the regex query itself, but of the underlying
matching implementation. For query parsing, though, it would likely
be desirable to wrap all regex expressions with ^...$ (which is
generally what users mean when they write "a.*").
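
The difference between the java.util.regex matching modes, and the
effect of wrapping an expression with ^...$, can be sketched as
follows. The class and method names are hypothetical. Wrapping through
a non-capturing group is an assumption added here so that an
alternation such as "a|b" stays inside the anchors. A plain
"^" + regex + "$" would bind the anchors to the individual branches.

```java
import java.util.regex.Pattern;

// Hypothetical helper class illustrating the three java.util.regex modes.
class AnchoringSketch {
    // Whole-term match: the expression must consume the entire term.
    static boolean wholeTerm(String regex, String term) {
        return Pattern.compile(regex).matcher(term).matches();
    }

    // Anchored at the start only; trailing characters are allowed.
    static boolean fromStart(String regex, String term) {
        return Pattern.compile(regex).matcher(term).lookingAt();
    }

    // Unanchored: the expression may match anywhere inside the term.
    static boolean anywhere(String regex, String term) {
        return Pattern.compile(regex).matcher(term).find();
    }

    // Wrap with ^...$ as a query parser might; the group keeps "a|b" intact.
    static String anchor(String regex) {
        return "^(?:" + regex + ")$";
    }
}
```

With this, wholeTerm("a.c", "abcd") is false while
fromStart("a.c", "abcd") is true, and anywhere(anchor("a.c"), "abcd")
is false again because the anchors force a whole-term match.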
I'm also considering having the implementation-independent interface
specify a method to rotate an expression, though this is a more
advanced feature that perhaps belongs at a different layer.
I'm open to suggestions on all of this. My main goal is to provide a
general-purpose regular expression query that is as fast as possible
by minimizing term enumeration.
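
As a rough illustration of why the prefix length matters, the sketch
below uses a sorted set of strings as a stand-in for the index's term
dictionary. It is not Lucene code. The class and method names are
hypothetical, and whole-term matching via Matcher.matches() is an
assumption. A known literal prefix lets the scan start at the first
term with that prefix and stop at the first term without it.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.SortedSet;
import java.util.regex.Pattern;

// Hypothetical sketch; a SortedSet stands in for a sorted term dictionary.
class PrefixEnumerationSketch {
    static List<String> matchingTerms(SortedSet<String> terms,
                                      String regex, int prefixLength) {
        // The first prefixLength characters of the expression are literal.
        String prefix = regex.substring(0, prefixLength);
        Pattern pattern = Pattern.compile(regex);
        List<String> hits = new ArrayList<>();
        // Seek to the first term >= prefix instead of scanning from the top.
        for (String term : terms.tailSet(prefix)) {
            if (!term.startsWith(prefix)) {
                break; // sorted order: no later term can share the prefix
            }
            if (pattern.matcher(term).matches()) {
                hits.add(term);
            }
        }
        return hits;
    }
}
```

With a prefix length of 0 the loop degrades to a scan of every term,
which is the case to avoid where possible.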
Erik