Posted to java-user@lucene.apache.org by Christopher M Collins <co...@us.ibm.com> on 2008/08/01 04:06:55 UTC
SpanRegexQuery
Hello,
I'm trying to use SpanRegexQuery as one of the clauses in my SpanQuery.
When I give it a regex like: "L[a-z]+ing" and do a rewrite on the final
query I get terms like "Labinger" and "Lackonsingh" along with the expected
terms "Labeling", "Lacing", etc. It's as if the regex is treated as a
"find()" and not a "match()" in Java. Is there a way to make it behave
like a full match, and not a prefix regex?
Thanks!
Christopher
______________________________________________________________
Christopher Collins \ http://www.cs.utoronto.ca/~ccollins
Department of Computer Science \ University of Toronto
Collaborative User Experience Group \ IBM Research
Re: SpanRegexQuery
Posted by Erik Hatcher <er...@ehatchersolutions.com>.
On Jul 31, 2008, at 10:06 PM, Christopher M Collins wrote:
> I'm trying to use SpanRegexQuery as one of the clauses in my
> SpanQuery.
> When I give it a regex like: "L[a-z]+ing" and do a rewrite on the
> final
> query I get terms like "Labinger" and "Lackonsingh" along with the
> expected
> terms "Labeling", "Lacing", etc. It's as if the regex is treated as a
> "find()" and not a "match()" in Java. Is there a way to make it
> behave
> like a full match, and not a prefix regex?
There are two implementations of the regex engine built into
SpanRegexQuery, one using Java's java.util.regex, the other using
Jakarta Regexp. The default implementation is java.util.regex, which
matches like this:
pattern.matcher(string).lookingAt()
And Jakarta Regexp matches like this:
regexp.match(string)
I'm not sure of the differences between these two without doing some
tests, but certainly they should, ahem, match in at least the
expectation of whether there is an implied ^string$ or not. But at a
quick glance at the respective javadocs, it does seem like the
java.util.regex implementation should be using
pattern.matcher(string).matches() instead. lookingAt() always starts
at the beginning, so there is an implied ^string effect, but not so
with the Jakarta Regexp implementation.
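A minimal sketch of the difference, using only java.util.regex and the
terms from the original question. The class and helper names
(RegexMatchModes, prefixMatch, fullMatch, anywhereMatch) are made up
for illustration and are not part of Lucene:

```java
import java.util.regex.Pattern;

// Hypothetical demo class; not part of Lucene or SpanRegexQuery.
class RegexMatchModes {
    // Anchored at the start only: the pattern may stop before the end of the term.
    static boolean prefixMatch(String regex, String term) {
        return Pattern.compile(regex).matcher(term).lookingAt();
    }

    // Anchored at both ends: the whole term must match the pattern.
    static boolean fullMatch(String regex, String term) {
        return Pattern.compile(regex).matcher(term).matches();
    }

    // Not anchored: the pattern may match any substring of the term.
    static boolean anywhereMatch(String regex, String term) {
        return Pattern.compile(regex).matcher(term).find();
    }

    public static void main(String[] args) {
        String regex = "L[a-z]+ing";
        String[] terms = {"Labeling", "Lacing", "Labinger", "Lackonsingh"};
        for (String term : terms) {
            System.out.println(term
                + " lookingAt=" + prefixMatch(regex, term)
                + " matches=" + fullMatch(regex, term)
                + " find=" + anywhereMatch(regex, term));
        }
    }
}
```

With lookingAt(), "Labinger" is accepted because its prefix "Labing"
satisfies the pattern; with matches() it is rejected because the
trailing "er" is left over.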
As Daniel mentioned, putting a $ at the end should do the trick, and
it seems to me that it really should be necessary... but so should ^
in front if you want it to start at the beginning and not match
anywhere in the string.
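A rough sketch of the anchoring workaround, again with plain
java.util.regex. Here find() is only an assumed stand-in for an engine
that does not imply a leading ^; it is not the Jakarta Regexp API, and
the class and helper names are hypothetical:

```java
import java.util.regex.Pattern;

// Hypothetical demo class; not part of Lucene or Jakarta Regexp.
class AnchoredRegexDemo {
    // Start-anchored matching, as with pattern.matcher(string).lookingAt().
    static boolean prefixMatch(String regex, String term) {
        return Pattern.compile(regex).matcher(term).lookingAt();
    }

    // Unanchored matching; assumed stand-in for an engine with no implied ^.
    static boolean anywhereMatch(String regex, String term) {
        return Pattern.compile(regex).matcher(term).find();
    }

    public static void main(String[] args) {
        String[] terms = {"Labeling", "Labinger", "unLacing"};
        for (String term : terms) {
            System.out.println(term
                // A trailing $ is enough when the start is already anchored.
                + " lookingAt(L[a-z]+ing$)=" + prefixMatch("L[a-z]+ing$", term)
                // Without an implied ^, a trailing $ alone still allows a match mid-string.
                + " find(L[a-z]+ing$)=" + anywhereMatch("L[a-z]+ing$", term)
                // Both anchors give a full match regardless of the matching mode.
                + " find(^L[a-z]+ing$)=" + anywhereMatch("^L[a-z]+ing$", term));
        }
    }
}
```

Writing the pattern as ^L[a-z]+ing$ keeps the behaviour the same under
either matching mode, so the query does not depend on which engine is
configured.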
Changing JavaUtilRegexCapabilities to use matches() seems like the
right thing to do, but that'd break backwards compatibility. *ugh*
Erik
---------------------------------------------------------------------
To unsubscribe, e-mail: java-user-unsubscribe@lucene.apache.org
For additional commands, e-mail: java-user-help@lucene.apache.org
Re: SpanRegexQuery
Posted by Daniel Noll <da...@nuix.com>.
Christopher M Collins wrote:
> Hello,
>
> I'm trying to use SpanRegexQuery as one of the clauses in my SpanQuery.
> When I give it a regex like: "L[a-z]+ing" and do a rewrite on the final
> query I get terms like "Labinger" and "Lackonsingh" along with the expected
> terms "Labeling", "Lacing", etc. It's as if the regex is treated as a
> "find()" and not a "match()" in Java. Is there a way to make it behave
> like a full match, and not a prefix regex?
Have you tried appending $ onto the end of it? I think we noticed the
same issue with regex queries here and had to apply a workaround of that
sort.
Daniel
--
Daniel Noll