Posted to java-user@lucene.apache.org by code fx9 <co...@gmail.com> on 2015/04/05 23:08:13 UTC

Customizing Regexp syntax in Lucene

Hi,
We are using Lucene indirectly via ElasticSearch. We would like to use RE2
syntax for running regex queries against Lucene. We are already using RE2
syntax for other parts of our system, so the inability to use the same syntax
is a deal-breaker for us.

Google recently released a pure Java implementation of this library on
GitHub. Would it be possible to use the RE2/J library to run regex
queries in Lucene? I understand that it might require customizing the Lucene
source code. Can you give me an idea of how complex and time-consuming such
an endeavor might be?

RE2 Syntax: https://re2.googlecode.com/hg/doc/syntax.html
RE2/J: https://github.com/google/re2j

Thanks.

Re: Customizing Regexp syntax in Lucene

Posted by Robert Muir <rc...@gmail.com>.

The only place in Lucene that "knows" about regex syntax is RegexpQuery. Its
only job is to parse that syntax into a state machine (the Automaton
class); AutomatonQuery takes care of the execution.

Maybe you could create an Re2Query class that works in a similar way:
e.g. it would use the RE2/J library to parse the syntax into its own state
machine representation, then translate that into the Automaton
representation used by Lucene.
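
To make the translation step a bit more concrete, here is a minimal,
self-contained Java sketch. Everything in it is hypothetical: Node stands in
for whatever parsed form RE2/J would expose, and SimpleAutomaton stands in
for Lucene's Automaton; neither is the real API of either library. It only
shows the shape of the work, assuming a Thompson-style construction with
epsilon edges: walk the parsed pattern, emit states and character-range
transitions, and leave matching to code that knows nothing about the syntax.

```java
import java.util.ArrayDeque;
import java.util.ArrayList;
import java.util.Collections;
import java.util.Deque;
import java.util.HashSet;
import java.util.List;
import java.util.Set;

// Hypothetical sketch: none of these types are real Lucene or RE2/J classes.
final class Re2TranslationSketch {

  // Stand-in for the parsed pattern a regex library would hand back.
  static final class Node {
    enum Kind { RANGE, CONCAT, ALT, STAR }
    final Kind kind;
    final char min, max;     // used by RANGE only
    final Node left, right;  // children; STAR uses left only

    private Node(Kind kind, char min, char max, Node left, Node right) {
      this.kind = kind; this.min = min; this.max = max; this.left = left; this.right = right;
    }
    static Node range(char min, char max) { return new Node(Kind.RANGE, min, max, null, null); }
    static Node concat(Node l, Node r) { return new Node(Kind.CONCAT, '\0', '\0', l, r); }
    static Node alt(Node l, Node r) { return new Node(Kind.ALT, '\0', '\0', l, r); }
    static Node star(Node inner) { return new Node(Kind.STAR, '\0', '\0', inner, null); }
  }

  // Stand-in for the target automaton: int states, char-range transitions.
  static final class SimpleAutomaton {
    // Each edge is {from, to, min, max}; min < 0 marks an epsilon edge.
    private final List<int[]> edges = new ArrayList<>();
    private int stateCount = 0;
    int start, accept;

    int createState() { return stateCount++; }
    void addTransition(int from, int to, int min, int max) { edges.add(new int[] {from, to, min, max}); }
    void addEpsilon(int from, int to) { edges.add(new int[] {from, to, -1, -1}); }

    // Execution is syntax-agnostic: it follows transitions over the whole input.
    boolean accepts(String input) {
      Set<Integer> current = closure(Collections.singleton(start));
      for (int i = 0; i < input.length(); i++) {
        char c = input.charAt(i);
        Set<Integer> next = new HashSet<>();
        for (int[] e : edges) {
          if (e[2] >= 0 && current.contains(e[0]) && e[2] <= c && c <= e[3]) next.add(e[1]);
        }
        current = closure(next);
      }
      return current.contains(accept);
    }

    // All states reachable from the seed through epsilon edges alone.
    private Set<Integer> closure(Set<Integer> seed) {
      Set<Integer> seen = new HashSet<>(seed);
      Deque<Integer> work = new ArrayDeque<>(seed);
      while (!work.isEmpty()) {
        int s = work.pop();
        for (int[] e : edges) {
          if (e[2] < 0 && e[0] == s && seen.add(e[1])) work.push(e[1]);
        }
      }
      return seen;
    }
  }

  // Translation step: builds a fragment for n and returns {entry, exit}.
  static int[] build(Node n, SimpleAutomaton a) {
    int in = a.createState();
    int out = a.createState();
    switch (n.kind) {
      case RANGE:
        a.addTransition(in, out, n.min, n.max);
        break;
      case CONCAT: {
        int[] l = build(n.left, a);
        int[] r = build(n.right, a);
        a.addEpsilon(in, l[0]); a.addEpsilon(l[1], r[0]); a.addEpsilon(r[1], out);
        break;
      }
      case ALT: {
        int[] l = build(n.left, a);
        int[] r = build(n.right, a);
        a.addEpsilon(in, l[0]); a.addEpsilon(in, r[0]);
        a.addEpsilon(l[1], out); a.addEpsilon(r[1], out);
        break;
      }
      case STAR: {
        int[] inner = build(n.left, a);
        a.addEpsilon(in, out); a.addEpsilon(in, inner[0]);
        a.addEpsilon(inner[1], inner[0]); a.addEpsilon(inner[1], out);
        break;
      }
    }
    return new int[] {in, out};
  }

  static SimpleAutomaton translate(Node root) {
    SimpleAutomaton a = new SimpleAutomaton();
    int[] ends = build(root, a);
    a.start = ends[0];
    a.accept = ends[1];
    return a;
  }

  public static void main(String[] args) {
    // Roughly the pattern a(b|c)*d, built by hand instead of parsed.
    Node pattern = Node.concat(Node.range('a', 'a'), Node.concat(
        Node.star(Node.alt(Node.range('b', 'b'), Node.range('c', 'c'))), Node.range('d', 'd')));
    SimpleAutomaton automaton = translate(pattern);
    for (String s : new String[] {"ad", "abcbd", "abx"}) {
      System.out.println(s + " -> " + automaton.accepts(s));
    }
  }
}
```

In a real Re2Query the target of the translation would be Lucene's own
Automaton, and the result would be handed to AutomatonQuery as described
above. RE2 constructs with no plain-automaton counterpart (capture groups,
\b and similar) would presumably have to be ignored or rejected during
translation.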
