Posted to dev@lucene.apache.org by Tavi Nathanson <ta...@gmail.com> on 2010/04/27 04:17:29 UTC

Complex Query Parsing and Tokenization: ANTLR, JavaCC, Solr

Hey everyone,

My organization uses our own homebrew QueryParser class, unrelated to
Lucene's JavaCC-based QueryParser, to parse our queries. We don't currently
use anything from Solr. Our QueryParser class has gotten quite cumbersome,
and I'm looking into alternatives. Grammar-based parsing seems like the way
to go, but I've got some questions:

- ANTLR seems to be very well-supported and well-liked, but I see that
Lucene's QueryParser and StandardTokenizer use JavaCC. Does anyone have
experience writing a Lucene or Solr parser using ANTLR? Any thoughts on
whether it would be helpful to stick with JavaCC, or problematic to use
ANTLR, in light of Lucene's default usage of JavaCC?
- Any experience using ANTLR for tokenization?
- I was told that Solr might be componentizing its query parsing in such a
way that we might be able to use that instead of a homebrew grammar-based
solution. However, I haven't found anything written about that. I don't know
much about Solr's query parsing, other than what I saw looking at
QParser.java and QParserPlugin.java: it seems that one can plug in any
parser needed (see the sketch after this list). That doesn't really help
us, as our goal is to simplify our
parsing logic. Is there any way to structure our query parsing logic without
needing to write a grammar from scratch, whether it's a Solr component or
something else?
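
For concreteness, my (untested) understanding of what plugging a custom
parser into Solr looks like on the Solr 1.4-era API is sketched below.
The class name, the "text" field, and the TermQuery body are placeholders
I made up, not real Solr code; a real implementation would call whatever
grammar-based parser we end up with:

import org.apache.lucene.index.Term;
import org.apache.lucene.queryParser.ParseException;
import org.apache.lucene.search.Query;
import org.apache.lucene.search.TermQuery;
import org.apache.solr.common.params.SolrParams;
import org.apache.solr.common.util.NamedList;
import org.apache.solr.request.SolrQueryRequest;
import org.apache.solr.search.QParser;
import org.apache.solr.search.QParserPlugin;

// Hypothetical plugin name; nothing like this ships with Solr.
public class HomebrewQParserPlugin extends QParserPlugin {

  public void init(NamedList args) {
    // Read any configuration passed from solrconfig.xml here.
  }

  public QParser createParser(String qstr, SolrParams localParams,
                              SolrParams params, SolrQueryRequest req) {
    return new QParser(qstr, localParams, params, req) {
      public Query parse() throws ParseException {
        // Placeholder: a real implementation would hand qstr to the
        // grammar-based parser and build the Query tree from its output.
        // Here we just produce a single TermQuery on a made-up "text" field.
        return new TermQuery(new Term("text", qstr.trim().toLowerCase()));
      }
    };
  }
}

As I understand it, such a plugin would be registered in solrconfig.xml
with <queryParser name="homebrew" class="..."/> and selected per request
with defType=homebrew or a {!homebrew} prefix.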

In a nutshell, I'm trying to get a sense of the best practices in this
situation (namely, custom query parsing that's getting very complex) before
I dive into implementing a solution.

Thanks!
Tavi

Re: Complex Query Parsing and Tokenization: ANTLR, JavaCC, Solr

Posted by Earwin Burrfoot <ea...@gmail.com>.
We use ANTLR for query parsing. Works well for the lazy guys :)
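
For the curious, the Java driver code with the ANTLR 3 runtime ends up
looking roughly like the sketch below. It's untested; SearchQueryLexer,
SearchQueryParser, and the "query" rule are placeholders for whatever
classes and top-level rule ANTLR generates from your own grammar, not
anything that ships with Lucene or Solr:

import org.antlr.runtime.ANTLRStringStream;
import org.antlr.runtime.CommonTokenStream;
import org.antlr.runtime.RecognitionException;

public class AntlrQueryParsingSketch {
  public static void main(String[] args) throws RecognitionException {
    String input = "title:(foo bar) OR body:baz";

    // SearchQueryLexer / SearchQueryParser stand in for the classes
    // ANTLR 3 would generate from a hypothetical SearchQuery.g grammar.
    ANTLRStringStream chars = new ANTLRStringStream(input);
    SearchQueryLexer lexer = new SearchQueryLexer(chars);
    CommonTokenStream tokens = new CommonTokenStream(lexer);
    SearchQueryParser parser = new SearchQueryParser(tokens);

    // Invoke the assumed top-level rule, then walk the resulting parse
    // (or AST) and map it onto Lucene Query objects.
    parser.query();
  }
}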

-- 
Kirill Zakharenko/Кирилл Захаренко (earwin@gmail.com)
Home / Mobile: +7 (495) 683-567-4 / +7 (903) 5-888-423
ICQ: 104465785
