Posted to solr-user@lucene.apache.org by Roman Chyla <ro...@gmail.com> on 2011/09/13 19:34:11 UTC

How to plug a new ANTLR grammar

Hi,

The standard Lucene/Solr parsing is nice but not really flexible. I
saw questions and discussion about ANTLR, but unfortunately never a
working grammar, so maybe you will find this useful:
https://github.com/romanchyla/montysolr/tree/master/src/java/org/apache/lucene/queryParser/iqp/antlr

In the grammar, the parsing is completely abstracted from the Lucene
objects, and the parser is not mixed with Java code. As a first step,
it produces structures like this:
https://svnweb.cern.ch/trac/rcarepo/raw-attachment/wiki/MontySolrQueryParser/index.html

But now I have a problem: I don't know whether I should use the query
parsing framework in contrib.

It seems that the qParser in contrib can use different parser
generators (JavaCC by default, but also ANTLR). But I am confused: I
don't understand this new queryParser from contrib, and it is really
very confusing to me. Is there any benefit in trying to plug the ANTLR
tree into it? Looking at the AST pictures, it seems that with a
relatively simple tree walker we could build the same queries as the
current standard Lucene query parser, and it would be much simpler and
more flexible. Does the framework bring something new? I have a
feeling I am missing something...
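To make the "relatively simple tree walker" idea concrete, here is a minimal sketch in Java. It uses neither ANTLR nor Lucene: `AstNode` is a hypothetical stand-in for the ANTLR tree, the node types (`AND`, `OR`, `NOT`, `TERM`) are assumptions, and the result is a Lucene-like query string rather than a real `Query` object. Each recursive call returns the query for its own subtree, so no explicit stack is needed.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

// Hypothetical stand-in for an ANTLR tree node: a type ("AND", "OR", "NOT",
// "TERM"), the term text for TERM nodes (null otherwise) and the children.
class AstNode {
    final String type;
    final String text;
    final List<AstNode> children;

    AstNode(String type, String text, AstNode... children) {
        this.type = type;
        this.text = text;
        this.children = Arrays.asList(children);
    }
}

// Recursive tree walker: each call returns the query for its own subtree,
// rendered as a Lucene-like string ("+" = required, "-" = prohibited).
class RecursiveQueryBuilder {
    static String build(AstNode node) {
        switch (node.type) {
            case "TERM":
                return node.text;
            case "AND":
                return join(node, "+"); // all clauses required
            case "OR":
                return join(node, "");  // all clauses optional
            case "NOT":
                return "-" + wrap(build(node.children.get(0)));
            default:
                throw new IllegalArgumentException("unknown node type: " + node.type);
        }
    }

    // Builds one clause per child; a NOT child becomes a prohibited clause.
    private static String join(AstNode node, String prefix) {
        List<String> clauses = new ArrayList<>();
        for (AstNode child : node.children) {
            if (child.type.equals("NOT")) {
                clauses.add(build(child));
            } else {
                clauses.add(prefix + wrap(build(child)));
            }
        }
        return String.join(" ", clauses);
    }

    // Parenthesizes multi-clause sub-queries so their grouping is preserved.
    private static String wrap(String query) {
        return query.contains(" ") ? "(" + query + ")" : query;
    }
}
```

For example, the tree AND(a, OR(b, c), NOT(d)) comes out as `+a +(b c) -d`. In a real builder each case would return a Lucene `Query` instead of a string.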

Many thanks for help,

  Roman

Re: How to plug a new ANTLR grammar

Posted by Peter Keegan <pe...@gmail.com>.
>Also, a question for Peter, at which stage do you use lucene analyzers
>on the query? After it was parsed into the tree, or before we start
>processing the query string?

I do the analysis before creating the tree. I'm pretty sure Lucene
QueryParser does this, too.

Peter

On Wed, Sep 14, 2011 at 5:15 AM, Roman Chyla <ro...@gmail.com> wrote:


Re: How to plug a new ANTLR grammar

Posted by Roman Chyla <ro...@gmail.com>.
Hi Peter,

Yes, with the tree it is pretty straightforward. I'd prefer to do it
that way, but what is the purpose of the new qParser then? Is it just
that the qParser was built with a different paradigm in mind, where
the parse tree was not part of the equation? Does anybody know whether
there is any advantage?

I looked a bit more into the contrib:

org.apache.lucene.queryParser.standard.StandardQueryParser.java
org.apache.lucene.queryParser.standard.QueryParserWrapper.java

Some things there (like setting the default fuzzy value) are, in my
case, set directly in the grammar. So the query builder is still
somehow involved in parsing (IMHO not good).
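One way to keep such defaults out of the grammar is to let the grammar record only what the user actually typed and have the builder fill in the rest from a configuration object. The sketch below is hypothetical: `FuzzyConfig`, `FuzzyNodeBuilder` and the `term~similarity` string output are illustrations, not contrib classes; it only shows where the default would live.

```java
// Hypothetical builder-side configuration: the default lives here, not in the grammar.
class FuzzyConfig {
    private final float defaultMinSimilarity;

    FuzzyConfig(float defaultMinSimilarity) {
        this.defaultMinSimilarity = defaultMinSimilarity;
    }

    float getDefaultMinSimilarity() {
        return defaultMinSimilarity;
    }
}

// Hypothetical builder step for a fuzzy node such as "roam~" or "roam~0.8".
// The grammar only records the explicit similarity, or null if none was typed.
class FuzzyNodeBuilder {
    private final FuzzyConfig config;

    FuzzyNodeBuilder(FuzzyConfig config) {
        this.config = config;
    }

    String build(String term, Float explicitSimilarity) {
        float similarity = (explicitSimilarity != null)
                ? explicitSimilarity
                : config.getDefaultMinSimilarity();
        return term + "~" + similarity;
    }
}
```

With a default of 0.5, `roam~` builds `roam~0.5`, while `roam~0.8` keeps its explicit value. The classic Lucene QueryParser exposes a comparable setting through `setFuzzyMinSim`.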

But if someone knows of any reasons to keep using the qParser, please
let me know.

Also, a question for Peter, at which stage do you use lucene analyzers
on the query? After it was parsed into the tree, or before we start
processing the query string?

Thanks!

  Roman

On Tue, Sep 13, 2011 at 10:14 PM, Peter Keegan <pe...@gmail.com> wrote:

Re: How to plug a new ANTLR grammar

Posted by Peter Keegan <pe...@gmail.com>.
Roman,

I'm not familiar with the contrib, but you can write your own Java code to
create Query objects from the tree produced by your lexer and parser,
something like this:

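// Assumes the ANTLR 3 runtime (org.antlr.runtime.*, org.antlr.runtime.tree.*)
// plus the lexer/parser classes ANTLR generates from the grammar.
StandardLuceneGrammarLexer lexer = new StandardLuceneGrammarLexer(
    new ANTLRReaderStream(new StringReader(queryString)));
CommonTokenStream tokens = new CommonTokenStream(lexer);
StandardLuceneGrammarParser parser = new StandardLuceneGrammarParser(tokens);
// ANTLR 3 names the return class after the rule: mainQ() -> mainQ_return
StandardLuceneGrammarParser.mainQ_return ret = parser.mainQ();
CommonTree t = (CommonTree) ret.getTree();
parseTree(t);

void parseTree(Tree t) {
    // recursively walk the tree: visit the children first, then the node
    // itself, so the children's queries are already on the stack
    for (int i = 0; i < t.getChildCount(); i++) {
        parseTree(t.getChild(i));
    }
    visit(t);
}

void visit(Tree node) {
    switch (node.getType()) {
    case StandardLuceneGrammarParser.AND:
        // Create BooleanQuery from the children's queries, push onto stack
        ...
        break;
    }
}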

I use the stack to build up the final Query from the queries produced
during the tree parsing.
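The stack-based approach described above can be sketched as follows. This is an illustration under assumptions, not Peter's code: `TreeNode` is a hypothetical stand-in for the ANTLR `Tree`, only `TERM`, `AND` and `OR` nodes are handled, and strings stand in for Lucene `Query` objects. The walk is post-order, so by the time `visit()` reaches an `AND` or `OR` node, one entry per child is already on the stack.

```java
import java.util.ArrayDeque;
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Collections;
import java.util.Deque;
import java.util.List;

// Hypothetical stand-in for an ANTLR tree node: a type, optional text, children.
class TreeNode {
    final String type;
    final String text;
    final List<TreeNode> children;

    TreeNode(String type, String text, TreeNode... children) {
        this.type = type;
        this.text = text;
        this.children = Arrays.asList(children);
    }
}

// Post-order walk that builds the final query on an explicit stack.
class StackQueryBuilder {
    private final Deque<String> stack = new ArrayDeque<>();

    String build(TreeNode root) {
        walk(root);
        return stack.pop(); // the only remaining entry is the final query
    }

    private void walk(TreeNode node) {
        for (TreeNode child : node.children) {
            walk(child); // children first, so their queries are pushed before the parent runs
        }
        visit(node);
    }

    private void visit(TreeNode node) {
        switch (node.type) {
            case "TERM":
                stack.push(node.text);
                break;
            case "AND":
            case "OR": {
                // Pop one entry per child; they come off in reverse order.
                List<String> clauses = new ArrayList<>();
                for (int i = 0; i < node.children.size(); i++) {
                    clauses.add(stack.pop());
                }
                Collections.reverse(clauses);
                String prefix = node.type.equals("AND") ? "+" : ""; // required vs optional
                List<String> parts = new ArrayList<>();
                for (String clause : clauses) {
                    parts.add(prefix + (clause.contains(" ") ? "(" + clause + ")" : clause));
                }
                stack.push(String.join(" ", parts));
                break;
            }
            default:
                throw new IllegalArgumentException("unknown node type: " + node.type);
        }
    }
}
```

With real Lucene 3.x objects, the stack would hold `Query` instances and the `AND` case would add each popped query to a new `BooleanQuery` with `BooleanClause.Occur.MUST` (and `Occur.SHOULD` for `OR`).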

Hope this helps.
Peter


On Tue, Sep 13, 2011 at 3:16 PM, Jason Toy <ja...@gmail.com> wrote:


Re: How to plug a new ANTLR grammar

Posted by Jason Toy <ja...@gmail.com>.
I'd love to see the progress on this.

On Tue, Sep 13, 2011 at 10:34 AM, Roman Chyla <ro...@gmail.com> wrote:

