Posted to java-user@lucene.apache.org by Carsten Schnober <sc...@ids-mannheim.de> on 2013/01/30 17:12:16 UTC
Re: ANTLR and Custom Query Syntax/Parser

On 29.01.2013 00:24, Trejkaz wrote:
> On Tue, Jan 29, 2013 at 3:42 AM, Andrew Gilmartin
> <an...@andrewgilmartin.com> wrote:
>> When I first started using Lucene, Lucene's Query classes were not suitable
>> for use with the Visitor pattern and so I created my own query class
>> equivalents and other more specialized ones. Lucene's classes might have
>> changed since then (I do not know)
> 
> On that subject, the infrastructure behind StandardQueryParser is
> along those lines. Query itself is still not very flexible, but
> QueryNode is much more convenient and there are processors for walking
> the tree to do transformations.
> 
> We ended up using ANTLR to do the syntax parsing for our stuff and
> then using most of the standard transformations as-is, decorated in
> some cases (either to customise or to work around bugs.) Of course we
> had to add our own for all the new features, but we got a fair bit of
> reuse out of the new framework.

Hi,
thanks for your hints, everyone! I am still a little bit puzzled about
where to start though.
The general task is to generate SpanQueries from the tree provided by
the ANTLR query syntax parser. The special feature about that query
language (that I have not specified and that I cannot change) is that
there are binary operators such as "/s0" indicating that the payloads of
two tokens have to be identical and implying an AND. The query "A /s0 B"
means "find documents that contain A AND B where A and B have identical
payloads".
My intuitive solution would be to make a filter from a BooleanQuery with
A AND B and apply that filter in two separate SpanTermQuerys for A and
for B respectively. Then, I would perform an intersection on the hits
based on the payloads.
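
To make the intersection step concrete, here is a minimal sketch in plain
Java. It deliberately does not use any Lucene API: PayloadHit and
PayloadIntersection are hypothetical names, and the sketch assumes the span
hits for A and B have already been collected and their payloads decoded to
strings.

```java
import java.util.ArrayList;
import java.util.HashSet;
import java.util.List;
import java.util.Set;

// Hypothetical stand-in for one span hit: the document it occurred in and the
// payload of the matching token, already decoded to a string. Not a Lucene class.
final class PayloadHit {
    final int docId;
    final String payload;

    PayloadHit(int docId, String payload) {
        this.docId = docId;
        this.payload = payload;
    }
}

final class PayloadIntersection {
    // Returns the hits for A that share both document and payload with some hit for B.
    static List<PayloadHit> intersect(List<PayloadHit> hitsA, List<PayloadHit> hitsB) {
        Set<String> keysOfB = new HashSet<>();
        for (PayloadHit b : hitsB) {
            keysOfB.add(key(b));
        }
        List<PayloadHit> result = new ArrayList<>();
        for (PayloadHit a : hitsA) {
            if (keysOfB.contains(key(a))) {
                result.add(a);
            }
        }
        return result;
    }

    // Combines document id and payload into one lookup key. The id is an int,
    // so the first ':' always separates it from the payload.
    private static String key(PayloadHit hit) {
        return hit.docId + ":" + hit.payload;
    }
}
```

With hits for A in documents 1 and 2 and hits for B in the same documents,
only those hits of A whose payload also occurs on a B hit in the same
document are kept.
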
However, I am still puzzled about how to approach this starting from an
ANTLR-generated tree. This may be due to my lack of routine in
dealing with ANTLR output, but when the parser returns an object of some
subclass of RuleReturnScope, how would I derive the appropriate
Lucene Query subclasses?
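
To illustrate the kind of conversion in question, here is a minimal sketch of
a recursive tree walk in plain Java. Everything in it is hypothetical: Node
stands in for an ANTLR AST node (token text plus children), and the *Sketch
classes stand in for whatever Lucene Query objects would actually be built.
It also assumes the grammar produces an AST in which the "/s0" operator is
the root of a subtree with its two operands as children.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical stand-in for an ANTLR AST node: token text plus child nodes.
final class Node {
    final String text;
    final List<Node> children = new ArrayList<>();

    Node(String text, Node... kids) {
        this.text = text;
        for (Node kid : kids) {
            children.add(kid);
        }
    }
}

// Hypothetical query model; a real converter would return Lucene queries here.
interface QuerySketch {
    String describe();
}

final class TermSketch implements QuerySketch {
    final String term;

    TermSketch(String term) {
        this.term = term;
    }

    public String describe() {
        return "term(" + term + ")";
    }
}

final class SamePayloadSketch implements QuerySketch {
    final QuerySketch left;
    final QuerySketch right;

    SamePayloadSketch(QuerySketch left, QuerySketch right) {
        this.left = left;
        this.right = right;
    }

    public String describe() {
        return "samePayload(" + left.describe() + ", " + right.describe() + ")";
    }
}

final class TreeToQuery {
    // Walks the tree top-down: "/s0" nodes become a binary same-payload query,
    // leaves become term queries, anything else is rejected.
    static QuerySketch convert(Node node) {
        if ("/s0".equals(node.text)) {
            if (node.children.size() != 2) {
                throw new IllegalArgumentException("/s0 needs exactly two operands");
            }
            return new SamePayloadSketch(convert(node.children.get(0)),
                                         convert(node.children.get(1)));
        }
        if (!node.children.isEmpty()) {
            throw new IllegalArgumentException("unsupported operator: " + node.text);
        }
        return new TermSketch(node.text);
    }
}
```

For "A /s0 B" represented as a "/s0" node with the children A and B,
convert(...) yields an object whose describe() returns
samePayload(term(A), term(B)).
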
Best,
Carsten


-- 
Institut für Deutsche Sprache | http://www.ids-mannheim.de
Projekt KorAP                 | http://korap.ids-mannheim.de
Tel. +49-(0)621-43740789      | schnober@ids-mannheim.de
Korpusanalyseplattform der nächsten Generation
Next Generation Corpus Analysis Platform
