Posted to dev@lucenenet.apache.org by Andy Berryman <to...@gmail.com> on 2006/12/22 00:30:59 UTC

Lucene Query Parsing versus Programmatic Query Building with API

I've been working with Lucene for a few months now and have implemented a new
search solution for the company that I work for, which has turned out very
well.  As such, we are now examining its use for other scenarios.  I'm
investigating how best to build the query used for searching.

Assume that I have an Index with a "Text" Field named "Contents" (to contain
the main text) and another "Keyword" Field named "DocID" (which contains a
text identifier kind of like a SKU which is lower-cased at index time).
When I index the data, I'm doing so using the StandardAnalyzer object.  My
current process builds the query as a string and hands that to the
QueryParser to return the Query.

Example:  Find all documents with the phrase "star wars dvd" in Contents
that have a DocID beginning with "ZXY".

(Contents:"star wars dvd") AND (DocID:ZXY*)

What I'd like to know is how to build this programmatically instead of using
the QueryParser.  I have a general idea, but I'm looking for some help and
validation from the group please.  Can someone out there give me a hand?

Also ... Another question I have is about the Analyzer's involvement here.
When you use the QueryParser object, you pass in the particular Analyzer (in
my case it's the StandardAnalyzer) you would like to use when analyzing the
text provided in the parse string.  Why is it that when you are building the
query programmatically, you don't ever have to specify an Analyzer
object?  I say this because I've been unable to find that
functionality/parameter/property anywhere.  I'm probably missing something
simple here, but it's just not making sense to me.

Thanks in advance for the help everyone!
Andy

Re: Lucene Query Parsing versus Programmatic Query Building with API

Posted by Erik Hatcher <er...@ehatchersolutions.com>.

Did anyone reply to this yet?

On Dec 21, 2006, at 6:30 PM, Andy Berryman wrote:
> Example:  Find all documents with the phrase "star wars dvd" in Contents
> that have a DocID beginning with "ZXY".
>
> (Contents:"star wars dvd") AND (DocID:ZXY*)
>
> What I'd like to know is how to build this programmatically instead of using
> the QueryParser.  I have a general idea, but I'm looking for some help and
> validation from the group please.  Can someone out there give me a hand?

Have a look at the javadocs for all the Query subclasses, and the  
source code to QueryParser (ignoring all the JavaCC gunk in the  
way).  Also, there are lots of examples of this in "Lucene in Action"  
and the free code you can snag at lucenebook.com.
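
For the example query above, a minimal sketch might look like the
following.  It assumes the Lucene 2.x-era Java API (no-argument PhraseQuery
and BooleanQuery constructors, BooleanClause.Occur); newer releases moved
to builder classes, and Lucene.NET should look much the same apart from
.NET-style naming.  The class name StarWarsQuery is made up for
illustration.

```java
import org.apache.lucene.index.Term;
import org.apache.lucene.search.BooleanClause;
import org.apache.lucene.search.BooleanQuery;
import org.apache.lucene.search.PhraseQuery;
import org.apache.lucene.search.PrefixQuery;
import org.apache.lucene.search.Query;

// Hypothetical helper class; assumes the Lucene 2.x-era API.
public class StarWarsQuery {

    // Rough equivalent of: (Contents:"star wars dvd") AND (DocID:ZXY*)
    public static Query build() {
        // One Term per word of the phrase, in order. The terms are written
        // already lower-cased because no analyzer runs on them here.
        PhraseQuery phrase = new PhraseQuery();
        phrase.add(new Term("Contents", "star"));
        phrase.add(new Term("Contents", "wars"));
        phrase.add(new Term("Contents", "dvd"));

        // DocID was lower-cased at index time, so the prefix has to be
        // lower-cased by hand as well.
        PrefixQuery prefix = new PrefixQuery(new Term("DocID", "zxy"));

        // MUST on both clauses corresponds to AND in the query syntax.
        BooleanQuery query = new BooleanQuery();
        query.add(phrase, BooleanClause.Occur.MUST);
        query.add(prefix, BooleanClause.Occur.MUST);
        return query;
    }
}
```

Printing the result of build() with toString() and comparing it against
what QueryParser returns for the same search is a quick way to check that
the two agree.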

> Also ... Another question I have is about the Analyzer's involvement here.
> When you use the QueryParser object, you pass in the particular Analyzer (in
> my case it's the StandardAnalyzer) you would like to use when analyzing the
> text provided in the parse string.  Why is it that when you are building the
> query programmatically, you don't ever have to specify an Analyzer
> object?  I say this because I've been unable to find that
> functionality/parameter/property anywhere.  I'm probably missing something
> simple here, but it's just not making sense to me.

Analyzing text is not something the Query subclasses should be doing,  
but rather, as you're experiencing, it is delegated to the client of  
the Query subclasses.  You'll find it quite easy (only a few lines of  
code).  Check out the AnalysisDemo code (particularly the analyze  
method) here:

	<http://today.java.net/pub/a/today/2003/07/30/LuceneIntro.html>

That will give you exactly what you want: inside the loop over the
analyzed terms, add a clause to a BooleanQuery (or something similar) for
each term, just as QueryParser does.
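
A rough sketch of that loop, again assuming the Lucene 2.x-era Java API
(TokenStream.next() returning a Token, Token.termText()); later releases
replaced this with an attribute-based API.  The class and method names
(AnalyzedQueryBuilder, allTermsRequired) are made up for illustration.

```java
import java.io.IOException;
import java.io.StringReader;

import org.apache.lucene.analysis.Analyzer;
import org.apache.lucene.analysis.Token;
import org.apache.lucene.analysis.TokenStream;
import org.apache.lucene.index.Term;
import org.apache.lucene.search.BooleanClause;
import org.apache.lucene.search.BooleanQuery;
import org.apache.lucene.search.TermQuery;

// Hypothetical helper class; assumes the Lucene 2.x-era analysis API.
public class AnalyzedQueryBuilder {

    // Runs the text through the analyzer and requires every resulting term
    // to be present in the given field.
    public static BooleanQuery allTermsRequired(Analyzer analyzer,
                                                String field,
                                                String text) throws IOException {
        BooleanQuery query = new BooleanQuery();
        TokenStream stream = analyzer.tokenStream(field, new StringReader(text));
        try {
            Token token;
            // next() returns null once the stream is exhausted.
            while ((token = stream.next()) != null) {
                Term term = new Term(field, token.termText());
                query.add(new TermQuery(term), BooleanClause.Occur.MUST);
            }
        } finally {
            stream.close();
        }
        return query;
    }
}
```

Passing the same analyzer that was used at index time (StandardAnalyzer in
this thread) keeps the query terms consistent with the indexed terms.  For
a phrase, the loop body would add each term to a PhraseQuery instead.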

	Erik