
Posted to java-user@lucene.apache.org by Rajesh parab <ra...@yahoo.com> on 2009/01/13 02:43:01 UTC

Using analyzer while constructing Lucene queries

Hi,

For proper results during searches, the recommendation is to use the same analyzer for indexing and querying. We can achieve this by passing the analyzer that was used for indexing to QueryParser, using it to construct the Lucene query, and searching the index with that query.

The question is - How can we use the analyzer that was used for indexing, if we want to construct Lucene queries manually using Query classes (like BooleanQuery, TermQuery, PhraseQuery, etc) instead of using QueryParser?

Is there any way to achieve it?

Regards,
Rajesh


      

---------------------------------------------------------------------
To unsubscribe, e-mail: java-user-unsubscribe@lucene.apache.org
For additional commands, e-mail: java-user-help@lucene.apache.org

Re: Using analyzer while constructing Lucene queries

Posted by Chris Hostetter <ho...@fucit.org>.

: 2> Construct your queries by yourself, by using TermQuery,
: PhraseQuery etc. directly. There are no Analyzers used when
: choosing this option. You have to know what effect the
: Analyzer used during indexing had on the field you're searching
: and be sure you're doing something compatible.

...well, to be clear: you're writing the code, so you could use an 
Analyzer when choosing this option (and in many cases this probably makes 
a lot of sense).

The bottom line is that if you are implementing some code that takes a 
string as input and produces a query as output, you are essentially 
implementing your own query parser; whether you implement it using an 
Analyzer or some other logic is up to you and will depend largely on the 
types of queries you are generating and the contents of your index.

: > What I am not really getting is how I can use the same analyzer during
: > searches if I am constructing queries manually.

call the tokenStream method of your analyzer, iterate over the tokens, 
build your query as you see fit.
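
A minimal sketch of that loop in plain Java, so it runs without Lucene on the classpath. `TextAnalyzer`, `LowercaseTokenAnalyzer` and `AnalyzedQueryBuilder` are hypothetical stand-ins rather than Lucene classes: `analyze` plays the part of iterating over the analyzer's token stream, and the returned string stands in for a BooleanQuery of optional (SHOULD) TermQuery clauses.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Locale;

// Hypothetical stand-in for an Analyzer: field text in, index terms out.
interface TextAnalyzer {
    List<String> analyze(String field, String text);
}

// Toy analysis chain: split on anything that is not a letter or digit,
// then lowercase each token (roughly a tokenizer plus a lowercase filter).
// The field name is ignored here; a per-field analyzer would use it.
class LowercaseTokenAnalyzer implements TextAnalyzer {
    @Override
    public List<String> analyze(String field, String text) {
        List<String> terms = new ArrayList<>();
        for (String raw : text.split("[^\\p{L}\\p{N}]+")) {
            if (!raw.isEmpty()) {
                terms.add(raw.toLowerCase(Locale.ROOT));
            }
        }
        return terms;
    }
}

// A home-grown "query parser": run the user text through the same analyzer
// used at index time and turn every resulting token into a term clause.
class AnalyzedQueryBuilder {
    private final TextAnalyzer analyzer;

    AnalyzedQueryBuilder(TextAnalyzer analyzer) {
        this.analyzer = analyzer;
    }

    List<String> termClauses(String field, String userText) {
        List<String> clauses = new ArrayList<>();
        for (String term : analyzer.analyze(field, userText)) {
            clauses.add(field + ":" + term);
        }
        return clauses;
    }

    // OR the clauses together, mirroring a BooleanQuery of SHOULD TermQuerys.
    String buildOrQuery(String field, String userText) {
        return String.join(" OR ", termClauses(field, userText));
    }
}

class AnalyzedQueryDemo {
    public static void main(String[] args) {
        AnalyzedQueryBuilder builder = new AnalyzedQueryBuilder(new LowercaseTokenAnalyzer());
        // prints: body:lucene OR body:analyzers
        System.out.println(builder.buildOrQuery("body", "Lucene, Analyzers!"));
    }
}
```

With real Lucene classes the `analyze` body would be the tokenStream iteration itself, and each term would go into a `TermQuery` added to a `BooleanQuery`. Whether the clauses are OR-ed, AND-ed or turned into a phrase is the "as you see fit" part.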



-Hoss



Re: Using analyzer while constructing Lucene queries

Posted by Erick Erickson <er...@gmail.com>.

I'm pretty sure that StandardAnalyzer does NOT stem, BTW.

But to your main question. I'm confused by your use of the
term "manually". When creating a query, you really have
two choices:

1> let the query parser do your work for you. The result of
the parse operation is a Query, which can be added
to a BooleanQuery. In this case you specify the analyzer
just as you would during indexing. PerFieldAnalyzerWrapper
is your friend.

2> Construct your queries by yourself, by using TermQuery,
PhraseQuery etc. directly. There are no Analyzers used when
choosing this option. You have to know what effect the
Analyzer used during indexing had on the field you're searching
and be sure you're doing something compatible.
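
For option 1, the per-field idea can be sketched without Lucene as a map from field name to analysis function, with a default for unlisted fields. `FieldAnalyzer`, `PerFieldDispatcher` and `ToyAnalyzers` are hypothetical names; in Lucene, PerFieldAnalyzerWrapper fills this role, and the key point is that one configured instance is given both to the IndexWriter and to the QueryParser.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.HashMap;
import java.util.List;
import java.util.Locale;
import java.util.Map;

// Hypothetical analysis function for a single field.
interface FieldAnalyzer {
    List<String> analyze(String text);
}

// Hypothetical model of per-field analysis: a specific analyzer per field
// name, falling back to a default for every other field.
class PerFieldDispatcher {
    private final FieldAnalyzer defaultAnalyzer;
    private final Map<String, FieldAnalyzer> byField = new HashMap<>();

    PerFieldDispatcher(FieldAnalyzer defaultAnalyzer) {
        this.defaultAnalyzer = defaultAnalyzer;
    }

    PerFieldDispatcher withField(String field, FieldAnalyzer analyzer) {
        byField.put(field, analyzer);
        return this;
    }

    List<String> analyze(String field, String text) {
        return byField.getOrDefault(field, defaultAnalyzer).analyze(text);
    }
}

class ToyAnalyzers {
    // Whole value as one untouched token, e.g. for identifiers.
    static List<String> keyword(String text) {
        return Collections.singletonList(text);
    }

    // Whitespace split plus lowercasing, e.g. for free text.
    static List<String> lowercaseWhitespace(String text) {
        List<String> out = new ArrayList<>();
        for (String raw : text.trim().split("\\s+")) {
            if (!raw.isEmpty()) {
                out.add(raw.toLowerCase(Locale.ROOT));
            }
        }
        return out;
    }
}

class PerFieldDemo {
    public static void main(String[] args) {
        // Build once; use the same instance for indexing and for querying.
        PerFieldDispatcher analysis = new PerFieldDispatcher(ToyAnalyzers::lowercaseWhitespace)
                .withField("id", ToyAnalyzers::keyword);
        System.out.println(analysis.analyze("id", "ABC-123"));       // [ABC-123]
        System.out.println(analysis.analyze("title", "Hello World")); // [hello, world]
    }
}
```

Because the "id" field keeps its value verbatim while "title" is lowercased, querying either field with the wrong analyzer would produce terms that never occur in the index.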

HTH
Erick

On Wed, Jan 14, 2009 at 1:32 PM, Rajesh parab <ra...@yahoo.com> wrote:

>
> Thanks Ian.
>
> I agree with you on lowercasing of characters. My main concern is specific
> to stemming done by analyzers.
>
> For example, StandardAnalyzer will stem words like playing, played, plays,
> etc. to a common token "play" which will be stored in the index. Now, during
> searches, we would need the same stemming to be performed on search tokens
> so that exact-match searches return correct results. In this
> example, the search term "playing" or "plays" may not return the document,
> as it is indexed with the token "play".
>
> What I am not really getting is how I can use the same analyzer during
> searches if I am constructing queries manually.

Re: Using analyzer while constructing Lucene queries

Posted by Jack Stahl <ja...@yelp.com>.

Hi Rajesh,

TermQueries (and likewise other term-based queries such as PhraseQuery) take
a Term object, which in turn takes a field name and a String.  That String
should be the analyzed version ("play") of your original query word
("playing").  To get that, you need to feed your analyzer a Reader of the
string you wish to parse ("playing").  It will output a TokenStream
containing (in this case, one) token.  You can then use the Token's .term()
to get the analyzed version ("play") of the original query word.  Note that
when you feed it a multi-word input, or if you use an Analyzer which breaks
tokens strangely, you will get multiple tokens back, and it is up to you to
decide what to do with them.  But when using your index's analyzer, each
Token should correspond to its own TermQuery, since the terms in your index
are broken up the same way.
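
A sketch of that recipe, assuming an analyzer that actually stems (StandardAnalyzer itself does not). To stay runnable without Lucene, the stemmer here is a deliberately naive suffix stripper, not Porter or Snowball, and `NaiveStemmingAnalyzer` and `StemmedQueryBuilder` are hypothetical. For multi-token input it shows just one possible policy: a single token becomes a term clause and several tokens become a phrase in analyzer order, both rendered as query-syntax text.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Locale;

// Toy analyzer: whitespace split, lowercase, then a deliberately naive
// suffix stripper standing in for a real stemming filter (NOT Porter).
class NaiveStemmingAnalyzer {
    static String stem(String word) {
        for (String suffix : new String[] {"ing", "ed", "s"}) {
            // Keep at least three characters so short words are left alone.
            if (word.endsWith(suffix) && word.length() - suffix.length() >= 3) {
                return word.substring(0, word.length() - suffix.length());
            }
        }
        return word;
    }

    List<String> analyze(String text) {
        List<String> terms = new ArrayList<>();
        for (String raw : text.trim().split("\\s+")) {
            if (!raw.isEmpty()) {
                terms.add(stem(raw.toLowerCase(Locale.ROOT)));
            }
        }
        return terms;
    }
}

class StemmedQueryBuilder {
    private final NaiveStemmingAnalyzer analyzer = new NaiveStemmingAnalyzer();

    // One token -> a single term clause; several tokens -> a phrase whose
    // terms keep the analyzer's order; no tokens -> empty string.
    String build(String field, String userText) {
        List<String> terms = analyzer.analyze(userText);
        if (terms.isEmpty()) {
            return "";
        }
        if (terms.size() == 1) {
            return field + ":" + terms.get(0);
        }
        return field + ":\"" + String.join(" ", terms) + "\"";
    }
}

class StemmedQueryDemo {
    public static void main(String[] args) {
        StemmedQueryBuilder builder = new StemmedQueryBuilder();
        System.out.println(builder.build("body", "playing"));       // body:play
        System.out.println(builder.build("body", "Playing games")); // body:"play game"
    }
}
```

"playing", "plays" and "played" all reduce to the indexed term "play" because the query text goes through the same chain as the documents; building the clause from the raw word would miss it.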

Jack


Re: Using analyzer while constructing Lucene queries

Posted by Rajesh parab <ra...@yahoo.com>.

Thanks Ian.

I agree with you on lowercasing of characters. My main concern is specific to stemming done by analyzers.

For example, StandardAnalyzer will stem words like playing, played, plays, etc. to a common token "play" which will be stored in the index. Now, during searches, we would need the same stemming to be performed on search tokens so that exact-match searches return correct results. In this example, the search term "playing" or "plays" may not return the document, as it is indexed with the token "play".

What I am not really getting is how I can use the same analyzer during searches if I am constructing queries manually.

Regards,
Rajesh

--- On Tue, 1/13/09, Ian Lea <ia...@gmail.com> wrote:

> From: Ian Lea <ia...@gmail.com>
> Subject: Re: Using analyzer while constructing Lucene queries
> To: java-user@lucene.apache.org, rajesh_parab_1@yahoo.com
> Date: Tuesday, January 13, 2009, 9:33 AM
> If you are building queries manually, bypassing analysis, you just
> need to make sure that you know what you are doing.  As a trivial
> example, if you are indexing with an analyzer that downcases
> everything then you need to pass lowercase terms to TermQuery.
>
> You can still use an analyzer where appropriate e.g. to parse a string
> into a Query that you add to a BooleanQuery.
>
> --
> Ian.

Re: Using analyzer while constructing Lucene queries

Posted by Ian Lea <ia...@gmail.com>.

If you are building queries manually, bypassing analysis, you just
need to make sure that you know what you are doing.  As a trivial
example, if you are indexing with an analyzer that downcases
everything then you need to pass lowercase terms to TermQuery.

You can still use an analyzer where appropriate e.g. to parse a string
into a Query that you add to a BooleanQuery.
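
A small Lucene-free sketch of the downcasing point: a hypothetical in-memory set of terms, as a lowercasing analyzer would have indexed them, with an exact lookup standing in for a TermQuery. In Lucene itself the hand-normalized clause would look roughly like `new TermQuery(new Term(field, userTerm.toLowerCase(Locale.ROOT)))`; `DowncasedField` and `ManualTermNormalizer` are illustrative names only.

```java
import java.util.HashSet;
import java.util.Locale;
import java.util.Set;

// Hypothetical in-memory field: the terms a downcasing analyzer would have
// written to the index for this field.
class DowncasedField {
    private final Set<String> indexedTerms = new HashSet<>();

    void index(String text) {
        for (String raw : text.trim().split("\\s+")) {
            if (!raw.isEmpty()) {
                indexedTerms.add(raw.toLowerCase(Locale.ROOT));
            }
        }
    }

    // Exact lookup, like a TermQuery: no analysis happens at this point.
    boolean matchesTerm(String term) {
        return indexedTerms.contains(term);
    }
}

class ManualTermNormalizer {
    // Bypassing the analyzer means reproducing its effects by hand;
    // here the only index-time effect is lowercasing.
    static String likeIndexTime(String userTerm) {
        return userTerm.toLowerCase(Locale.ROOT);
    }
}

class ManualTermDemo {
    public static void main(String[] args) {
        DowncasedField title = new DowncasedField();
        title.index("Apache Lucene");
        System.out.println(title.matchesTerm("Lucene"));                                     // false
        System.out.println(title.matchesTerm(ManualTermNormalizer.likeIndexTime("Lucene"))); // true
    }
}
```

The raw "Lucene" misses only because nothing lowercased it; the same trap applies to stemming or any other filter in the index-time chain.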

--
Ian.
