You are viewing a plain text version of this content. The canonical link for it is here.
Posted to java-user@lucene.apache.org by Christopher Condit <co...@sdsc.edu> on 2011/03/30 00:56:52 UTC

Best practice for stemming and exact matching

I have Lucene indexes build using a shingled, stemmed custom analyzer.
I have a new requirement that exact searches match correctly.
ie: bar AND "nachos"
will only fetch results with plural nachos. Right now, with the
stemming, singular nacho results are returned as well. I realize that
I'm going to have to create a separate field for this so I made a new
field suffixed with _exact and used a SimpleAnalyzer on it.

The problem is that after my QueryParser parses the query it's already
mucked up the PhraseQuery with the custom analyzer (stemming down to
nacho). I thought I could traverse the query tree and switch the field
and the query text, but if I do that I don't know which part of the
query the clause came from. Ideally I'd like to have the parser use the
custom analyzer for everything unless it's going to parse a clause into
a PhraseQuery or a MultiPhraseQuery, in which case it uses the
SimpleAnalyzer and looks in the _exact field - but I can't figure out
the best way to accomplish this.

Has anyone else encountered the same problem?
Is there a best practice for doing this - or something much easier that
I'm missing?

Thanks,
Chris



---------------------------------------------------------------------
To unsubscribe, e-mail: java-user-unsubscribe@lucene.apache.org
For additional commands, e-mail: java-user-help@lucene.apache.org


Re: Best practice for stemming and exact matching

Posted by Christopher Condit <co...@sdsc.edu>.
>> Ideally I'd like to have the parser use the
>> custom analyzer for everything unless it's going to parse a clause into
>> a PhraseQuery or a MultiPhraseQuery, in which case it uses the
>> SimpleAnalyzer and looks in the _exact field - but I can't figure out
>> the best way to accomplish this.
> 
> You might be interested in the example i added to the QP tests in
> https://issues.apache.org/jira/browse/LUCENE-2892,
> it does something similar to what you are doing.
> 
> your situation is probably even easier than my example actually:
> 
> @Override
> protected Query getFieldQuery(String field, String queryText, boolean
> quoted) throws ParseException {
>   if (quoted) {
>     field = field + "_exact"; // lie about the field to superclass if
> the user quoted it.
>   }
>   return super.getFieldQuery(field, queryText, quoted);
> }
> 
> if you are using perfieldanalyzerwrapper so that these "_exact" fields
> don't use stemming, then it should just work.
> you'll need branch_3.x or trunk for this, but looks like 3.1 is close

Using the 3.1 jars from Maven central the following test case shows
different behavior:
http://www.pastie.org/1744131

The output is:
getFieldQuerySlop: foo bar
a:"foo bar" b:"foo bar"
getFieldQueryQuoted: nacho
getFieldQuerySlop: foo bar
getFieldQuerySlop: chip
getFieldQueryQuoted: baz
+(a:nacho b:nacho) +(a:"foo bar" b:"foo bar") +(a:chip b:chip) +(a:baz
b:baz)

So it would seem that all of the quoted phrases are going through the
slop method (with slop of 0). Would it be safe to alter the field in
that method instead?

Thanks,
-Chris


---------------------------------------------------------------------
To unsubscribe, e-mail: java-user-unsubscribe@lucene.apache.org
For additional commands, e-mail: java-user-help@lucene.apache.org


Re: Best practice for stemming and exact matching

Posted by Robert Muir <rc...@gmail.com>.
On Tue, Mar 29, 2011 at 6:56 PM, Christopher Condit <co...@sdsc.edu> wrote:

> Ideally I'd like to have the parser use the
> custom analyzer for everything unless it's going to parse a clause into
> a PhraseQuery or a MultiPhraseQuery, in which case it uses the
> SimpleAnalyzer and looks in the _exact field - but I can't figure out
> the best way to accomplish this.

You might be interested in the example i added to the QP tests in
https://issues.apache.org/jira/browse/LUCENE-2892,
it does something similar to what you are doing.

your situation is probably even easier than my example actually:

@Override
protected Query getFieldQuery(String field, String queryText, boolean
quoted) throws ParseException {
  if (quoted) {
    field = field + "_exact"; // lie about the field to superclass if
the user quoted it.
  }
  return super.getFieldQuery(field, queryText, quoted);
}

if you are using perfieldanalyzerwrapper so that these "_exact" fields
don't use stemming, then it should just work.
you'll need branch_3.x or trunk for this, but looks like 3.1 is close

---------------------------------------------------------------------
To unsubscribe, e-mail: java-user-unsubscribe@lucene.apache.org
For additional commands, e-mail: java-user-help@lucene.apache.org