You are viewing a plain text version of this content. The canonical link for it is here.

Posted to java-user@lucene.apache.org by Dmitry Goldenberg <dm...@weblayers.com> on 2005/12/27 19:45:19 UTC

Proximity searches and Porter stemming - ??

Hello,
 
I tried using Porter stemming in our application and it worked great except it broke the proximity searches.  Is there any way at all that these two pieces of functionality could coexist peacefully?
 
I do not see any reason why they should not.  It seems to me that proximity query terms should be stemmed by the engine, then the query executed.  Personally, I would not care much if the following two proximity queries would bring back the same results:
 
"character encoding"~3
and
"characters encode"~3
 
I'd much rather they both returned the same results than no results at all, the latter being the case I've observed.
 
Any recommendations?
Thanks,
- Dmitry

RE: Proximity searches and Porter stemming - ??

Posted by Dmitry Goldenberg <dm...@weblayers.com>.

Erik, thanks for the quick response.  I believe we had an analyzer discrepancy.  I just retested and the queries work.

I see "character encode eliminate"~3 and "character encoding eliminates"~3 returning the same results.
Thanks!

________________________________

From: Erik Hatcher [mailto:erik@ehatchersolutions.com]
Sent: Tue 12/27/2005 10:56 AM
To: java-user@lucene.apache.org
Subject: Re: Proximity searches and Porter stemming - ??

On Dec 27, 2005, at 1:45 PM, Dmitry Goldenberg wrote:
> I tried using Porter stemming in our application and it worked 
> great except it broke the proximity searches.  Is there any way at 
> all that these two pieces of functionality could coexist peacefully?
>
> I do not see any reason why they should not.  It seems to me that 
> proximity query terms should be stemmed by the engine, then the 
> query executed.  Personally, I would not care much if the following 
> two proximity queries would bring back the same results:
>
> "character encoding"~3
> and
> "characters encode"~3
>
> I'd much rather they both returned the same results than no results 
> at all, the latter being the case I've observed.
>
> Any recommendations?

Are you using the same Analyzer with QueryParser as you are with 
IndexWriter?  What does the generated Query.toString yield?

This absolutely works provided the analyzers jive between indexing 
and searching.

        Erik

---------------------------------------------------------------------
To unsubscribe, e-mail: java-user-unsubscribe@lucene.apache.org
For additional commands, e-mail: java-user-help@lucene.apache.org

Re: Proximity searches and Porter stemming - ??

Posted by Erik Hatcher <er...@ehatchersolutions.com>.

On Dec 27, 2005, at 1:45 PM, Dmitry Goldenberg wrote:
> I tried using Porter stemming in our application and it worked  
> great except it broke the proximity searches.  Is there any way at  
> all that these two pieces of functionality could coexist peacefully?
>
> I do not see any reason why they should not.  It seems to me that  
> proximity query terms should be stemmed by the engine, then the  
> query executed.  Personally, I would not care much if the following  
> two proximity queries would bring back the same results:
>
> "character encoding"~3
> and
> "characters encode"~3
>
> I'd much rather they both returned the same results than no results  
> at all, the latter being the case I've observed.
>
> Any recommendations?

Are you using the same Analyzer with QueryParser as you are with  
IndexWriter?  What does the generated Query.toString yield?

This absolutely works provided the analyzers jive between indexing  
and searching.

	Erik


---------------------------------------------------------------------
To unsubscribe, e-mail: java-user-unsubscribe@lucene.apache.org
For additional commands, e-mail: java-user-help@lucene.apache.org