Posted to solr-dev@lucene.apache.org by anuvenk <an...@hotmail.com> on 2007/12/11 21:53:35 UTC

Need an explanation for this synonym behaviour

Hi,

I'm having some trouble with my synonyms.
I just added this synonym:
specialized license plate,personalized license plate
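For context, a synonyms-file line of comma-separated entries with no `=>` declares the entries equivalent. With `expand=true`, each entry is mapped to every entry in the group; with `expand=false`, every entry is reduced to the first one. The Python sketch below models only that rule, treating each multi-word entry as a word sequence. It is not Solr's parser, and `parse_synonym_line` is a hypothetical helper.

```python
from typing import Dict, List, Tuple

Phrase = Tuple[str, ...]

def parse_synonym_line(line: str, expand: bool = True) -> Dict[Phrase, List[Phrase]]:
    """Hypothetical helper modelling a comma-separated (no '=>') synonym line."""
    # Each comma-separated entry becomes a tuple of whitespace-separated words.
    entries = [tuple(part.split()) for part in line.split(",") if part.strip()]
    if expand:
        # expand=true: every entry maps to every entry in the group.
        targets = entries
    else:
        # expand=false: every entry is reduced to the first entry only.
        targets = entries[:1]
    return {entry: list(targets) for entry in entries}

rules = parse_synonym_line("specialized license plate,personalized license plate")
for source, targets in rules.items():
    print(" ".join(source), "=>", [" ".join(t) for t in targets])
```

The keys are three-word sequences, so in this model a rule can only fire when all three words reach the filter together in one token stream.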

I tried the search term:
florida specialized license plate

The results I hoped for:
I have a personalized license plate form for the state of Florida in my index. I was
hoping it would come up first.

The results I got:
Specialized license plate forms from all the other states fill the top 50.
I only see the Florida personalized license plate form if I choose the Florida
state filter.

Here is the parsedquery_toString:
<str name="parsedquery_toString">
+(((text:florida^0.8 | name:florida^2.0)~0.01 (text:special^0.8 |
name:special^2.0)~0.01 (text:licens^0.8 | name:licens^2.0)~0.01
(text:plate^0.8 | name:plate^2.0)~0.01)~3) (text:"florida (special person)
licens plate"~50^0.8 | name:"florida special licens plate"~50^2.0)~0.01
</str>
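Each `(text:x^0.8 | name:x^2.0)~0.01` group in this parsed query is a DisjunctionMaxQuery. A document's score for the group is its best field score plus the tie-breaker (here 0.01) times the scores of the other matching fields, and `^0.8` and `^2.0` are per-field boosts. The trailing `~3` on the outer group is the minimum number of per-word groups that must match. A minimal Python sketch of the tie-breaker arithmetic, using made-up per-field scores purely for illustration:

```python
from typing import Sequence

def dismax_score(field_scores: Sequence[float], tie_breaker: float) -> float:
    """Score of a DisjunctionMaxQuery over the fields that matched.

    score = max(field_scores) + tie_breaker * (sum of the remaining scores)
    """
    if not field_scores:
        return 0.0  # no field matched, so this group contributes nothing
    best = max(field_scores)
    return best + tie_breaker * (sum(field_scores) - best)

# Illustrative (not real) per-field scores after boosting:
# a word found in both "name" and "text".
print(round(dismax_score([2.0, 0.8], tie_breaker=0.01), 3))  # 2.008
```

With a tie-breaker this small, the group score is essentially that of the single best-scoring field.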


Also, a search for 'specialized license plate' doesn't bring up personalized
license plate results at all.
Here is the parsedquery_toString:
<str name="parsedquery_toString">
+(((text:special^0.8 | name:special^2.0)~0.01 (text:licens^0.8 |
name:licens^2.0)~0.01 (text:plate^0.8 | name:plate^2.0)~0.01)~3)
(text:"(special person) licens plate"~50^0.8 | name:"special licens
plate"~50^2.0)~0.01
</str>
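One reading of the second parsed query: the mandatory `+(...)~3` clause is built word by word and contains only `special`, `licens` and `plate`, and `~3` requires all three to match. The synonym `person` appears only inside the optional phrase clause, which can boost a matching document but cannot make a non-matching one match. A Python sketch of that rule, assuming documents are simplified to sets of already-stemmed terms (the stems come from the parsed queries; the documents and the helper `matches_required_clause` are hypothetical):

```python
from typing import List, Set

def matches_required_clause(doc_terms: Set[str], clause_terms: List[str], min_should_match: int) -> bool:
    """Model of +((t1) (t2) ...)~N: at least N per-word clauses must match."""
    matched = sum(1 for term in clause_terms if term in doc_terms)
    return matched >= min_should_match

# Hypothetical documents, reduced to their stemmed terms.
specialized_doc = {"texas", "special", "licens", "plate"}
personalized_doc = {"florida", "person", "licens", "plate"}

# Second search: three per-word clauses, all three must match.
second = ["special", "licens", "plate"]
print(matches_required_clause(specialized_doc, second, 3))   # True
print(matches_required_clause(personalized_doc, second, 3))  # False

# First search: four per-word clauses, at least three must match.
first = ["florida", "special", "licens", "plate"]
print(matches_required_clause(personalized_doc, first, 3))   # True (3 of 4)
print(matches_required_clause(specialized_doc, first, 3))    # True (3 of 4)
```

In this model, the mandatory clause of the first search does not favour the Florida personalized form over specialized forms from other states, since each satisfies three of the four words.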

Why does this happen? How should I modify the synonym mapping to get
the expected results?

I'm applying the synonym filter only at query time, with expand=true.
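A query-time-only setup is affected by whitespace splitting before analysis: when each word is analysed on its own, a multi-word rule never sees its words together. The Python sketch below contrasts the two cases using a toy matcher, the hypothetical `apply_synonyms`, which fires only when a rule's full word sequence appears contiguously. As a simplification, it returns alternative token sequences rather than tokens stacked at shared positions.

```python
from typing import Dict, List, Tuple

Phrase = Tuple[str, ...]

# Toy rule set: the multi-word entry from this thread, expanded both ways.
RULES: Dict[Phrase, List[Phrase]] = {
    ("specialized", "license", "plate"): [
        ("specialized", "license", "plate"),
        ("personalized", "license", "plate"),
    ],
}

def apply_synonyms(tokens: List[str]) -> List[List[str]]:
    """Return the alternative token sequences produced for one token stream.

    Only a rule whose entire word sequence appears contiguously can fire.
    """
    for source, targets in RULES.items():
        n = len(source)
        for i in range(len(tokens) - n + 1):
            if tuple(tokens[i:i + n]) == source:
                return [tokens[:i] + list(t) + tokens[i + n:] for t in targets]
    return [tokens]  # no rule fired: the stream passes through unchanged

query = "florida specialized license plate"

# Whole string analysed at once (as for the phrase clause): the rule fires.
print(apply_synonyms(query.split()))

# Whitespace split first, each word analysed separately: the rule never fires.
print([apply_synonyms([word]) for word in query.split()])
```

This is consistent with the parsed queries: `person` appears only in the phrase clause, which received the whole string, and never in the per-word clauses.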

I've been stuck on this issue for quite some time. Help, please!
-- 
View this message in context: http://www.nabble.com/Need-an-explanation-for-this-synonym-behaviour-tp14283175p14283175.html
Sent from the Solr - Dev mailing list archive at Nabble.com.


Re: Need an explanation for this synonym behaviour

Posted by anuvenk <an...@hotmail.com>.
1) Did you mean I should re-post this question somewhere else?
If so, could you give me the link?

2) I did read
http://wiki.apache.org/solr/AnalyzersTokenizersTokenFilters#SynonymFilter
but it still wasn't clear to me (sorry! I need some spoon-feeding), so I was
hoping someone could give a better explanation, or suggest how to use the
synonym filter in my case, based on the examples I outlined in my first post.

Thanks.





Re: Need an explanation for this synonym behaviour

Posted by Chris Hostetter <ho...@fucit.org>.
1) This question is really better suited to the solr-user list (since it is
a question about how to use a feature of Solr).

...
: I'm applying the synonym filter only at query time, with expand=true.
...

2) Please note the caveat about multi-word synonyms on the wiki:

http://wiki.apache.org/solr/AnalyzersTokenizersTokenFilters#SynonymFilter

>> Keep in mind that while the SynonymFilter will happily work with
>> synonyms containing multiple words (i.e., "sea biscuit, sea biscit,
>> seabiscuit"), the recommended approach for dealing with synonyms like
>> this is to expand the synonym when indexing. This is because there are
>> two potential issues that can arise at query time:
>>
>>    1. The Lucene QueryParser tokenizes on white space before giving any
>> text to the Analyzer, so if a person searches for the words sea biscit,
>> the analyzer will be given the words "sea" and "biscit" separately, and
>> will not know that they match a synonym.
>>
>>    2. Phrase searching (i.e., "sea biscit") will cause the QueryParser to
>> pass the entire string to the analyzer, but if the SynonymFilter is
>> configured to expand the synonyms, then when the QueryParser gets the
>> resulting list of tokens back from the Analyzer, it will construct a
>> MultiPhraseQuery that will not have the desired effect. This is because
>> of the limited mechanism available for the Analyzer to indicate that
>> two terms occupy the same position: there is no way to indicate that a
>> "phrase" occupies the same position as a term. For our example, the
>> resulting MultiPhraseQuery would be "(sea | sea | seabiscuit) (biscuit
>> | biscit)", which would not match the simple case of "seabiscuit"
>> occurring in a document.
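To make the second caveat concrete, consider how a MultiPhraseQuery matches. It holds a set of alternative terms for each position, and it matches only where consecutive document positions each contain one term from the corresponding set. The Python sketch below models exact (slop 0) matching with the wiki's example. It reproduces the matching rule only, not Lucene's implementation, and `multi_phrase_matches` is a hypothetical name.

```python
from typing import List, Set

def multi_phrase_matches(positions: List[Set[str]], doc_tokens: List[str]) -> bool:
    """Exact (slop 0) multi-phrase match: one term from each set, consecutively."""
    n = len(positions)
    for start in range(len(doc_tokens) - n + 1):
        if all(doc_tokens[start + k] in positions[k] for k in range(n)):
            return True
    return False

# The wiki example "(sea | sea | seabiscuit) (biscuit | biscit)";
# the set collapses the duplicate "sea".
query = [{"sea", "seabiscuit"}, {"biscuit", "biscit"}]
print(multi_phrase_matches(query, ["sea", "biscit"]))  # True
print(multi_phrase_matches(query, ["seabiscuit"]))     # False: one token cannot fill two positions

# This thread's phrase clause has the same shape, but both entries have equal length.
thread_phrase = [{"special", "person"}, {"licens"}, {"plate"}]
print(multi_phrase_matches(thread_phrase, ["florida", "person", "licens", "plate"]))  # True
```

The failure appears when entries differ in length, as with "seabiscuit" against "sea biscuit". Equal-length entries, such as the two in this thread, line up position by position.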



-Hoss
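The wiki text recommends expanding synonyms at index time. The Python sketch below shows why that sidesteps the whitespace-splitting problem for this thread's rule. It assumes a toy index that stores, for each document position, the set of terms at that position, in the way an index-time synonym filter would stack `special` on top of `person`. With no synonym filter at query time, each query word is then looked up on its own. The names `GROUP`, `index_positions` and `doc_terms` are hypothetical, and the toy handles only equal-length entries.

```python
from typing import List, Set, Tuple

Phrase = Tuple[str, ...]

# Equal-length entries from this thread, already stemmed as in the parsed queries.
GROUP: List[Phrase] = [("special", "licens", "plate"), ("person", "licens", "plate")]

def index_positions(tokens: List[str]) -> List[Set[str]]:
    """Toy index-time expansion: stack every group entry on a matching span.

    Assumes all entries in GROUP have the same length, so terms stack position by position.
    """
    positions = [{t} for t in tokens]
    n = len(GROUP[0])
    for i in range(len(tokens) - n + 1):
        if tuple(tokens[i:i + n]) in GROUP:
            for entry in GROUP:
                for k, term in enumerate(entry):
                    positions[i + k].add(term)
    return positions

def doc_terms(positions: List[Set[str]]) -> Set[str]:
    """All terms indexed for the document, regardless of position."""
    return set().union(*positions)

doc = index_positions(["florida", "person", "licens", "plate"])
query_words = ["special", "licens", "plate"]  # analysed word by word, no synonyms

print(all(word in doc_terms(doc) for word in query_words))  # True
```

Because the document already contains `special`, the word-by-word mandatory clause can match it without any query-time synonym expansion.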