Posted to solr-user@lucene.apache.org by darniz <rn...@edmunds.com> on 2009/10/02 20:40:49 UTC

Question regarding synonym

Hi,
I have a question regarding the SynonymFilter.
I have a one-way mapping defined:
austin martin, astonmartin => aston martin

What baffles me is that if I give the words austin martin at query time,
they first go through the whitespace tokenizer, which generates two tokens
on the analysis page:
"austin" and "martin"

Then, after the synonym filter, they are replaced with the words
aston martin

That's good, and that's what I want, but I am wondering: since the input
went through the whitespace tokenizer first and was split into two separate
tokens, "austin" and "martin", how was the filter able to match the entire
synonym and replace it?
If I give only austin, then after passing through the synonym filter it is
not replaced with aston.
That leads me to conclude that even though "austin martin" went through the
whitespace tokenizer factory and got split in two, the word ordering is
still preserved to find a synonym match.

Can anybody please confirm whether my observation is correct? This is a very
critical aspect of my work.

Thanks,
darniz
-- 
View this message in context: http://www.nabble.com/Question-regarding-synonym-tp25720572p25720572.html
Sent from the Solr - User mailing list archive at Nabble.com.

Re: Question regarding synonym

Posted by Christian Zambrano <cz...@gmail.com>.

You are correct.

I would recommend using the synonym TokenFilter only at index time,
unless you have a very good reason to do it at query time.

On 10/05/2009 11:46 AM, darniz wrote:
> Yes, that's what we decided: to expand these terms while indexing.
> If we have
> bayrische motoren werke => bmw
>
> and I have a document which has bmw in it, searching for text:bayrische
> does not give me results. I have to give
> text:"bayrische motoren werke"; then it actually applies the synonym and
> gets me the document.
>
> Now if I change the synonym mapping to
> bayrische motoren werke, bmw
> with the expand parameter set to true, and also use this file at indexing,
>
> then at the time I index this document, along with "bmw" I also index the
> following words: "bayrische" "motoren" "werke"
>
> Any text query like text:motoren or text:bayrische will give me results now.
>
> Please correct me if my assumption is wrong.
>
> Thanks
> darniz

Re: Question regarding synonym

Posted by darniz <rn...@edmunds.com>.

Yes, that's what we decided: to expand these terms while indexing.
If we have
bayrische motoren werke => bmw

and I have a document which has bmw in it, searching for text:bayrische
does not give me results. I have to give
text:"bayrische motoren werke"; then it actually applies the synonym and
gets me the document.

Now if I change the synonym mapping to
bayrische motoren werke, bmw
with the expand parameter set to true, and also use this file at indexing,

then at the time I index this document, along with "bmw" I also index the
following words: "bayrische" "motoren" "werke"

Any text query like text:motoren or text:bayrische will give me results now.

Please correct me if my assumption is wrong.

Thanks
darniz
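
The index-time expansion described above can be sketched in a few lines of
Python. This is only a toy model under stated assumptions, not Solr's
implementation: GROUP, expand and build_index are hypothetical names, the
document text is made up, and token positions are ignored, so it says
nothing about phrase queries. It only shows why a single-term query such as
text:motoren starts matching once the expanded terms are in the index.

```python
from collections import defaultdict

# Hypothetical equivalent of the synonyms.txt line
#   bayrische motoren werke, bmw      (expand=true)
# Every entry in the group is treated as equivalent to all the others.
GROUP = [["bayrische", "motoren", "werke"], ["bmw"]]


def expand(tokens, group=GROUP):
    """Emit every entry of the group whenever one entry is seen."""
    out, i = [], 0
    while i < len(tokens):
        for entry in group:
            if tokens[i:i + len(entry)] == entry:
                for alternative in group:
                    out.extend(alternative)
                i += len(entry)
                break
        else:
            # No group entry starts here; keep the token unchanged.
            out.append(tokens[i])
            i += 1
    return out


def build_index(docs, analyzer):
    """Map each term to the set of document ids that contain it."""
    index = defaultdict(set)
    for doc_id, text in docs.items():
        for term in analyzer(text.lower().split()):
            index[term].add(doc_id)
    return index


docs = {1: "2009 bmw sedan"}  # made-up document
plain = build_index(docs, lambda tokens: tokens)
expanded = build_index(docs, expand)

print(plain.get("motoren", set()))     # no match without expansion
print(expanded.get("motoren", set()))  # document 1 matches
```

With the one-way rule (bayrische motoren werke => bmw) only "bmw" would be
in the index, which is consistent with text:bayrische finding nothing.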


Re: Question regarding synonym

Posted by Christian Zambrano <cz...@gmail.com>.


On 10/02/2009 06:02 PM, darniz wrote:
> Thanks
> As I said, it even works with double quotes too,
> like carDescription:"austin martin"
>
> So is the conclusion that in order to map a two-word synonym I always have
> to enclose it in double quotes, so that it does not split the words?
>
Yes, but there are things you need to keep in mind.

From the Solr wiki:

Keep in mind that while the SynonymFilter will happily work with
synonyms containing multiple words (e.g.
"sea biscuit, sea biscit, seabiscuit"), the recommended approach for
dealing with synonyms like this is to expand the synonym when
indexing. This is because there are two potential issues that can arise
at query time:

   1. The Lucene QueryParser tokenizes on whitespace before giving any
      text to the Analyzer, so if a person searches for the words
      sea biscit the analyzer will be given the words "sea" and "biscit"
      separately, and will not know that they match a synonym.

   2. Phrase searching (e.g. "sea biscit") will cause the QueryParser to
      pass the entire string to the analyzer, but if the SynonymFilter
      is configured to expand the synonyms, then when the QueryParser
      gets the resulting list of tokens back from the Analyzer, it will
      construct a MultiPhraseQuery that will not have the desired
      effect. This is because of the limited mechanism available for the
      Analyzer to indicate that two terms occupy the same position:
      there is no way to indicate that a "phrase" occupies the same
      position as a term. For our example the resulting MultiPhraseQuery
      would be "(sea | sea | seabiscuit) (biscuit | biscit)", which would
      not match the simple case of "seabiscuit" occurring in a document.
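
The first issue can be illustrated with a small Python sketch. It is a toy
model and not the Lucene QueryParser: analyze and parse_query are
hypothetical helpers, and the one-way rules that map both spellings to
"seabiscuit" are an assumption made to keep the example short. The point is
only that an unquoted query reaches the analyzer one word at a time, while a
quoted phrase reaches it whole.

```python
# Hypothetical one-way rules; the wiki example uses an expanding group.
SYNONYMS = {
    ("sea", "biscit"): ["seabiscuit"],
    ("sea", "biscuit"): ["seabiscuit"],
}


def analyze(text):
    """Whitespace-tokenize, then replace two-token synonym matches."""
    tokens = text.lower().split()
    out, i = [], 0
    while i < len(tokens):
        pair = tuple(tokens[i:i + 2])
        if pair in SYNONYMS:
            out.extend(SYNONYMS[pair])
            i += 2
        else:
            out.append(tokens[i])
            i += 1
    return out


def parse_query(query):
    """Split on whitespace BEFORE analysis unless the query is a quoted phrase."""
    if len(query) >= 2 and query.startswith('"') and query.endswith('"'):
        chunks = [query[1:-1]]   # the whole phrase goes to the analyzer
    else:
        chunks = query.split()   # each word is analyzed on its own
    return [analyze(chunk) for chunk in chunks]


print(parse_query('sea biscit'))    # [['sea'], ['biscit']]
print(parse_query('"sea biscit"'))  # [['seabiscuit']]
```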



Re: Question regarding synonym

Posted by darniz <rn...@edmunds.com>.

Thanks
As I said, it even works with double quotes too,
like carDescription:"austin martin"

So is the conclusion that in order to map a two-word synonym I always have
to enclose it in double quotes, so that it does not split the words?


Re: Question regarding synonym

Posted by Christian Zambrano <cz...@gmail.com>.

When you use a field qualifier (fieldName:valueToLookFor), it only applies
to the word right after the colon. If you look at the debug
information, you will notice that for the second word it is using the
default field.

<str name="parsedquery_toString">carDescription:austin *text*:martin</str>

The following should work:

carDescription:(austin martin)
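
A rough Python sketch of that scoping rule, for illustration only. It is not
the Lucene query parser: TOKEN and parse are hypothetical, and DEFAULT_FIELD
is assumed to be "text" because that is the field shown in the debug output.
A field prefix binds to the single clause that follows it; grouping with
parentheses applies the prefix to every word inside.

```python
import re

DEFAULT_FIELD = "text"  # assumption: the default field seen in the debug output

# An optional "field:" prefix followed by a (group), a "phrase" or a bare word.
TOKEN = re.compile(r'(?:(\w+):)?(\([^)]*\)|"[^"]*"|\S+)')


def parse(query):
    """Return the query with every clause shown next to its effective field."""
    clauses = []
    for field, body in TOKEN.findall(query):
        field = field or DEFAULT_FIELD
        if body.startswith("("):
            # The prefix applies to each word inside the parentheses.
            clauses.extend(f"{field}:{term}" for term in body[1:-1].split())
        else:
            clauses.append(f"{field}:{body}")
    return " ".join(clauses)


print(parse('carDescription:austin martin'))
# carDescription:austin text:martin
print(parse('carDescription:(austin martin)'))
# carDescription:austin carDescription:martin
```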


On 10/02/2009 05:46 PM, darniz wrote:
> This is not working when I search documents. I have a document which
> contains the text aston martin.
>
> When I search carDescription:"austin martin" I get a match, but when I
> don't use double quotes,
>
> like carDescription:austin martin
> there is no match.
>
> In the analyzer, if I give austin martin without quotes, it matches
> aston martin when it passes through the synonym filter.
> Maybe by default the analyzer treats it as the phrase "austin martin", but
> when I run the query by typing
> carDescription:austin martin I get 0 documents. The following is the debug
> node info with debugQuery=on:
>
> <str name="rawquerystring">carDescription:austin martin</str>
> <str name="querystring">carDescription:austin martin</str>
> <str name="parsedquery">carDescription:austin text:martin</str>
> <str name="parsedquery_toString">carDescription:austin text:martin</str>
>
> I don't know why it breaks up the words; maybe it's the desired behaviour.
> When I give carDescription:"austin martin", of course, it is able to map
> to the synonym and I get the desired result.
>
> Any opinions?
>
> darniz

RE: Question regarding synonym

Posted by darniz <rn...@edmunds.com>.

This is not working when I search documents. I have a document which
contains the text aston martin.

When I search carDescription:"austin martin" I get a match, but when I
don't use double quotes,

like carDescription:austin martin
there is no match.

In the analyzer, if I give austin martin without quotes, it matches
aston martin when it passes through the synonym filter.
Maybe by default the analyzer treats it as the phrase "austin martin", but
when I run the query by typing
carDescription:austin martin I get 0 documents. The following is the debug
node info with debugQuery=on:

<str name="rawquerystring">carDescription:austin martin</str>
<str name="querystring">carDescription:austin martin</str>
<str name="parsedquery">carDescription:austin text:martin</str>
<str name="parsedquery_toString">carDescription:austin text:martin</str>

I don't know why it breaks up the words; maybe it's the desired behaviour.
When I give carDescription:"austin martin", of course, it is able to map
to the synonym and I get the desired result.

Any opinions?

darniz




RE: Question regarding synonym

Posted by Ensdorf Ken <En...@zoominfo.com>.

> Hi
> I have a question regarding the SynonymFilter.
> I have a one-way mapping defined:
> austin martin, astonmartin => aston martin
>
...
>
> Can anybody please confirm whether my observation is correct? This is a very
> critical aspect of my work.

That is correct - the synonym filter can recognize multi-token synonyms from consecutive tokens in a stream.
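
As a rough illustration of matching a multi-token synonym against
consecutive tokens, here is a minimal Python sketch. It is a simplified
model, not Solr's SynonymFilter code: RULES, whitespace_tokenize and
apply_synonyms are hypothetical names, and the rules mirror the one-way
mapping from the original question.

```python
# One-way mapping from the question:
#   austin martin, astonmartin => aston martin
RULES = {
    ("austin", "martin"): ["aston", "martin"],
    ("astonmartin",): ["aston", "martin"],
}


def whitespace_tokenize(text):
    """Split on whitespace, keeping the tokens in their original order."""
    return text.lower().split()


def apply_synonyms(tokens, rules):
    """Replace the longest run of consecutive tokens that matches a rule."""
    max_len = max(len(key) for key in rules)
    out, i = [], 0
    while i < len(tokens):
        # Try the longest candidate first, then shorter ones.
        for n in range(min(max_len, len(tokens) - i), 0, -1):
            key = tuple(tokens[i:i + n])
            if key in rules:
                out.extend(rules[key])
                i += n
                break
        else:
            # Nothing matched at this position; pass the token through.
            out.append(tokens[i])
            i += 1
    return out


print(apply_synonyms(whitespace_tokenize("austin martin"), RULES))
# ['aston', 'martin']
print(apply_synonyms(whitespace_tokenize("austin"), RULES))
# ['austin']
```

Because the tokenizer preserves token order, the filter can still see
"austin" followed by "martin" and match the two-word rule, while "austin" on
its own matches nothing.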