Posted to solr-user@lucene.apache.org by darniz <rn...@edmunds.com> on 2009/10/02 20:40:49 UTC
Question regarding synonym
Hi
I have a question regarding SynonymFilter.
I have a one-way mapping defined:
austin martin, astonmartin => aston martin
What baffles me is this: if I enter the words austin martin at query time,
the analysis page shows that they first go through the whitespace tokenizer,
which generates two tokens, "austin" and "martin".
Then, after the synonym filter, they are replaced with the words
aston martin
That's good, and that's what I want. But I am wondering: since the input went
through the whitespace tokenizer first and was split into two different words,
"austin" and "martin", how was the filter able to match the entire synonym and
replace it? If I enter only austin, then after passing through the synonym
filter it is not replaced with aston.
That leads me to conclude that even though "austin martin" went through the
whitespace tokenizer factory and got split in two, the word order is still
preserved when looking for a synonym match.
Can anybody please explain whether my observation is correct? This is a very
critical aspect of my work.
Thanks
darniz
--
View this message in context: http://www.nabble.com/Question-regarding-synonym-tp25720572p25720572.html
Sent from the Solr - User mailing list archive at Nabble.com.
Re: Question regarding synonym
Posted by Christian Zambrano <cz...@gmail.com>.
You are correct.
I would recommend using the SynonymFilter only at index time
unless you have a very good reason to use it at query time.
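To make the index-time behavior concrete for the two mappings discussed in this thread (the one-way "bayrische motoren werke => bmw" versus the equivalent list "bayrische motoren werke, bmw" with expand=true), here is a small Python simulation. It is a sketch under simplifying assumptions, not Solr's SynonymFilter: parse_synonym_line and index_terms are hypothetical helpers, indexed terms are collected into a set (token positions are ignored), and the query term is assumed to be looked up unchanged.

```python
# Simplified simulation (not Solr code) of which terms end up in the index
# for a document containing "bmw" under two synonyms.txt styles.
# parse_synonym_line and index_terms are hypothetical helpers.


def parse_synonym_line(line, expand=True):
    """Map each input phrase (a tuple of tokens) to a list of output phrases."""
    if "=>" in line:
        # Explicit one-way mapping: left-hand phrases are replaced by the right side.
        lhs, rhs = line.split("=>")
        outputs = [tuple(p.split()) for p in rhs.split(",")]
        return {tuple(p.split()): outputs for p in lhs.split(",")}
    phrases = [tuple(p.split()) for p in line.split(",")]
    if expand:
        # Equivalent synonyms with expand=true: each phrase yields all of them.
        return {p: phrases for p in phrases}
    # expand=false: every phrase is reduced to the first one in the list.
    return {p: [phrases[0]] for p in phrases}


def index_terms(text, rules):
    """Whitespace-tokenize, apply the longest matching rule, collect terms."""
    tokens = text.split()
    terms, i = set(), 0
    while i < len(tokens):
        for n in range(len(tokens) - i, 0, -1):
            key = tuple(tokens[i:i + n])
            if key in rules:
                for phrase in rules[key]:
                    terms.update(phrase)
                i += n
                break
        else:
            # No rule starts at this token: keep the original token.
            terms.add(tokens[i])
            i += 1
    return terms


one_way = parse_synonym_line("bayrische motoren werke => bmw")
expanded = parse_synonym_line("bayrische motoren werke, bmw", expand=True)

doc = "bmw"
print(sorted(index_terms(doc, one_way)))   # ['bmw']
print(sorted(index_terms(doc, expanded)))  # ['bayrische', 'bmw', 'motoren', 'werke']
# A single-term query such as text:motoren only matches the expanded index.
print("motoren" in index_terms(doc, one_way), "motoren" in index_terms(doc, expanded))
# False True
```

With the one-way rule, a document that already contains bmw gains nothing at index time, which is consistent with text:bayrische finding nothing; with expand=true all four words are indexed, so single-word queries on any of them can match.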
On 10/05/2009 11:46 AM, darniz wrote:
> ...
Re: Question regarding synonym
Posted by darniz <rn...@edmunds.com>.
Yes, that's what we decided: to expand these terms while indexing.
If we have
bayrische motoren werke => bmw
and I have a document which has bmw in it, searching for text:bayrische does
not give me results. I have to use
text:"bayrische motoren werke"; then it actually applies the synonym and gets
me the document.
Now suppose I change the synonym mapping to
bayrische motoren werke, bmw
with the expand parameter set to true, and also use this file at indexing.
Then, when I index this document, along with "bmw" I also index the
words "bayrische", "motoren", and "werke",
so any text query like text:motoren or text:bayrische will now give me results.
Please correct me if my assumption is wrong.
Thanks
darniz
Christian Zambrano wrote:
> ...
--
View this message in context: http://www.nabble.com/Question-regarding-synonym-tp25720572p25754288.html
Re: Question regarding synonym
Posted by Christian Zambrano <cz...@gmail.com>.
On 10/02/2009 06:02 PM, darniz wrote:
> Thanks.
> As I said, it also works if I use double quotes,
> like carDescription:"austin martin"
>
> So is the conclusion that, in order to map a two-word synonym, I always have
> to enclose it in double quotes so that the words are not split?
Yes, but there are things you need to keep in mind.
From the Solr wiki:
Keep in mind that while the SynonymFilter will happily work with
synonyms containing multiple words (e.g.
"sea biscuit, sea biscit, seabiscuit"), the recommended approach for
dealing with synonyms like this is to expand the synonym when
indexing. This is because there are two potential issues that can arise
at query time:
1. The Lucene QueryParser tokenizes on white space before giving any
   text to the Analyzer, so if a person searches for the words
   sea biscit, the analyzer will be given the words "sea" and "biscit"
   separately, and will not know that they match a synonym.
2. Phrase searching (e.g. "sea biscit") will cause the QueryParser to
   pass the entire string to the analyzer, but if the SynonymFilter
   is configured to expand the synonyms, then when the QueryParser
   gets the resulting list of tokens back from the Analyzer, it will
   construct a MultiPhraseQuery that will not have the desired
   effect. This is because of the limited mechanism available for the
   Analyzer to indicate that two terms occupy the same position:
   there is no way to indicate that a "phrase" occupies the same
   position as a term. For our example the resulting MultiPhraseQuery
   would be "(sea | sea | seabiscuit) (biscuit | biscit)", which would
   not match the simple case of "seabiscuit" occurring in a document.
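Both wiki points can be reproduced with a toy Python model of the "sea biscuit, sea biscit, seabiscuit" example. This is a sketch under stated assumptions, not Lucene's QueryParser or SynonymFilter: analyze and multiphrase_matches are hypothetical helpers, analyze only expands when the whole input equals one of the synonyms, and a query is represented as a list of positions, each holding the terms allowed there, mirroring the "(sea | sea | seabiscuit) (biscuit | biscit)" notation above.

```python
# Toy model of the wiki's two query-time pitfalls (not Lucene code).
# A "query" is a list of positions; each position lists the allowed terms.

SYNONYMS = ["sea biscuit", "sea biscit", "seabiscuit"]  # expand=true group


def analyze(text):
    """Whitespace-tokenize; if the whole input is a synonym, stack every
    alternative by position, as the wiki describes for expanded synonyms."""
    tokens = text.split()
    if " ".join(tokens) in SYNONYMS:
        alternatives = [s.split() for s in SYNONYMS]
        width = max(len(a) for a in alternatives)
        return [[a[i] for a in alternatives if i < len(a)] for i in range(width)]
    return [[t] for t in tokens]


def multiphrase_matches(positions, doc_tokens):
    """True if consecutive doc tokens hit an allowed term at every position."""
    n = len(positions)
    return any(
        all(doc_tokens[start + i] in positions[i] for i in range(n))
        for start in range(len(doc_tokens) - n + 1)
    )


# Pitfall 1: unquoted input is split on whitespace before analysis,
# so each word is analyzed alone and the synonym never fires.
print([analyze(word) for word in "sea biscit".split()])
# [[['sea']], [['biscit']]]

# Pitfall 2: a quoted phrase reaches the analyzer whole and is expanded...
phrase = analyze("sea biscit")
print(phrase)  # [['sea', 'sea', 'seabiscuit'], ['biscuit', 'biscit']]
# ...but the resulting two-position query cannot match a one-token document.
print(multiphrase_matches(phrase, ["sea", "biscuit"]))  # True
print(multiphrase_matches(phrase, ["seabiscuit"]))      # False
```

The last line is the wiki's failure case: "seabiscuit" fills only the first position, and nothing can fill the second, which is why the wiki recommends expanding at index time instead.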
Re: Question regarding synonym
Posted by darniz <rn...@edmunds.com>.
Thanks.
As I said, it also works if I use double quotes,
like carDescription:"austin martin"
So is the conclusion that, in order to map a two-word synonym, I always have
to enclose it in double quotes so that the words are not split?
Christian Zambrano wrote:
> ...
--
View this message in context: http://www.nabble.com/Question-regarding-synonym-tp25720572p25723980.html
Re: Question regarding synonym
Posted by Christian Zambrano <cz...@gmail.com>.
When you use a field qualifier (fieldName:valueToLookFor), it only applies
to the word right after the colon. If you look at the debug
information you will notice that for the second word it is using the
default field (text):
<str name="parsedquery_toString">carDescription:austin text:martin</str>
The following should work:
carDescription:(austin martin)
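The scoping rule above can be illustrated with a deliberately tiny Python parser. It is a hypothetical sketch, not Lucene's QueryParser: it understands only bare words, field:word, field:"a phrase" and field:(word word), and it assumes text is the default field, as in the debug output quoted in this thread.

```python
# Hypothetical mini-parser illustrating field-qualifier scope in the classic
# Lucene query syntax. It is NOT the real QueryParser.
import re

# An optional "field:" prefix followed by a (group), a "phrase", or a bare word.
CLAUSE = re.compile(r'(?:(\w+):)?(\([^)]*\)|"[^"]*"|\S+)')


def parse(query, default_field="text"):
    clauses = []
    for field, value in CLAUSE.findall(query):
        field = field or default_field  # unqualified words use the default field
        if value.startswith("("):
            # A parenthesised group: the qualifier applies to every word inside.
            words = value[1:-1].split()
        else:
            # A bare word or a quoted phrase is a single clause.
            words = [value]
        clauses.extend(f"{field}:{w}" for w in words)
    return " ".join(clauses)


print(parse("carDescription:austin martin"))
# carDescription:austin text:martin
print(parse("carDescription:(austin martin)"))
# carDescription:austin carDescription:martin
print(parse('carDescription:"austin martin"'))
# carDescription:"austin martin"
```

Note that the grouped form only fixes the field: each word is still a separate clause, so (as the wiki notes about whitespace splitting) a multi-word synonym still cannot match it, whereas the quoted phrase stays one clause and reaches the analyzer whole.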
On 10/02/2009 05:46 PM, darniz wrote:
> ...
RE: Question regarding synonym
Posted by darniz <rn...@edmunds.com>.
This is not working when I search documents. I have a document which contains
the text aston martin.
When I search carDescription:"austin martin" I get a match, but when I don't
use double quotes, as in
carDescription:austin martin
there is no match.
In the analyzer, if I enter austin martin without quotes, it matches aston
martin when it passes through the synonym filter; maybe by default the
analyzer treats it as a phrase "austin martin". But when I try to do a query
by typing
carDescription:austin martin I get 0 documents. The following is the debug
node info with debugQuery=on:
<str name="rawquerystring">carDescription:austin martin</str>
<str name="querystring">carDescription:austin martin</str>
<str name="parsedquery">carDescription:austin text:martin</str>
<str name="parsedquery_toString">carDescription:austin text:martin</str>
I don't know why it splits the words; maybe it's the desired behavior.
When I use carDescription:"austin martin", of course, it is able to map
to the synonym and I get the desired result.
Any opinion?
darniz
Ensdorf Ken wrote:
> ...
--
View this message in context: http://www.nabble.com/Question-regarding-synonym-tp25720572p25723829.html
RE: Question regarding synonym
Posted by Ensdorf Ken <En...@zoominfo.com>.
> Hi
> I have a question regarding SynonymFilter.
> I have a one-way mapping defined:
> austin martin, astonmartin => aston martin
>
...
>
> Can anybody please explain whether my observation is correct? This is a very
> critical aspect of my work.
That is correct - the synonym filter can recognize multi-token synonyms from consecutive tokens in a stream.
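The behavior described above can be illustrated with a toy Python model of a multi-token synonym rule. This is a simplified sketch, not Lucene's SynonymFilter implementation: whitespace_tokenize, parse_rule and apply_synonyms are hypothetical helpers, and the model assumes a greedy longest-match scan over the token stream while ignoring token positions, offsets and case.

```python
# Toy model (NOT Lucene's SynonymFilter) of how a synonym rule can match
# several consecutive tokens after the tokenizer has split on whitespace.


def whitespace_tokenize(text):
    """Split on whitespace, roughly what a whitespace tokenizer does."""
    return text.split()


def parse_rule(line):
    """Parse a one-way rule such as 'austin martin, astonmartin => aston martin'
    into {left-hand phrase as a tuple of tokens: replacement tokens}."""
    lhs, rhs = line.split("=>")
    replacement = tuple(rhs.split())
    return {tuple(phrase.split()): replacement for phrase in lhs.split(",")}


def apply_synonyms(tokens, rules):
    """Scan the token stream left to right, preferring the longest match."""
    longest = max(len(key) for key in rules)
    out, i = [], 0
    while i < len(tokens):
        for n in range(min(longest, len(tokens) - i), 0, -1):
            window = tuple(tokens[i:i + n])
            if window in rules:
                out.extend(rules[window])
                i += n
                break
        else:
            # No rule starts at this token: pass it through unchanged.
            out.append(tokens[i])
            i += 1
    return out


rules = parse_rule("austin martin, astonmartin => aston martin")
print(apply_synonyms(whitespace_tokenize("austin martin"), rules))  # ['aston', 'martin']
print(apply_synonyms(whitespace_tokenize("austin"), rules))         # ['austin']
print(apply_synonyms(whitespace_tokenize("martin austin"), rules))  # ['martin', 'austin']
```

Because the rule is keyed on the sequence ("austin", "martin"), the single token austin, or the two words in the opposite order, passes through unchanged, which matches the observation that word order is preserved when the filter looks for a synonym.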