Posted to solr-user@lucene.apache.org by anuvenk <an...@hotmail.com> on 2009/06/04 00:44:03 UTC

Re: synonyms

I happened to revisit this post, which I started a long time ago. I'm still
using the same query-time synonyms. Now I want to map cities to states in
the synonyms, and I'm still running into this issue with multi-word
synonyms. Could you please explain again what you did to overcome it?
I didn't quite understand what HIER_FAMILIY_01 and SYN_FAMILY_01
are. Thanks.

lorenzo zhak wrote:
> 
> Hi,
> 
> I had to deal with this kind of side effect regarding multi-word
> synonyms.
> We installed Solr on a project that makes extensive use of synonyms: a big
> list that can sometimes produce wrong matches like the one
> noticed by Anuvenk,
> for instance:
> 
>> dui => drunk driving defense
>>  or
>> dui,drunk driving defense,drunk driving law
>> query for "dui" matches "dui => drunk driving defense" and "dui,drunk
>> driving defense,drunk driving law"
> 
> To prevent this kind of behavior, I gave every "synonym
> family" (that is, a single line in the file) a unique identifier,
> so the list looks like:
> 
> dui => HIER_FAMILIY_01
> drunk driving defense => HIER_FAMILIY_01
> SYN_FAMILY_01, dui,drunk driving defense,drunk driving law
> 
> I also set the synonym filter to expand=false at both index time and
> query time.
> 
> This way, the matched synonyms (multi-word or single-word) in
> documents are replaced with their family identifier rather than with all
> the alternatives. Indexing with expand=true adds words to documents
> that can then be matched on their own, ignoring the fact that they belong
> to a multi-word expression, and this can end up producing a wrong match
> (a mix of synonyms) at query time.
> 
> So a query for "dui" will be rewritten by the synonym
> filter at query time to "HIER_FAMILIY_01" or "SYN_FAMILY_01", and
> documents that contain only the single words "drunk", "driving" or
> "law" will not be matched, since only a document with the phrase "drunk
> driving law" would have been indexed with "SYN_FAMILY_01".
> 
> The approach worked pretty well on our project and we have not noticed
> any side effects on searches; it only removes matched documents
> that were "noise" caused by the synonym mix issue.
> 
> I think it would be useful to add this kind of approach to the
> synonym filter section of the Solr wiki.
> 
> Cheers
> 
> Laurent
> 
> 
> On Dec 2, 2007 3:41 PM, Otis Gospodnetic <ot...@yahoo.com>
> wrote:
>> Hi (changing to solr-user list)
>>
>> Yes it is, especially if the terms left of => contain multiple words.
>> Check out the Wiki; one page there explains this nicely.
>>
>> Otis
>> -
>> Sematext -- http://sematext.com/ -- Lucene - Solr - Nutch
>>
>> ----- Original Message ----
>> From: anuvenk <an...@hotmail.com>
>> To: solr-dev@lucene.apache.org
>> Sent: Saturday, December 1, 2007 1:21:49 AM
>> Subject: Re: synonyms
>>
>>
>> Ideally, would it be a good idea to pass the index data through the
>> synonyms filter while indexing?
>> Also, say I have this mapping:
>> dui => drunk driving defense
>>  or
>> dui,drunk driving defense,drunk driving law
>>
>> Will matches for dui also bring up matches for drunk driving law (the
>> whole phrase), or will they also bring up all matches for 'drunk',
>> 'driving', 'law'?
>>
>>
>>
>> Yonik Seeley wrote:
>> >
>> > On Nov 30, 2007 5:39 PM, anuvenk <an...@hotmail.com> wrote:
>> >> Should data be re-indexed every time synonyms like
>> >> word1,word2
>> >> or
>> >> word1 => word2
>> >>
>> >> are added to synonyms.txt?
>> >
>> > Yes, if it changes the index (if it's used in the index analyzer as
>> > opposed to just the query analyzer).
>> >
>> > -Yonik
>> >
>> >
>>
>> --
>> View this message in context:
>>  http://www.nabble.com/synonyms-tf4925232.html#a14100346
>> Sent from the Solr - Dev mailing list archive at Nabble.com.
>>
>>
>>
>>
>>
> 
> 
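
For anyone else reading along, here is a rough sketch of how I understand the
"family identifier" idea quoted above. It is only a toy simulation in Python,
not Solr's actual synonym filter: it splits on whitespace and replaces the
longest matching phrase with one ID, roughly what a comma-separated
synonyms.txt line would do with expand=false on both sides. The FAMILIES
table, the function names, and the sample documents are all made up for
illustration, and I use a single family (SYN_FAMILY_01) to keep it short.

```python
# Toy simulation of the "family identifier" idea; this is NOT Solr's
# synonym filter, just a longest-match phrase replacer over whitespace tokens.

# Hypothetical rule table: every member of a synonym family maps to one ID,
# mimicking a comma-separated synonyms.txt line applied with expand=false.
FAMILIES = {
    "SYN_FAMILY_01": ["dui", "drunk driving defense", "drunk driving law"],
}


def build_rules(families):
    """Map each phrase (as a tuple of tokens) to its family identifier."""
    rules = {}
    for family_id, phrases in families.items():
        for phrase in phrases:
            rules[tuple(phrase.lower().split())] = family_id
    return rules


def analyze(text, rules):
    """Replace the longest matching phrase at each position with its family ID."""
    tokens = text.lower().split()
    max_len = max(len(phrase) for phrase in rules)
    out, i = [], 0
    while i < len(tokens):
        # Try the longest candidate phrase first, then shorter ones.
        for n in range(min(max_len, len(tokens) - i), 0, -1):
            family_id = rules.get(tuple(tokens[i:i + n]))
            if family_id is not None:
                out.append(family_id)
                i += n
                break
        else:
            # No phrase starts here; keep the plain token.
            out.append(tokens[i])
            i += 1
    return out


def matches(query, document, rules):
    """A document matches if it contains every analyzed query term."""
    doc_terms = set(analyze(document, rules))
    return all(term in doc_terms for term in analyze(query, rules))


RULES = build_rules(FAMILIES)
DOCS = [
    "attorney for drunk driving law cases",
    "new law on driving lessons",
    "he was drunk at the party",
]

# The same analysis is applied to the query and to each document.
hits = [doc for doc in DOCS if matches("dui", doc, RULES)]
print(hits)
```

In this sketch a query for "dui" becomes the single term SYN_FAMILY_01, so it
only matches the first document, where the whole phrase "drunk driving law"
was replaced by that ID. The documents that merely contain "drunk", "driving"
or "law" on their own never receive the ID and are not matched.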

-- 
View this message in context: http://www.nabble.com/Re%3A-synonyms-tp14116132p23860862.html
Sent from the Solr - User mailing list archive at Nabble.com.