Posted to solr-user@lucene.apache.org by Alexander Ramos Jardim <al...@gmail.com> on 2009/01/02 13:34:28 UTC
Re: synonyms.txt file updated frequently

People,

Thanks for all the replies,

The business requirement I have is to update the synonyms list every time
someone from the sales department establishes a new dictionary (they do that
a couple of times a week). Each time, I must add the new synonyms to the
index. I think I will stick with query-time synonyms only, for Grant's reason.

At least bad is better than worse.
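
For what it's worth, here is a rough sketch of the idf skew described in the
wiki text quoted below. It assumes the classic Lucene idf formula,
1 + ln(numDocs / (docFreq + 1)). The corpus size, the document counts and the
function name are made up for illustration, and it also assumes no document
originally contained both terms.

```python
import math


def classic_idf(doc_freq: int, num_docs: int) -> float:
    """Classic Lucene-style idf (assumed): 1 + ln(numDocs / (docFreq + 1))."""
    return 1.0 + math.log(num_docs / (doc_freq + 1))


# Hypothetical numbers, chosen only to mirror the quoted scenario.
NUM_DOCS = 100_000
DF_TV = 20_000          # "many thousands" of docs containing text:TV
DF_TELEVISION = 300     # "a few hundred" docs containing text:Television

# Query-time expansion: each term keeps its own docFreq, so the rarer
# term gets the larger idf and its matches are scored higher.
idf_tv = classic_idf(DF_TV, NUM_DOCS)
idf_television = classic_idf(DF_TELEVISION, NUM_DOCS)

# Index-time expansion: every doc with either term is indexed with both,
# so both terms end up with the same docFreq and the same idf.
df_expanded = DF_TV + DF_TELEVISION
idf_expanded = classic_idf(df_expanded, NUM_DOCS)

print(f"query-time idf(TV)         = {idf_tv:.3f}")
print(f"query-time idf(Television) = {idf_television:.3f}")
print(f"index-time idf(either)     = {idf_expanded:.3f}")
```

The gap between the two query-time values is the skew I will be living with.
With index-time expansion the gap disappears, but only for documents indexed
after the synonym was added, which is why a frequently changing list would
force a full reindex.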

2008/12/31 Grant Ingersoll <gs...@apache.org>

>
> On Dec 30, 2008, at 4:38 PM, Smiley, David W. wrote:
>
>  Grant, the Solr wiki recommends doing expansion at index time and gives
>> reasons:
>>
>> http://wiki.apache.org/solr/AnalyzersTokenizersTokenFilters#head-2c461ac74b4ddd82e453dc68fcfc92da77358d46
>>
>>
> I personally think "recommends" is too strong of a word, but the points are
> valid reasons to do index time synonyms.  In Alexander's case, I think
> index-time is a bit more problematic, since he is frequently updating the
> synonym list, meaning he would have to reindex every time, otherwise his
> stats are going to be even more skewed.
>
> As for multi-word expansions, the query parser can be fixed or an alternate
> one used.
>
>
>
>  Query-time doesn't work for multi-word expansion.  For everyone's
>> convenience, I'll quote the remainder of the problems:
>>
>>
>> Even when you aren't worried about multi-word synonyms, idf differences
>> still make index time synonyms a good idea. Consider the following scenario:
>>
>>   *  An index with a "text" field, which at query time uses the
>> SynonymFilter with the synonym TV, Television and expand="true"
>>   *  Many thousands of documents containing the term "text:TV"
>>   *  A few hundred documents containing the term "text:Television"
>>
>> A query for text:TV will expand into (text:TV text:Television) and the
>> lower docFreq for text:Television will give the documents that match
>> "Television" a much higher score than docs that match "TV" comparably --
>> which may be somewhat counter intuitive to the client. Index time expansion
>> (or reduction) will result in the same idf for all documents regardless of
>> which term the original text contained.
>>
>> ~ David Smiley
>>
>> On 12/30/08 4:33 PM, "Grant Ingersoll" <gs...@apache.org> wrote:
>>
>>
>>
>> On Dec 30, 2008, at 11:05 AM, Alexander Ramos Jardim wrote:
>>
>>  Hey Grant,
>>>
>>> Thanks for the info!
>>>
>>> 2008/12/30 Grant Ingersoll <gs...@apache.org>
>>>
>>>  I'd probably write a new TokenFilter that was aware of the reload
>>>> policy
>>>> (in a generic way) such that I didn't have to go through a whole
>>>> core reload
>>>> every time.  Are you just using them during query time or also during
>>>> indexing?
>>>>
>>>>
>>> I am using it at indexing time.
>>>
>>
>> I think that is a bit more problematic.  How do you deal with new
>> documents having the new synonyms while old docs don't?
>>
>> Any particular reason you use syns at indexing and not search?  Not
>> saying there aren't reasons to do it, just query side usually works
>> better for this very reason.
>>
>>
> --------------------------
> Grant Ingersoll
>
> Lucene Helpful Hints:
> http://wiki.apache.org/lucene-java/BasicsOfPerformance
> http://wiki.apache.org/lucene-java/LuceneFAQ
>
>


-- 
Alexander Ramos Jardim