Posted to solr-user@lucene.apache.org by Robert Petersen <ro...@buy.com> on 2011/08/26 22:52:16 UTC

synonyms vs replacements

Hello all,

Which is better? Say you add an index-time synonym between nunchuck
and nunchuk; then both words will be in the document and both will be
searchable. I can get exactly the same behavior by putting in an index-time
replacement of nunchuck => nunchuk and a search-time replacement of the
same.
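
To make the comparison concrete, here is a toy Python sketch of the two strategies. It is not Solr code: the dictionaries and the helper names (expand, replace, matches) are hypothetical stand-ins for a synonym filter and a replacement filter, and the "index" is just a set of terms for one document.

```python
# Toy model only: plain sets of terms stand in for a real inverted index.
SYNONYMS = {
    "nunchuck": {"nunchuck", "nunchuk"},
    "nunchuk": {"nunchuck", "nunchuk"},
}
REPLACEMENTS = {"nunchuck": "nunchuk"}


def expand(tokens):
    """Synonym strategy: emit every variant of each token."""
    out = set()
    for token in tokens:
        out |= SYNONYMS.get(token, {token})
    return out


def replace(tokens):
    """Replacement strategy: map each token to its canonical form."""
    return {REPLACEMENTS.get(token, token) for token in tokens}


def matches(index_terms, query_terms):
    """A document matches if it shares at least one term with the query."""
    return bool(index_terms & query_terms)


doc = ["wooden", "nunchuck"]

# Strategy A: expand at index time, leave the query untouched.
index_a = expand(doc)

# Strategy B: replace at index time and again at query time.
index_b = replace(doc)

for query in ("nunchuck", "nunchuk"):
    assert matches(index_a, {query})
    assert matches(index_b, replace([query]))
```

In this model both strategies find the document for either spelling; the difference is that index_a holds three terms and index_b holds two, while strategy B does extra work on every query.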

I figured the replacement strategy keeps the index size slightly
smaller by only having the one term in the index, but the synonym
strategy only requires you to update the master, not the slave farm, and
requires slightly less work for the searchers during a user query. Are
there any other considerations I should be aware of?

Thanks

BTW nunchuk is the correct spelling. :)

Re: synonyms vs replacements

Posted by Erick Erickson <er...@gmail.com>.
See here about the "multi word" problem:
http://wiki.apache.org/solr/AnalyzersTokenizersTokenFilters#solr.SynonymFilterFactory

As for the rest, it's a tradeoff (surprise, surprise, surprise <G>).

You're right, expanding at index time leads to a somewhat
larger index, but less complex queries. And if you change
your synonyms file, you need to re-index from scratch.

Expanding at query time lets you keep your synonyms up to
date without re-indexing. But the queries are more complex
and somewhat slower...

Which is "better" depends (tm), so pick your poison. One
strategy is to expand at index time, and *also* expand
at query time, but with a different synonym file. The idea
is that your query-time synonym file is the set of terms that
you want to add to your index-time expansion next
time you can re-index from scratch. Then periodically you
merge your query-time syns into your index-time syns, re-index
from scratch and empty your query-time syns. Rinse, repeat.
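
The periodic merge step could look roughly like the Python sketch below. It assumes each synonym file has already been parsed into a list of term groups; the function name merge_synonym_groups and the sample terms are made up for illustration, and reading or writing the actual synonym files is left out.

```python
def merge_synonym_groups(index_groups, query_groups):
    """Fold query-time synonym groups into the index-time groups.

    Groups that share a term are combined into one group.
    Returns (merged index-time groups, emptied query-time groups).
    """
    merged = []
    for group in list(index_groups) + list(query_groups):
        group = set(group)
        # Absorb every already-merged group that shares a term with this one.
        overlapping = [g for g in merged if g & group]
        for g in overlapping:
            group |= g
            merged.remove(g)
        merged.append(group)
    # After the merge the query-time list starts over empty.
    return merged, []


index_syns = [{"nunchuck", "nunchuk"}]
query_syns = [{"nunchuk", "nunchaku"}, {"tv", "television"}]

index_syns, query_syns = merge_synonym_groups(index_syns, query_syns)
```

After this step the merged groups go into the index-time file, you re-index from scratch, and new terms start accumulating in the empty query-time list again.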

So, there isn't really a "right" answer. Personally I prefer to
expand at index time, but that's largely a preference.

Best
Erick
