Posted to solr-user@lucene.apache.org by Chris Hostetter <ho...@fucit.org> on 2009/09/04 00:36:25 UTC

Re: Clarifications to Synonym Filter Wiki entry? (2 of 2)

: Earlier on the thread repeats the claim that, if you use index side
: expansion, you won't have a problem.  But it doesn't explain how/why that
: fixes it, given that the Lucene parser still breaks on white space.

because at query time, nothing knows (or cares) that multiple 
variants were indexed ... if your field contains "sea" and 
"biscut" and "seabiscut" the query parser doesn't care ... a querystring 
whose parsed form results in the query (field:seabiscut) is going 
to match, ditto for (field:sea field:biscut) ... the only place things 
start getting interesting is with phrase queries: because the synonyms 
are put at the same term position, things typically work ok, but you 
sometimes (ie: when the synonyms have a different number of tokens) need a 
non-zero slop factor to help bridge the gap.
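
to make that concrete, here is a toy sketch in Python (not Solr or Lucene code). The synonym table, the function names, and the position model are all assumptions made up for illustration: each variant starts at the position of the original token, and the slop check is a crude stand-in for Lucene's real sloppy phrase matching.

```python
from collections import defaultdict

# Hypothetical synonym table: each entry lists the token sequences
# that get indexed in place of the original word.
SYNONYMS = {"seabiscut": [["seabiscut"], ["sea", "biscut"]]}


def index_tokens(text):
    """Build {term: set(positions)} with index-time synonym expansion.

    Every variant starts at the position of the original token, so a
    two-token variant spills over into the next position.
    """
    postings = defaultdict(set)
    for pos, word in enumerate(text.lower().split()):
        for variant in SYNONYMS.get(word, [[word]]):
            for offset, term in enumerate(variant):
                postings[term].add(pos + offset)
    return postings


def has_term(postings, term):
    """A plain term query only asks whether the term was indexed."""
    return term in postings


def phrase_match(postings, terms, slop=0):
    """Simplified phrase check: each term must sit within `slop`
    positions of where an exact phrase would place it."""
    if not terms:
        return False
    for start in postings.get(terms[0], ()):
        if all(
            any(abs(pos - (start + i)) <= slop for pos in postings.get(term, ()))
            for i, term in enumerate(terms)
        ):
            return True
    return False


doc = index_tokens("seabiscut wins")
# Indexed as: seabiscut@0, sea@0, biscut@1, wins@1

print(has_term(doc, "seabiscut"))                              # True
print(has_term(doc, "sea") and has_term(doc, "biscut"))        # True
print(phrase_match(doc, ["seabiscut", "wins"]))                # True
print(phrase_match(doc, ["sea", "biscut", "wins"]))            # False
print(phrase_match(doc, ["sea", "biscut", "wins"], slop=1))    # True
```

in this toy model "biscut" and "wins" both land on position 1, which is why the three word phrase only matches once some slop is allowed.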

: Later there's a clue, it seems that even single words of a multi-word
: thesaurus entry are matched - so I guess Lucene doesn't need to see both
: words in a multi-word query, it just picks up either word, so it works
: around the multi-word parsing problem, but adds the undesirable side effect
: of false positive matches?

no ... A multi word (phrase) query needs to match all the words ... what 
that's referring to is that if a document originally contained "seabiscut" 
and synonyms caused "sea" and "biscut" to be added, then a search for just 
the term "sea" will match.
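
a minimal Python sketch of that side effect (again a toy, not Solr code; the synonym table and helper name are hypothetical, and positions are ignored, so the all-words check is only a necessary condition for a phrase match):

```python
# Hypothetical index-time expansion: the original word is kept and
# its synonym tokens are added alongside it.
SYNONYMS = {"seabiscut": ["sea", "biscut"]}


def indexed_terms(text):
    """Return the set of terms that end up in the index for `text`."""
    terms = set()
    for word in text.lower().split():
        terms.add(word)
        terms.update(SYNONYMS.get(word, []))
    return terms


doc = indexed_terms("seabiscut won the race")

# The document never contained the word "sea", but the term query matches.
print("sea" in doc)                                  # True

# A multi word query still needs every one of its words to be present.
print(all(w in doc for w in ["sea", "biscut"]))      # True
print(all(w in doc for w in ["sea", "horse"]))       # False
```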



-Hoss