Posted to dev@lucene.apache.org by "Michael McCandless (JIRA)" <ji...@apache.org> on 2015/07/07 11:12:05 UTC

[jira] [Created] (LUCENE-6664) Replace SynonymFilter with SynonymGraphFilter

Michael McCandless created LUCENE-6664:
------------------------------------------

             Summary: Replace SynonymFilter with SynonymGraphFilter
                 Key: LUCENE-6664
                 URL: https://issues.apache.org/jira/browse/LUCENE-6664
             Project: Lucene - Core
          Issue Type: New Feature
            Reporter: Michael McCandless
            Assignee: Michael McCandless
             Fix For: 5.3, Trunk


Spinoff from LUCENE-6582.

I created a new SynonymGraphFilter, intended to replace the current
buggy SynonymFilter. It produces correct graphs and does no "graph
flattening" itself, which I think makes it simpler.

This means you must add the FlattenGraphFilter yourself, if you are
applying synonyms during indexing.

Index-time synonym expansion is necessarily a "lossy" graph
transformation when multi-token (input or output) synonyms are
applied, because the index does not store {{posLength}}. There will
always be phrase queries that should match but do not, and phrase
queries that should not match but do.
http://blog.mikemccandless.com/2012/04/lucenes-tokenstreams-are-actually.html
goes into detail about this.
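
To make the two failure modes concrete, here is a minimal plain-Java
sketch. It uses no Lucene classes: the Token class and the two
matching helpers are hypothetical and exist only for illustration.
The example assumes the text "domain name system is fragile" with a
single-token synonym "dns" spanning all three original positions.
One helper matches phrases the way a positions-only index would
(posLength discarded), the other follows the graph arcs.

```java
import java.util.*;

class FlattenedPositions {
    /** Hypothetical token model: term, position increment, position length. */
    static final class Token {
        final String term;
        final int posInc;
        final int posLength;
        Token(String term, int posInc, int posLength) {
            this.term = term;
            this.posInc = posInc;
            this.posLength = posLength;
        }
    }

    /** "domain name system is fragile", with "dns" added over the first three tokens. */
    static List<Token> example() {
        return List.of(
            new Token("dns", 1, 3),
            new Token("domain", 0, 1),
            new Token("name", 1, 1),
            new Token("system", 1, 1),
            new Token("is", 1, 1),
            new Token("fragile", 1, 1));
    }

    /** Phrase match using start positions only, i.e. with posLength thrown away. */
    static boolean indexMatches(List<Token> tokens, String... phrase) {
        Map<String, Set<Integer>> postings = new HashMap<>();
        int pos = -1;
        for (Token t : tokens) {
            pos += t.posInc;
            postings.computeIfAbsent(t.term, k -> new HashSet<>()).add(pos);
        }
        for (int start : postings.getOrDefault(phrase[0], Set.of())) {
            boolean ok = true;
            for (int i = 1; i < phrase.length && ok; i++) {
                ok = postings.getOrDefault(phrase[i], Set.of()).contains(start + i);
            }
            if (ok) return true;
        }
        return false;
    }

    /** Phrase match that follows arcs: each token runs from pos to pos + posLength. */
    static boolean graphMatches(List<Token> tokens, String... phrase) {
        Set<Integer> ends = null; // null means "no term consumed yet"
        for (String term : phrase) {
            Set<Integer> next = new HashSet<>();
            int pos = -1;
            for (Token t : tokens) {
                pos += t.posInc;
                if (t.term.equals(term) && (ends == null || ends.contains(pos))) {
                    next.add(pos + t.posLength);
                }
            }
            ends = next;
        }
        return ends != null && !ends.isEmpty();
    }

    public static void main(String[] args) {
        List<Token> tokens = example();
        // Should match but does not once posLength is lost:
        System.out.println(indexMatches(tokens, "dns", "is"));   // false
        System.out.println(graphMatches(tokens, "dns", "is"));   // true
        // Should not match but does once posLength is lost:
        System.out.println(indexMatches(tokens, "dns", "name")); // true
        System.out.println(graphMatches(tokens, "dns", "name")); // false
    }
}
```

With only start positions available, "dns" sits at position 0 and
"is" at position 3, so the phrase "dns is" is missed, while "dns
name" looks adjacent and matches by accident.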

However, with this new SynonymGraphFilter, if instead you do synonym
expansion at query time (and don't do the flattening), and you use
TermAutomatonQuery (future: somehow integrated into a query parser),
or maybe just "enumerate all paths and make union of PhraseQuery", you
should get 100% correct matches (not sure about "proper" scoring
though...).
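
The "enumerate all paths" idea can be sketched in plain Java as
below. This is only an illustration under stated assumptions: the
Token and Arc classes and enumeratePaths are hypothetical names, no
Lucene API is used, and the sketch assumes a well-formed graph where
every arc eventually reaches the final node. Each returned path is
the term sequence that would become one phrase; how those phrases
are combined into a single query is left out here.

```java
import java.util.*;

class TokenGraphPaths {
    /** Hypothetical token model: term, position increment, position length. */
    static final class Token {
        final String term;
        final int posInc;
        final int posLength;
        Token(String term, int posInc, int posLength) {
            this.term = term;
            this.posInc = posInc;
            this.posLength = posLength;
        }
    }

    /** An arc labelled with a term, leading to node "to". */
    static final class Arc {
        final String term;
        final int to;
        Arc(String term, int to) {
            this.term = term;
            this.to = to;
        }
    }

    /** Query "dns is fragile" where "dns" was expanded to "domain name system". */
    static List<Token> example() {
        return List.of(
            new Token("dns", 1, 3),
            new Token("domain", 0, 1),
            new Token("name", 1, 1),
            new Token("system", 1, 1),
            new Token("is", 1, 1),
            new Token("fragile", 1, 1));
    }

    /** Returns every term sequence leading from node 0 to the final node. */
    static List<List<String>> enumeratePaths(List<Token> tokens) {
        Map<Integer, List<Arc>> arcsFrom = new HashMap<>();
        int pos = -1;
        int last = 0;
        for (Token t : tokens) {
            pos += t.posInc;
            int to = pos + t.posLength;
            arcsFrom.computeIfAbsent(pos, k -> new ArrayList<>()).add(new Arc(t.term, to));
            last = Math.max(last, to);
        }
        List<List<String>> out = new ArrayList<>();
        walk(0, last, arcsFrom, new ArrayList<>(), out);
        return out;
    }

    /** Depth-first walk; records a copy of the prefix whenever the final node is reached. */
    private static void walk(int node, int last, Map<Integer, List<Arc>> arcsFrom,
                             List<String> prefix, List<List<String>> out) {
        if (node == last) {
            out.add(new ArrayList<>(prefix));
            return;
        }
        for (Arc arc : arcsFrom.getOrDefault(node, List.of())) {
            prefix.add(arc.term);
            walk(arc.to, last, arcsFrom, prefix, out);
            prefix.remove(prefix.size() - 1);
        }
    }

    public static void main(String[] args) {
        // Prints [[dns, is, fragile], [domain, name, system, is, fragile]]
        System.out.println(enumeratePaths(example()));
    }
}
```

Note that the number of paths multiplies with each independent
synonym span in the query, so full enumeration is probably only
practical for short queries.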

This new synonym filter still cannot consume an arbitrary graph as
its input.


