Posted to dev@lucene.apache.org by "Michael McCandless (JIRA)" <ji...@apache.org> on 2013/05/21 22:05:16 UTC

[jira] [Created] (LUCENE-5012) Make graph-based TokenFilters easier

Michael McCandless created LUCENE-5012:
------------------------------------------

             Summary: Make graph-based TokenFilters easier
                 Key: LUCENE-5012
                 URL: https://issues.apache.org/jira/browse/LUCENE-5012
             Project: Lucene - Core
          Issue Type: Improvement
          Components: modules/analysis
            Reporter: Michael McCandless
            Assignee: Michael McCandless


SynonymFilter has two limitations today:

  * It cannot create new positions, so a multi-word synonym, e.g.
    dns -> domain name service, creates blatantly wrong highlights
    (SOLR-3390, LUCENE-4499 and others).

  * It cannot consume a graph, so e.g. if you try to apply synonyms
    after the Kuromoji tokenizer, I'm not sure what will happen.

I've thought about how to fix these issues but it's really quite
difficult with the current PosInc/PosLen graph representation, so I'd
like to explore an alternative approach.
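
To make the first limitation concrete, here is a minimal sketch in
plain Java (no Lucene dependency). The Tok class and the arcs helper
are hypothetical names made up for illustration, and the "squashed"
stream is a simplified picture of what a filter that cannot create
positions might emit for the input "dns lookup"; it is not verified
SynonymFilter output. Each token is printed as an arc from its start
position to its end position, derived from PosInc and PosLen.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

class PosIncSketch {
    /** Hypothetical stand-in for a token with PosInc/PosLen attributes. */
    static final class Tok {
        final String term;
        final int posInc;  // distance from the previous token's start position
        final int posLen;  // number of positions this token spans
        Tok(String term, int posInc, int posLen) {
            this.term = term;
            this.posInc = posInc;
            this.posLen = posLen;
        }
    }

    /** Renders each token as "term[start->end]", accumulating posInc. */
    static List<String> arcs(List<Tok> tokens) {
        List<String> out = new ArrayList<>();
        int pos = -1;  // so that the first posInc=1 lands on position 0
        for (Tok t : tokens) {
            pos += t.posInc;
            out.add(t.term + "[" + pos + "->" + (pos + t.posLen) + "]");
        }
        return out;
    }

    /** Assumed output when no positions can be created: the extra
     *  synonym words are stacked onto the tokens that follow "dns". */
    static List<Tok> squashed() {
        return Arrays.asList(
            new Tok("dns", 1, 1),
            new Tok("domain", 0, 1),
            new Tok("lookup", 1, 1),
            new Tok("name", 0, 1),      // shares a position with "lookup"
            new Tok("service", 1, 1));  // lands after "lookup"
    }

    /** The graph one would want: "dns" spans all three synonym words
     *  and "lookup" is pushed out to a newly created position. */
    static List<Tok> desired() {
        return Arrays.asList(
            new Tok("dns", 1, 3),
            new Tok("domain", 0, 1),
            new Tok("name", 1, 1),
            new Tok("service", 1, 1),
            new Tok("lookup", 1, 1));
    }

    public static void main(String[] args) {
        System.out.println("squashed: " + arcs(squashed()));
        System.out.println("desired:  " + arcs(desired()));
    }
}
```

In the squashed stream "name" occupies the same position as "lookup",
so the synonym appears to run past the original word. The desired
stream needs "lookup" to move from position 1 to position 3, which is
exactly the kind of position creation described above as missing.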

