Posted to solr-dev@lucene.apache.org by "Yonik Seeley (JIRA)" <ji...@apache.org> on 2008/07/02 20:33:45 UTC

[jira] Updated: (SOLR-14) Add the ability to preserve the original term when using WordDelimiterFilter

     [ https://issues.apache.org/jira/browse/SOLR-14?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Yonik Seeley updated SOLR-14:
-----------------------------

    Attachment: SOLR-14.patch

Attaching new version of patch.

The previous version had position issues and also misbehaved with certain flag combinations.
I changed the strategy: "preserveOriginal" is now handled outside of the main loop (anywhere there is a "break" that falls through), by returning the original token first and then adjusting the offset of the next token so that it overlaps.

This should also be faster as it avoids token copying in the common case.
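
For readers without the patch at hand, the strategy above can be sketched roughly as follows. This is a hypothetical, simplified model and not the code from SOLR-14.patch: the class PreserveOriginalSketch, the Tok type, and the split rule (break on any non-alphanumeric character) are assumptions made for illustration, "overlap" is modelled here as a position increment of 0, and catenated forms such as ABC12345 are left out.

```java
import java.util.ArrayList;
import java.util.List;

/** Hypothetical sketch; not the code from SOLR-14.patch. */
final class PreserveOriginalSketch {

    /** Minimal stand-in for a token: its text plus a position increment. */
    static final class Tok {
        final String text;
        final int posInc;
        Tok(String text, int posInc) { this.text = text; this.posInc = posInc; }
        @Override public String toString() { return text + "(+" + posInc + ")"; }
    }

    /** Assumed split rule: break on runs of non-alphanumeric characters. */
    static List<String> splitParts(String term) {
        List<String> parts = new ArrayList<>();
        for (String p : term.split("[^A-Za-z0-9]+")) {
            if (!p.isEmpty()) parts.add(p);
        }
        return parts;
    }

    static List<Tok> analyze(String term, boolean preserveOriginal) {
        List<String> parts = splitParts(term);
        List<Tok> out = new ArrayList<>();
        // Common case: nothing to split, so the token passes through as-is.
        if (parts.size() == 1 && parts.get(0).equals(term)) {
            out.add(new Tok(term, 1));
            return out;
        }
        // The original is handled once, outside the loop over parts.
        boolean overlapNext = false;
        if (preserveOriginal) {
            out.add(new Tok(term, 1));
            overlapNext = true;
        }
        for (String p : parts) {
            // The first part after a preserved original shares its position.
            out.add(new Tok(p, overlapNext ? 0 : 1));
            overlapNext = false;
        }
        return out;
    }

    public static void main(String[] args) {
        System.out.println(analyze("ABC-12345", true));
        System.out.println(analyze("ABC-12345", false));
    }
}
```

In this model the preserved original is emitted first and the first sub-token is stacked on the same position, so later positions are the same whether or not the option is enabled.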


> Add the ability to preserve the original term when using WordDelimiterFilter
> ----------------------------------------------------------------------------
>
>                 Key: SOLR-14
>                 URL: https://issues.apache.org/jira/browse/SOLR-14
>             Project: Solr
>          Issue Type: Improvement
>          Components: search
>            Reporter: Richard "Trey" Hyde
>            Assignee: Yonik Seeley
>         Attachments: SOLR-14.patch, SOLR-14.patch, SOLR-14.patch, SOLR-14.patch, TokenizerFactory.java, WordDelimiterFilter.patch, WordDelimiterFilter.patch
>
>
> When doing prefix searching, you need to hang on to the original term; otherwise you'll miss many matches you should be making.
> Data: ABC-12345
> WordDelimiterFilter may change this into
> ABC 12345 ABC12345
> A user may enter a search such as 
>  ABC\-123*
> Which will fail to find a match given the above scenario.
> The attached patch will allow the use of the "preserveOriginal" option to WordDelimiterFilter and will analyse as
> ABC 12345 ABC12345  ABC-12345 
> in which case we will get a positive match.
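
The scenario in the issue description can be illustrated with a small sketch. It assumes that a prefix query is compared against the indexed terms as typed, without going through WordDelimiterFilter, so ABC\-123* becomes the literal prefix "ABC-123". The class and method names (PrefixMatchSketch, anyTermHasPrefix) are hypothetical and not part of Solr.

```java
import java.util.Arrays;
import java.util.List;

/** Hypothetical illustration of the prefix-search scenario; not Solr code. */
final class PrefixMatchSketch {

    /** Returns true if any indexed term starts with the given literal prefix. */
    static boolean anyTermHasPrefix(List<String> indexedTerms, String prefix) {
        for (String term : indexedTerms) {
            if (term.startsWith(prefix)) return true;
        }
        return false;
    }

    public static void main(String[] args) {
        // Terms indexed for "ABC-12345" without and with preserveOriginal,
        // as listed in the issue description.
        List<String> withoutOriginal = Arrays.asList("ABC", "12345", "ABC12345");
        List<String> withOriginal = Arrays.asList("ABC", "12345", "ABC12345", "ABC-12345");

        System.out.println(anyTermHasPrefix(withoutOriginal, "ABC-123")); // false
        System.out.println(anyTermHasPrefix(withOriginal, "ABC-123"));    // true
    }
}
```

Without the original term in the index, no term starts with "ABC-123", because the hyphen never survives analysis; with it, the prefix matches ABC-12345.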
