You are viewing a plain text version of this content. The canonical link for it is here.
Posted to dev@lucene.apache.org by "David Smiley (JIRA)" <ji...@apache.org> on 2014/03/16 06:02:19 UTC

[jira] [Updated] (SOLR-1279) ApostropheTokenizer

     [ https://issues.apache.org/jira/browse/SOLR-1279?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

David Smiley updated SOLR-1279:
-------------------------------

    Fix Version/s:     (was: 4.7)
                   4.8

> ApostropheTokenizer
> -------------------
>
>                 Key: SOLR-1279
>                 URL: https://issues.apache.org/jira/browse/SOLR-1279
>             Project: Solr
>          Issue Type: New Feature
>          Components: Schema and Analysis
>            Reporter: Sergey Borisov
>            Priority: Minor
>             Fix For: 4.8
>
>         Attachments: ApostropheTokenizer.zip
>
>
> ApostropheTokenizer creates extra tokens during the analysis stage for the fields containing apostrophes. The reason for adding this is to ensure that documents that differ only by apostrophe have the same relevancy score. 
> For example, if the document contains string "McDonald's", it will be tokenized as "McDonald's McDonalds". This way when the search is performed against "McDonald's" or "McDonalds" will produce similar score.
> This code handles up to two apostrophes in a token.
> To use this tokenizer add the following line in schema.xml
> <analyzer type="index">
>       <filter class="org.apache.lucene.analysis.ApostropheTokenFactory"/>
> ...
> </analyzer>



--
This message was sent by Atlassian JIRA
(v6.2#6252)

---------------------------------------------------------------------
To unsubscribe, e-mail: dev-unsubscribe@lucene.apache.org
For additional commands, e-mail: dev-help@lucene.apache.org