Posted to dev@lucene.apache.org by "Tom Burton-West (JIRA)" <ji...@apache.org> on 2010/11/08 23:22:25 UTC

[jira] Updated: (SOLR-2211) Create Solr FilterFactory for Lucene StandardTokenizer with UAX#29 support

     [ https://issues.apache.org/jira/browse/SOLR-2211?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Tom Burton-West updated SOLR-2211:
----------------------------------

    Attachment: SOLR-2211.patch

The patch implements a Solr UAX29TokenizerFactory and its test, TestUAX29TokenizerFactory.

Tom

> Create Solr FilterFactory for Lucene StandardTokenizer with UAX#29 support
> ---------------------------------------------------------------------------
>
>                 Key: SOLR-2211
>                 URL: https://issues.apache.org/jira/browse/SOLR-2211
>             Project: Solr
>          Issue Type: New Feature
>    Affects Versions: 3.1
>            Reporter: Tom Burton-West
>            Priority: Minor
>         Attachments: SOLR-2211.patch
>
>
> The Lucene 3.x StandardTokenizer with UAX#29 support improves tokenization of non-English text. At present it can be invoked by using the StandardTokenizerFactory and setting the Version to 3.1. However, it would be useful to be able to use the improved Unicode processing without necessarily including the IP address and email address processing of StandardAnalyzer. A FilterFactory that allowed the StandardTokenizer with UAX#29 support to be used on its own would meet that need.
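
For illustration only, the two routes described above might look roughly like this in a Solr schema.xml. This is a sketch under stated assumptions, not the contents of the attached patch: the field type names are hypothetical, the per-tokenizer luceneMatchVersion attribute is assumed to override the global setting in solrconfig.xml, and the "solr." shorthand for UAX29TokenizerFactory assumes the patched class ends up in a package that Solr resolves that way.

```xml
<!-- Existing route: StandardTokenizerFactory with the match version set to 3.1.
     "text_std31" is a hypothetical field type name. -->
<fieldType name="text_std31" class="solr.TextField">
  <analyzer>
    <!-- Assumption: luceneMatchVersion given here overrides the value
         configured globally in solrconfig.xml. -->
    <tokenizer class="solr.StandardTokenizerFactory" luceneMatchVersion="LUCENE_31"/>
  </analyzer>
</fieldType>

<!-- Proposed route: the factory named in the attached patch, used on its own.
     "text_uax29" is a hypothetical field type name. -->
<fieldType name="text_uax29" class="solr.TextField">
  <analyzer>
    <!-- Assumption: the class from the patch is reachable via the "solr." prefix. -->
    <tokenizer class="solr.UAX29TokenizerFactory"/>
    <filter class="solr.LowerCaseFilterFactory"/>
  </analyzer>
</fieldType>
```

The second field type is the case the issue asks for: UAX#29 word segmentation chosen explicitly by factory name rather than indirectly through a version setting.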

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.


---------------------------------------------------------------------
To unsubscribe, e-mail: dev-unsubscribe@lucene.apache.org
For additional commands, e-mail: dev-help@lucene.apache.org