Posted to issues@lucy.apache.org by "Nick Wellnhofer (Issue Comment Edited) (JIRA)" <ji...@apache.org> on 2011/11/19 18:10:51 UTC

[lucy-issues] [jira] [Issue Comment Edited] (LUCY-191) Unicode normalization

    [ https://issues.apache.org/jira/browse/LUCY-191?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13153529#comment-13153529 ] 

Nick Wellnhofer edited comment on LUCY-191 at 11/19/11 5:09 PM:
----------------------------------------------------------------

Initial implementation of Lucy::Analysis::Normalizer, mostly cargo-culted from the Snowball stemmer.
                
      was (Author: nwellnhof):
    Initial implementation of Lucy::Analysis::Normalizer
                  
> Unicode normalization
> ---------------------
>
>                 Key: LUCY-191
>                 URL: https://issues.apache.org/jira/browse/LUCY-191
>             Project: Lucy
>          Issue Type: New Feature
>          Components: Analysis
>            Reporter: Nick Wellnhofer
>            Priority: Minor
>              Labels: patch
>         Attachments: LUCY-191-normalizer.patch
>
>
> As discussed on the mailing list, it would be nice to have Unicode normalization, Unicode case folding, and accent stripping as part of the analyzer chain. With the help of utf8proc, this can be done in one pass. I proposed a new analyzer, Lucy::Analysis::Normalizer, with the interface described here:
> http://mail-archives.apache.org/mod_mbox/incubator-lucy-dev/201111.mbox/%3C4EC43816.1070107%40aevum.de%3E
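
For readers unfamiliar with the three operations named in the description, here is a minimal sketch in Python using only the standard unicodedata module. It is not the attached patch, it does not use utf8proc, and it takes several passes over the string where utf8proc can do the work in one. The function name normalize_token and its parameters (form, case_fold, strip_accents) are hypothetical and are not taken from the interface proposed in the linked mail.

```python
import unicodedata


def normalize_token(text, form="NFKC", case_fold=True, strip_accents=False):
    """Illustrative only: normalize, optionally strip accents and case fold.

    All names here are hypothetical; this is not Lucy's API.
    """
    if strip_accents:
        # Decompose first so that accents become separate combining marks,
        # then drop every character with a non-zero combining class.
        decomposed = unicodedata.normalize("NFKD", text)
        text = "".join(ch for ch in decomposed if not unicodedata.combining(ch))

    # Bring the text into the requested normalization form.
    text = unicodedata.normalize(form, text)

    if case_fold:
        # Unicode case folding (not plain lowercasing): "ß" becomes "ss".
        # Folding can leave the string denormalized, so normalize again.
        text = unicodedata.normalize(form, text.casefold())

    return text


if __name__ == "__main__":
    for token in ("Café", "Straße", "\ufb01"):
        print(repr(token), "->", repr(normalize_token(token, strip_accents=True)))
```

Doing all three steps in a single analyzer means each token is transformed once, before stemming or indexing, so query-time and index-time text are compared in the same canonical form.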
