Posted to issues@opennlp.apache.org by "Jörn Kottmann (JIRA)" <ji...@apache.org> on 2011/03/03 14:07:36 UTC

[jira] Created: (OPENNLP-141) Tokenizers alpha numeric optimization only recognizes a-z as alpha chars

Tokenizers alpha numeric optimization only recognizes a-z as alpha chars
------------------------------------------------------------------------

                 Key: OPENNLP-141
                 URL: https://issues.apache.org/jira/browse/OPENNLP-141
             Project: OpenNLP
          Issue Type: Bug
          Components: Tokenizer
    Affects Versions: tools-1.5.0-sourceforge
            Reporter: Jörn Kottmann
            Priority: Minor


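The Tokenizer has an optimization that skips further splitting of tokens made up only of numeric or alphabetic characters. However, it only treats the range a-z as alphabetic. In many other languages, alphabetic characters also include umlauts and other letters outside the a-z range, so tokens containing them are not covered by the optimization.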
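The following Java sketch illustrates the problem and one possible fix. It assumes the optimization is implemented as an ASCII character-class check such as [A-Za-z0-9]. The class and method names are hypothetical and are not OpenNLP's actual API. The Unicode-aware variant uses the standard java.util.regex categories \p{L} (any letter) and \p{Nd} (any decimal digit).

```java
import java.util.regex.Pattern;

public class AlphaNumericCheck {
    // Hypothetical ASCII-only check: the behavior this issue describes,
    // where only a-z (and A-Z, 0-9) count as alphanumeric.
    private static final Pattern ASCII_ALNUM = Pattern.compile("^[A-Za-z0-9]+$");

    // Unicode-aware alternative: \p{L} matches letters from any script
    // (including umlauts and ß), \p{Nd} matches any decimal digit.
    private static final Pattern UNICODE_ALNUM = Pattern.compile("^[\\p{L}\\p{Nd}]+$");

    static boolean isAsciiAlphaNumeric(String token) {
        return ASCII_ALNUM.matcher(token).matches();
    }

    static boolean isUnicodeAlphaNumeric(String token) {
        return UNICODE_ALNUM.matcher(token).matches();
    }

    public static void main(String[] args) {
        // Unicode escapes keep the source file encoding-independent:
        // \u00e4 is 'ä', \u00df is 'ß'.
        String[] tokens = {"Haus", "H\u00e4user", "2011", "stra\u00dfe", "don't"};
        for (String t : tokens) {
            System.out.println(t
                    + " ascii=" + isAsciiAlphaNumeric(t)
                    + " unicode=" + isUnicodeAlphaNumeric(t));
        }
    }
}
```

With this sketch, "H\u00e4user" fails the ASCII check but passes the Unicode one, while "don't" fails both because of the apostrophe. Text in decomposed form (a base letter followed by a combining mark) would still fail \p{L}; normalizing to NFC first, or also allowing \p{M}, would cover that case.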
