Posted to issues@opennlp.apache.org by "Jörn Kottmann (JIRA)" <ji...@apache.org> on 2011/03/03 14:07:36 UTC
[jira] Created: (OPENNLP-141) Tokenizers alpha numeric optimization only recognizes a-z as alpha chars
Tokenizers alpha numeric optimization only recognizes a-z as alpha chars
------------------------------------------------------------------------
Key: OPENNLP-141
URL: https://issues.apache.org/jira/browse/OPENNLP-141
Project: OpenNLP
Issue Type: Bug
Components: Tokenizer
Affects Versions: tools-1.5.0-sourceforge
Reporter: Jörn Kottmann
Priority: Minor
The tokenizer has an optimization that skips tokens consisting only of numeric or alphabetic characters. The check recognizes only a-z as alphabetic, but in languages other than English the alphabetic characters include umlauts and other letters outside that range, so such tokens are not treated as alphabetic.
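To illustrate the difference, here is a minimal sketch in Java. It is not the OpenNLP source: the class name, method names, and both patterns are assumptions made up for this example. It contrasts an ASCII-only check with one based on the Unicode letter and decimal-digit categories, which is one possible way to cover umlauts.

```java
import java.util.regex.Pattern;

// Hypothetical illustration only; none of these names come from OpenNLP.
public class AlphaNumericCheck {

    // Assumed shape of an ASCII-only check: lowercase a-z and digits 0-9.
    static final Pattern ASCII_ALNUM = Pattern.compile("[a-z0-9]+");

    // Possible alternative: any Unicode letter (\p{L}) or decimal digit (\p{Nd}).
    static final Pattern UNICODE_ALNUM = Pattern.compile("[\\p{L}\\p{Nd}]+");

    // True if the whole token consists of a-z and 0-9 only.
    static boolean isAsciiAlphaNumeric(String token) {
        return ASCII_ALNUM.matcher(token).matches();
    }

    // True if the whole token consists of Unicode letters and decimal digits only.
    static boolean isUnicodeAlphaNumeric(String token) {
        return UNICODE_ALNUM.matcher(token).matches();
    }

    public static void main(String[] args) {
        // "h\u00e4user" is "haeuser" written with an a-umlaut; the escape avoids
        // depending on the source file encoding.
        String[] tokens = {"haus", "h\u00e4user", "2011", "don't"};
        for (String token : tokens) {
            System.out.println(token
                    + " ascii=" + isAsciiAlphaNumeric(token)
                    + " unicode=" + isUnicodeAlphaNumeric(token));
        }
    }
}
```

With the ASCII-only pattern the token containing the umlaut is rejected, while the Unicode-aware pattern accepts it; a token with an apostrophe is rejected by both.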