Posted to issues@opennlp.apache.org by "Joern Kottmann (Created) (JIRA)" <ji...@apache.org> on 2011/10/17 10:50:11 UTC
[jira] [Created] (OPENNLP-327) Doccats bag of word feature generator should not use numbers as features
Doccats bag of word feature generator should not use numbers as features
------------------------------------------------------------------------
Key: OPENNLP-327
URL: https://issues.apache.org/jira/browse/OPENNLP-327
Project: OpenNLP
Issue Type: Improvement
Components: Doccat
Reporter: Joern Kottmann
Assignee: Joern Kottmann
Priority: Minor
It turned out that Doccat's bag-of-words feature generator can be very sensitive to numbers when it is used for language identification. Numbers carry no language-specific signal, so they should not be included in the bag of words.
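The proposed behaviour could be sketched roughly as follows. This is only an illustration in Python, not the OpenNLP implementation: the function name bag_of_words_features, the "bow=" feature prefix, and the rule of dropping any token that contains a digit are all assumptions made for the example.

```python
def bag_of_words_features(tokens):
    """Build bag-of-words features, leaving out number-like tokens.

    Hypothetical helper for illustration only; it is not part of OpenNLP.
    """
    features = []
    for token in tokens:
        # Assumption: a token counts as a number if it contains any digit,
        # so "2011", "3.5" and "10:50" are all skipped.
        if any(ch.isdigit() for ch in token):
            continue
        # Assumption: features are the token with a "bow=" prefix.
        features.append("bow=" + token)
    return features


if __name__ == "__main__":
    print(bag_of_words_features(["Posted", "on", "2011", "at", "10:50"]))
```

A stricter variant would drop only tokens made up entirely of digits and keep mixed tokens such as "B2B". Which rule works better for language identification is not stated in the issue.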