Posted to issues@opennlp.apache.org by "Joern Kottmann (Created) (JIRA)" <ji...@apache.org> on 2011/10/17 10:50:11 UTC

[jira] [Created] (OPENNLP-327) Doccats bag of word feature generator should not use numbers as features

Doccats bag of word feature generator should not use numbers as features
------------------------------------------------------------------------

                 Key: OPENNLP-327
                 URL: https://issues.apache.org/jira/browse/OPENNLP-327
             Project: OpenNLP
          Issue Type: Improvement
          Components: Doccat
            Reporter: Joern Kottmann
            Assignee: Joern Kottmann
            Priority: Minor


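It turned out that Doccat's bag-of-words feature generator can be very sensitive to numbers when it is used for language identification. Numeric tokens such as "2011" are written the same way in many languages, so they say little about which language a document is in, yet they can still sway the classifier. Therefore, numbers should not be included in the bag of words.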
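A minimal sketch of the proposed behaviour follows. It is not the actual OpenNLP change. The class name `NumberFreeBagOfWords`, the `bow=` feature prefix, and the regular expression that decides what counts as "a number" are all assumptions made for illustration. The class is written stand-alone rather than against OpenNLP's `FeatureGenerator` interface, because that interface's method signature differs between OpenNLP versions.

```java
import java.util.ArrayList;
import java.util.Collection;
import java.util.List;
import java.util.regex.Pattern;

/**
 * Hypothetical stand-alone sketch of a bag-of-words feature generator
 * that skips numeric tokens. This is NOT the real OpenNLP class.
 */
public class NumberFreeBagOfWords {

    // Assumption: a token is "a number" if it is made of digits, with an
    // optional leading sign and optional '.' or ',' group/decimal separators.
    private static final Pattern NUMBER =
            Pattern.compile("[+-]?\\d+([.,]\\d+)*");

    /** Returns one "bow=" feature per token, leaving out numeric tokens. */
    public static Collection<String> extractFeatures(String[] tokens) {
        List<String> features = new ArrayList<>(tokens.length);
        for (String token : tokens) {
            if (NUMBER.matcher(token).matches()) {
                continue; // assumed to carry little language signal, so skip it
            }
            features.add("bow=" + token);
        }
        return features;
    }

    public static void main(String[] args) {
        String[] tokens = {"The", "price", "is", "1,299.00", "dollars", "2011"};
        // prints [bow=The, bow=price, bow=is, bow=dollars]
        System.out.println(extractFeatures(tokens));
    }
}
```

Filtering at feature-generation time means the same rule applies during both training and classification, so the model never sees numeric features at all. A stricter or looser notion of "number" (for example, also dropping tokens that merely contain digits) would only require changing the pattern.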
