Posted to issues@opennlp.apache.org by "Bruno P. Kinoshita (Jira)" <ji...@apache.org> on 2023/03/05 10:16:00 UTC

[jira] [Created] (OPENNLP-1479) Write better tests for pattern verification (tokenizers)

Bruno P. Kinoshita created OPENNLP-1479:
-------------------------------------------

             Summary: Write better tests for pattern verification (tokenizers)
                 Key: OPENNLP-1479
                 URL: https://issues.apache.org/jira/browse/OPENNLP-1479
             Project: OpenNLP
          Issue Type: Improvement
          Components: Tokenizer
    Affects Versions: 2.1.1
            Reporter: Bruno P. Kinoshita
             Fix For: 2.1.2


From [https://github.com/apache/opennlp/pull/516#issuecomment-1455015772]

At the moment, our tests only verify that the tokenizer objects are created correctly (i.e. they test getters, setters, constructors, etc.), without verifying the actual behavior when those objects are used in conjunction with other classes (factories, tokenizers, trainers, etc.).

It would be best to test the patterns used in the factories for different languages against some interesting sample data (maybe something from Project Gutenberg, open source news sites, etc.).
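
As a rough illustration of the kind of check meant here, the sketch below runs a pattern over whitespace-separated sample text and reports the tokens it rejects. It uses only java.util.regex and does not touch the OpenNLP API. The class name PatternSampleCheck, the helper rejected(), both patterns, and the sample sentences are assumptions made up for this example; they are not copied from the factories or from any corpus.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Pattern;

public class PatternSampleCheck {

    // Hypothetical patterns for illustration only; NOT taken from the OpenNLP factories.
    static final Pattern ASCII_ALNUM = Pattern.compile("^[A-Za-z0-9]+$");
    static final Pattern UNICODE_ALNUM = Pattern.compile("^[\\p{L}\\p{N}]+$");

    /** Splits the sample on whitespace and returns the tokens the pattern does not fully match. */
    static List<String> rejected(Pattern pattern, String sample) {
        List<String> out = new ArrayList<>();
        for (String token : sample.trim().split("\\s+")) {
            if (!pattern.matcher(token).matches()) {
                out.add(token);
            }
        }
        return out;
    }

    public static void main(String[] args) {
        // Placeholder sentences; real tests would load text from a curated sample file.
        String english = "Call me Ishmael";
        // Unicode escapes keep the source file ASCII-only: "Nao ha tempo" with a-tilde and a-acute.
        String portuguese = "N\u00e3o h\u00e1 tempo";

        System.out.println("ASCII pattern, English:      " + rejected(ASCII_ALNUM, english));
        System.out.println("ASCII pattern, Portuguese:   " + rejected(ASCII_ALNUM, portuguese));
        System.out.println("Unicode pattern, Portuguese: " + rejected(UNICODE_ALNUM, portuguese));
    }
}
```

A real test would presumably obtain the pattern from the language-specific factory and feed it larger samples; the point of the sketch is only that asserting on rejected tokens exercises the pattern's behavior rather than its getters.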



--
This message was sent by Atlassian Jira
(v8.20.10#820010)