Posted to issues@opennlp.apache.org by "Jeff Zemerick (Jira)" <ji...@apache.org> on 2022/09/17 18:38:00 UTC

[jira] [Updated] (OPENNLP-1385) Fix discrepancy in tokenizer documentation

     [ https://issues.apache.org/jira/browse/OPENNLP-1385?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Jeff Zemerick updated OPENNLP-1385:
-----------------------------------
    Affects Version/s: 1.9.4

> Fix discrepancy in tokenizer documentation
> ------------------------------------------
>
>                 Key: OPENNLP-1385
>                 URL: https://issues.apache.org/jira/browse/OPENNLP-1385
>             Project: OpenNLP
>          Issue Type: Task
>          Components: Documentation, Tokenizer
>    Affects Versions: 1.9.4, 2.0.0
>            Reporter: Jeff Zemerick
>            Priority: Major
>
> The tokenizer section of the user guide shows a -cutoff option in the tool's usage:
>         -cutoff num
>                 minimal number of times a feature must be seen, ignored if -params is used.
> However, this option is not present in the usage when running the CLI:
> {quote}Arguments description:
>         -factory factoryName
>                 A sub-class of TokenizerFactory where to get implementation and resources.
>         -abbDict path
>                 abbreviation dictionary in XML format.
>         -alphaNumOpt isAlphaNumOpt
>                 Optimization flag to skip alpha numeric tokens for further tokenization
>         -params paramsFile
>                 training parameters file.
>         -lang language
>                 language which is being processed.
>         -model modelFile
>                 output model file.
>         -data sampleData
>                 data to be used, usually a file name.
>         -encoding charsetName
>                 encoding for reading and writing text, if absent the system default is used.
> {quote}
> The CLI does not recognize -cutoff as an option, so the documentation is likely incorrect, but the code should be reviewed first to confirm.



--
This message was sent by Atlassian Jira
(v8.20.10#820010)