Posted to issues@opennlp.apache.org by "Jeff Zemerick (Jira)" <ji...@apache.org> on 2022/09/17 18:38:00 UTC
[jira] [Updated] (OPENNLP-1385) Fix discrepancy in tokenizer documentation
[ https://issues.apache.org/jira/browse/OPENNLP-1385?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]
Jeff Zemerick updated OPENNLP-1385:
-----------------------------------
Affects Version/s: 1.9.4
> Fix discrepancy in tokenizer documentation
> ------------------------------------------
>
> Key: OPENNLP-1385
> URL: https://issues.apache.org/jira/browse/OPENNLP-1385
> Project: OpenNLP
> Issue Type: Task
> Components: Documentation, Tokenizer
> Affects Versions: 1.9.4, 2.0.0
> Reporter: Jeff Zemerick
> Priority: Major
>
> The tokenizer section of the user guide shows a -cutoff option in the tool's usage:
> -cutoff num
>     minimal number of times a feature must be seen, ignored if -params is used.
> However, this option is not present in the usage when running the CLI:
> {quote}Arguments description:
> -factory factoryName
>     A sub-class of TokenizerFactory where to get implementation and resources.
> -abbDict path
>     abbreviation dictionary in XML format.
> -alphaNumOpt isAlphaNumOpt
>     Optimization flag to skip alpha numeric tokens for further tokenization
> -params paramsFile
>     training parameters file.
> -lang language
>     language which is being processed.
> -model modelFile
>     output model file.
> -data sampleData
>     data to be used, usually a file name.
> -encoding charsetName
>     encoding for reading and writing text, if absent the system default is used.
> {quote}
> The CLI does not recognize -cutoff as an option, so the documentation is likely incorrect, but the code should be reviewed first to be sure.
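
The comparison described above could be checked mechanically. The following is only a minimal sketch in Python, not part of OpenNLP: the usage text is copied from the CLI output quoted in this issue, the helper name option_names is illustrative, and the documented set contains just -cutoff because the rest of the user guide's option list is not reproduced here.

```python
import re

# Usage text as quoted in the issue (CLI output).
CLI_USAGE = """\
-factory factoryName
    A sub-class of TokenizerFactory where to get implementation and resources.
-abbDict path
    abbreviation dictionary in XML format.
-alphaNumOpt isAlphaNumOpt
    Optimization flag to skip alpha numeric tokens for further tokenization
-params paramsFile
    training parameters file.
-lang language
    language which is being processed.
-model modelFile
    output model file.
-data sampleData
    data to be used, usually a file name.
-encoding charsetName
    encoding for reading and writing text, if absent the system default is used.
"""


def option_names(usage_text):
    """Return the set of flags (e.g. '-lang') that begin a line of usage text."""
    return set(re.findall(r"^\s*(-[A-Za-z]\w*)", usage_text, flags=re.MULTILINE))


# Assumption: only the option quoted from the user guide in this issue.
documented = {"-cutoff"}
supported = option_names(CLI_USAGE)

# Options that the documentation mentions but the CLI usage does not list.
missing = sorted(documented - supported)
print(missing)
```

A non-empty result only shows that the two texts disagree; it does not say which one is right, so the code review suggested in the description is still needed.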
--
This message was sent by Atlassian Jira
(v8.20.10#820010)