Posted to issues@opennlp.apache.org by "Suneel Marthi (JIRA)" <ji...@apache.org> on 2017/01/20 14:25:26 UTC
[jira] [Resolved] (OPENNLP-697) Tokenizer class is hardcoded in the DocumentSampleStream class.
[ https://issues.apache.org/jira/browse/OPENNLP-697?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]
Suneel Marthi resolved OPENNLP-697.
-----------------------------------
Resolution: Won't Fix
Fix Version/s: 1.7.1
> Tokenizer class is hardcoded in the DocumentSampleStream class.
> ----------------------------------------------------------------
>
> Key: OPENNLP-697
> URL: https://issues.apache.org/jira/browse/OPENNLP-697
> Project: OpenNLP
> Issue Type: Bug
> Components: Doccat, Tokenizer
> Affects Versions: 1.6.0
> Reporter: Praveena B
> Fix For: 1.7.1
>
>
> While training the DocumentCategorizerME, it is possible to set the Tokenizer that the categorizer should use,
> i.e. doccatFactory.setTokenizer(SemicolonTokenizer.INSTANCE);
> However, the tokenizer is hardcoded to WhitespaceTokenizer in the DocumentSampleStream class,
> so it is not possible to change the default tokenizing behaviour even after setting a tokenizer on the doccatFactory.
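
The behaviour described in the report can be illustrated with a minimal, self-contained sketch. It deliberately does not use the OpenNLP classes: SimpleTokenizer, parseHardcoded, and parseWithTokenizer are hypothetical stand-ins, and the training-line layout (a category followed by the document text) is an assumption made for illustration only.

```java
import java.util.Arrays;

class TokenizerInjectionSketch {

    /** Hypothetical stand-in for a tokenizer abstraction; not the OpenNLP interface. */
    interface SimpleTokenizer {
        String[] tokenize(String text);
    }

    static final SimpleTokenizer WHITESPACE = text -> text.trim().split("\\s+");
    static final SimpleTokenizer SEMICOLON = text -> text.split(";");

    /**
     * Mimics the reported problem: the whole line, including the document
     * text, is always split on whitespace, whatever tokenizer the caller wants.
     */
    static String[] parseHardcoded(String line) {
        String[] parts = WHITESPACE.tokenize(line);
        // The first token is treated as the category; the rest are document tokens.
        return Arrays.copyOfRange(parts, 1, parts.length);
    }

    /**
     * One possible alternative: split off the category only, then hand the
     * remaining text to a tokenizer supplied by the caller.
     */
    static String[] parseWithTokenizer(String line, SimpleTokenizer tokenizer) {
        String[] split = line.trim().split("\\s+", 2);
        String body = split.length > 1 ? split[1] : "";
        return tokenizer.tokenize(body);
    }

    public static void main(String[] args) {
        String line = "sports football;world cup;final score";
        System.out.println(Arrays.toString(parseHardcoded(line)));
        System.out.println(Arrays.toString(parseWithTokenizer(line, SEMICOLON)));
    }
}
```

With the hardcoded variant, a semicolon-delimited phrase such as "world cup" is broken apart at the space, whereas the variant that accepts a tokenizer keeps it as one token.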
--
This message was sent by Atlassian JIRA
(v6.3.4#6332)