Posted to issues@opennlp.apache.org by "Suneel Marthi (JIRA)" <ji...@apache.org> on 2017/01/20 14:25:26 UTC

[jira] [Resolved] (OPENNLP-697) Tokenizer class is hardcoded in the DocumentSampleStream class.

     [ https://issues.apache.org/jira/browse/OPENNLP-697?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Suneel Marthi resolved OPENNLP-697.
-----------------------------------
       Resolution: Won't Fix
    Fix Version/s: 1.7.1

> Tokenizer class is hardcoded in the DocumentSampleStream class. 
> ----------------------------------------------------------------
>
>                 Key: OPENNLP-697
>                 URL: https://issues.apache.org/jira/browse/OPENNLP-697
>             Project: OpenNLP
>          Issue Type: Bug
>          Components: Doccat, Tokenizer
>    Affects Versions: 1.6.0
>            Reporter: Praveena B
>             Fix For: 1.7.1
>
>
> When training the DocumentCategorizerME, it is possible to set the Tokenizer that the categorizer should use,
> e.g. doccatFactory.setTokenizer(SemicolonTokenizer.INSTANCE);
> However, the Tokenizer is hardcoded to WhitespaceTokenizer in the DocumentSampleStream class,
> so it is not possible to change the default tokenizing behaviour even after setting a different Tokenizer in the doccatFactory.
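
The behaviour described above can be illustrated with a minimal, self-contained sketch. It does not use OpenNLP itself: TokenizerSketch, Tokenizer, SampleParser and Sample are hypothetical stand-ins, and the assumption that the first token of a training line is the category is made only for this illustration. The sketch shows how a parser with a hardcoded whitespace default mis-splits semicolon-delimited training data, and how accepting the tokenizer as a constructor argument would avoid that.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

// Hypothetical, simplified stand-ins; these are NOT the OpenNLP classes.
class TokenizerSketch {

    /** Minimal tokenizer contract (hypothetical). */
    interface Tokenizer {
        String[] tokenize(String text);
    }

    /** Splits on runs of whitespace. */
    static final Tokenizer WHITESPACE = text -> text.trim().split("\\s+");

    /** Splits on semicolons and drops empty pieces. */
    static final Tokenizer SEMICOLON = text -> {
        List<String> out = new ArrayList<>();
        for (String part : text.split(";")) {
            String trimmed = part.trim();
            if (!trimmed.isEmpty()) {
                out.add(trimmed);
            }
        }
        return out.toArray(new String[0]);
    };

    /** One training sample: a category plus the document tokens. */
    static final class Sample {
        final String category;
        final String[] tokens;

        Sample(String category, String[] tokens) {
            this.category = category;
            this.tokens = tokens;
        }
    }

    /** Parses one training line; the first token is assumed to be the category. */
    static final class SampleParser {
        private final Tokenizer tokenizer;

        // Mirrors the reported problem: whitespace tokenization unless told otherwise.
        SampleParser() {
            this(WHITESPACE);
        }

        // Possible remedy: let the caller supply the tokenizer.
        SampleParser(Tokenizer tokenizer) {
            this.tokenizer = tokenizer;
        }

        Sample parse(String line) {
            String[] all = tokenizer.tokenize(line);
            if (all.length < 2) {
                throw new IllegalArgumentException("Expected a category and at least one token: " + line);
            }
            return new Sample(all[0], Arrays.copyOfRange(all, 1, all.length));
        }
    }

    public static void main(String[] args) {
        String line = "sports;Real Madrid;won the match";

        Sample byWhitespace = new SampleParser().parse(line);
        Sample bySemicolon = new SampleParser(SEMICOLON).parse(line);

        System.out.println(byWhitespace.category + " " + Arrays.toString(byWhitespace.tokens));
        System.out.println(bySemicolon.category + " " + Arrays.toString(bySemicolon.tokens));
    }
}
```

With the whitespace default, the category of the sample line comes out as "sports;Real", which is the kind of mismatch the report describes; with the semicolon tokenizer passed in, the category is "sports" and the remaining tokens are "Real Madrid" and "won the match".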



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)