Posted to issues@opennlp.apache.org by "Joern Kottmann (JIRA)" <ji...@apache.org> on 2013/11/14 23:03:20 UTC

[jira] [Closed] (OPENNLP-618) Tokenize "can't"

     [ https://issues.apache.org/jira/browse/OPENNLP-618?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Joern Kottmann closed OPENNLP-618.
----------------------------------

    Resolution: Won't Fix

Please post questions on the OpenNLP user mailing list.

> Tokenize "can't"
> ----------------
>
>                 Key: OPENNLP-618
>                 URL: https://issues.apache.org/jira/browse/OPENNLP-618
>             Project: OpenNLP
>          Issue Type: Question
>          Components: OpenNLP ML
>         Environment: Windows 7 Enterprise, Java SE 1.7.0_40-b43
>            Reporter: tonylxc
>
> When I use OpenNLP's tokenizer to tokenize this sentence "I can't do it.", I get tokens like "[I] [ca] [n't] [do] [it] [.]". Isn't it supposed to be something like "[I] [can] ['] [t] [do] [it] [.]" or "[I] [can't] [do] [it] [.]"? I know LanguageTool's tokenizer gives tokens like "[I] [can] ['] [t] [do] [it] [.]".
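
For context, the "[ca] [n't]" split described in the report matches the Penn Treebank tokenization convention, which separates the negation clitic "n't" from its host word. The sketch below is a minimal rule-based illustration of the three output styles mentioned in the question. It does not call OpenNLP or LanguageTool, and the names ContractionDemo, treebankStyle, splitOnApostrophe and keepWhole are hypothetical. OpenNLP's learnable tokenizer is model-based, so its actual output depends on the model it was trained with; this only shows what each convention looks like.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical demo class; not part of OpenNLP or LanguageTool.
final class ContractionDemo {

    // A word that may contain internal apostrophes, or any single non-space character.
    private static final Pattern WORD_WITH_APOSTROPHE =
            Pattern.compile("[A-Za-z]+(?:'[A-Za-z]+)*|\\S");

    // A run of letters, or any single non-space character (so "'" stands alone).
    private static final Pattern LETTERS_OR_SYMBOL =
            Pattern.compile("[A-Za-z]+|\\S");

    private static List<String> findAll(Pattern pattern, String text) {
        List<String> tokens = new ArrayList<>();
        Matcher m = pattern.matcher(text);
        while (m.find()) {
            tokens.add(m.group());
        }
        return tokens;
    }

    /** Keeps contractions whole: "can't" becomes [can't]. */
    public static List<String> keepWhole(String text) {
        return findAll(WORD_WITH_APOSTROPHE, text);
    }

    /** Splits at every apostrophe: "can't" becomes [can, ', t]. */
    public static List<String> splitOnApostrophe(String text) {
        return findAll(LETTERS_OR_SYMBOL, text);
    }

    /** Penn Treebank-like handling of "n't" only: "can't" becomes [ca, n't]. */
    public static List<String> treebankStyle(String text) {
        List<String> tokens = new ArrayList<>();
        for (String tok : keepWhole(text)) {
            if (tok.length() > 3 && tok.toLowerCase().endsWith("n't")) {
                int cut = tok.length() - 3;
                tokens.add(tok.substring(0, cut)); // host word, e.g. "ca"
                tokens.add(tok.substring(cut));    // clitic, e.g. "n't"
            } else {
                tokens.add(tok);
            }
        }
        return tokens;
    }

    public static void main(String[] args) {
        String sentence = "I can't do it.";
        System.out.println(treebankStyle(sentence));     // [I, ca, n't, do, it, .]
        System.out.println(splitOnApostrophe(sentence)); // [I, can, ', t, do, it, .]
        System.out.println(keepWhole(sentence));         // [I, can't, do, it, .]
    }
}
```

None of the three outputs is wrong in itself. Which one is appropriate depends on what the downstream components (for example a part-of-speech tagger) were trained to expect.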


