Posted to issues@opennlp.apache.org by "Tommaso Teofili (JIRA)" <ji...@apache.org> on 2015/06/04 14:55:38 UTC

[jira] [Commented] (OPENNLP-788) Add a language detection component

    [ https://issues.apache.org/jira/browse/OPENNLP-788?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14572709#comment-14572709 ] 

Tommaso Teofili commented on OPENNLP-788:
-----------------------------------------

Good point, that would be a very valuable contribution.
I was just looking at [how Apache Solr|https://cwiki.apache.org/confluence/display/solr/Detecting+Languages+During+Indexing] handles that, and it turns out it uses exactly those two solutions.

> Add a language detection component
> ----------------------------------
>
>                 Key: OPENNLP-788
>                 URL: https://issues.apache.org/jira/browse/OPENNLP-788
>             Project: OpenNLP
>          Issue Type: Improvement
>            Reporter: Joern Kottmann
>
> Many of the components in OpenNLP are sensitive to the input language. It would be nice if OpenNLP had a component to detect the language of an input text.
> Two commonly used solutions today are:
> Apache Tika's Language Identifier
> Language Detection by Shuyo Nakatani



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)