Posted to issues@opennlp.apache.org by "Suneel Marthi (Jira)" <ji...@apache.org> on 2020/01/25 20:08:00 UTC

[jira] [Commented] (OPENNLP-1270) Add new languages to the language detector

    [ https://issues.apache.org/jira/browse/OPENNLP-1270?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17023639#comment-17023639 ] 

Suneel Marthi commented on OPENNLP-1270:
----------------------------------------

Could we also look at the Europarl corpus?  [https://www.statmt.org/europarl/]

> Add new languages to the language detector
> ------------------------------------------
>
>                 Key: OPENNLP-1270
>                 URL: https://issues.apache.org/jira/browse/OPENNLP-1270
>             Project: OpenNLP
>          Issue Type: Task
>            Reporter: Tim Allison
>            Assignee: Tim Allison
>            Priority: Major
>             Fix For: 1.9.3
>
>         Attachments: report.txt, report.txt
>
>
> Leipzig has several other languages that might be useful to add to the language detector.  I've selected some with more than 10k sentences.  Once I build the model and evaluate performance, I'll share the reports, the model and a tgz of the *-sentences.txt files.


