Posted to issues@opennlp.apache.org by "Suneel Marthi (Jira)" <ji...@apache.org> on 2020/01/25 20:08:00 UTC
[jira] [Commented] (OPENNLP-1270) Add new languages to the language detector
[ https://issues.apache.org/jira/browse/OPENNLP-1270?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17023639#comment-17023639 ]
Suneel Marthi commented on OPENNLP-1270:
----------------------------------------
Could we also look at the Europarl corpus? [https://www.statmt.org/europarl/]
> Add new languages to the language detector
> ------------------------------------------
>
> Key: OPENNLP-1270
> URL: https://issues.apache.org/jira/browse/OPENNLP-1270
> Project: OpenNLP
> Issue Type: Task
> Reporter: Tim Allison
> Assignee: Tim Allison
> Priority: Major
> Fix For: 1.9.3
>
> Attachments: report.txt, report.txt
>
>
> Leipzig has several other languages that might be useful to add to the language detector. I've selected some with > 10k sentences. Once I build the model and evaluate performance, I'll share the reports, the model and a tgz of the *-sentences.txt files.