Posted to issues@opennlp.apache.org by "Chris Krol / IBM (JIRA)" <ji...@apache.org> on 2014/06/12 13:39:02 UTC

[jira] [Comment Edited] (OPENNLP-701) Polish language support - Maxent binaries

    [ https://issues.apache.org/jira/browse/OPENNLP-701?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14029059#comment-14029059 ] 

Chris Krol / IBM edited comment on OPENNLP-701 at 6/12/14 11:37 AM:
--------------------------------------------------------------------

Thanks for your response. 

Could you point me to the important packages or interfaces that would have to be implemented in order for the added support to fit well into the general OpenNLP design? 

My current idea is a dedicated Parser class under opennlp.tools.lang.polish for the corpus's native format. 

I would still contribute at least the sentence detection and tokenizer models, because they were created using a huge plain-text data set that is ready to use as provided. 


was (Author: kris.chris):
Thanks for your response. 

Could you point me to the important packages or interfaces that would have to be implemented in order to fit well into the general OpenNLP design? 

My current idea is a dedicated Parser class under opennlp.tools.lang.polish for the corpus's native format. 

I would still contribute at least the sentence detection and tokenizer models, because they were created using a huge plain-text data set that is ready to use as provided. 

> Polish language support - Maxent binaries
> -----------------------------------------
>
>                 Key: OPENNLP-701
>                 URL: https://issues.apache.org/jira/browse/OPENNLP-701
>             Project: OpenNLP
>          Issue Type: New Feature
>            Reporter: Chris Krol / IBM
>            Priority: Minor
>
> Hi, 
> I currently work at IBM Poland, and my manager has approved the idea of contributing various Maxent binaries for the Polish language (sentence split, sentence detection, POS tagging and morphological analysis, NER). 
> You could possibly put them on your download page. 
> We trained them using the Golden Standard human-annotated Polish National Corpus (GPL 3.0). 
> Would it also be possible to give some credit (if any) for the fact that the work was done at IBM?
> I've already sent a mail to the devs, but haven't seen any response for two weeks now. 


