Posted to users@opennlp.apache.org by Olivier Binda <ol...@wanadoo.fr> on 2018/03/26 16:03:48 UTC

How to train a parser from conllu data

Hello

I have tried to train a Parser model on the data provided by
http://universaldependencies.org/ in CoNLL-U format,
but I failed.

The documentation for the training process (with the Java API) is
lacking (how do I read from CoNLL-U files?),
and it looks like the command-line utility expects the data to be in the
OpenNLP format (what is that? Is there a utility that can convert
from CoNLL-U to the OpenNLP format?)
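For context on what reading a CoNLL-U file involves: each line holds one
word with ten tab-separated columns (ID, FORM, LEMMA, UPOS, XPOS, FEATS,
HEAD, DEPREL, DEPS, MISC), lines starting with # are comments, and a blank
line separates sentences. Below is a minimal stdlib-only Java sketch of
such a reader. The class and field names are hypothetical (this is not an
OpenNLP API), and it simply skips multiword-token ranges and empty nodes.
Note that the result is a list of head links, not a bracketed phrase tree,
so it cannot be turned into an ObjectStream<Parse> directly.

```java
import java.io.BufferedReader;
import java.io.IOException;
import java.io.Reader;
import java.io.StringReader;
import java.io.UncheckedIOException;
import java.util.ArrayList;
import java.util.List;

/** Hypothetical minimal CoNLL-U reader for illustration; not part of OpenNLP. */
final class ConlluReader {

  /** One syntactic word; only the columns used below are kept. */
  static final class Token {
    final int id;         // column 1: word index in the sentence, starting at 1
    final String form;    // column 2: word form
    final String upos;    // column 4: universal part-of-speech tag
    final int head;       // column 7: index of the head word, 0 for the root
    final String deprel;  // column 8: dependency relation to the head

    Token(int id, String form, String upos, int head, String deprel) {
      this.id = id;
      this.form = form;
      this.upos = upos;
      this.head = head;
      this.deprel = deprel;
    }
  }

  /** Reads every sentence; a sentence is the list of its tokens in file order. */
  static List<List<Token>> read(Reader in) {
    List<List<Token>> sentences = new ArrayList<>();
    List<Token> current = new ArrayList<>();
    try (BufferedReader reader = new BufferedReader(in)) {
      String line;
      while ((line = reader.readLine()) != null) {
        if (line.trim().isEmpty()) {          // a blank line ends the sentence
          if (!current.isEmpty()) {
            sentences.add(current);
            current = new ArrayList<>();
          }
        } else if (!line.startsWith("#")) {   // "#" lines are comments (sent_id, text, ...)
          String[] cols = line.split("\t");
          if (cols.length != 10) {
            throw new IllegalArgumentException("Expected 10 tab-separated columns: " + line);
          }
          // Multiword-token ranges ("1-2") and empty nodes ("1.1") have "_" as HEAD; skip them.
          if (!cols[0].contains("-") && !cols[0].contains(".")) {
            current.add(new Token(Integer.parseInt(cols[0]), cols[1], cols[3],
                Integer.parseInt(cols[6]), cols[7]));
          }
        }
      }
    } catch (IOException e) {
      throw new UncheckedIOException(e);
    }
    if (!current.isEmpty()) {                 // the last sentence may lack a trailing blank line
      sentences.add(current);
    }
    return sentences;
  }

  public static void main(String[] args) {
    String sample = "# text = Dogs bark.\n"
        + "1\tDogs\tdog\tNOUN\tNNS\t_\t2\tnsubj\t_\t_\n"
        + "2\tbark\tbark\tVERB\tVBP\t_\t0\troot\t_\tSpaceAfter=No\n"
        + "3\t.\t.\tPUNCT\t.\t_\t2\tpunct\t_\t_\n";
    for (List<Token> sentence : read(new StringReader(sample))) {
      for (Token t : sentence) {
        // Each word only points at its head: there are no phrase brackets to recover.
        System.out.println(t.id + " " + t.form + " --" + t.deprel + "--> " + t.head);
      }
    }
  }
}
```

Running main prints one line per word, starting with "1 Dogs --nsubj--> 2".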

It is really frustrating not to be able to train a model (I just spent
days trying), especially since using a pre-trained model is a breeze and
makes you believe that parsing is right at your fingertips (if
you manage to create an ObjectStream<Parse> from a CoNLL-U file, that
is, which I miserably failed to do).

Can someone please help by providing a working sample of training a
Parser from a CoNLL-U file,
or of creating an ObjectStream<Parse> from a CoNLL-U treebank (like
Universal Dependencies)?

(By the way, the sample provided in the OpenNLP documentation
(http://opennlp.apache.org/docs/1.8.4/manual/opennlp.html#tools.parser.parsing.api)
is somewhat incomplete: it doesn't show how to create mlParameters,
etc.)

best regards,
Olivier





Re: How to train a parser from conllu data

Posted by Rodrigo Agerri <ro...@ehu.eus>.
Hello,

http://opennlp.apache.org/docs/1.8.4/manual/opennlp.html#tools.parser.training

Here you can read that the parser requires training data in Penn Treebank
format; it will not run on the Universal Dependencies format. Also note that
Penn Treebank syntax is constituency syntax, so the OpenNLP parser
is a constituency parser, not a dependency parser.
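To make the difference concrete: a constituency tree in Penn Treebank
bracket notation nests labeled phrases (S, NP, VP) over the words, for
example (TOP (S (NP (NNS Dogs)) (VP (VBP bark)) (. .))), while CoNLL-U
only records each word's head. The sketch below (stdlib-only Java; class
and method names are hypothetical and unrelated to OpenNLP's own Parse
class) reads one bracketed tree and lists every phrase with the word span
it covers. It assumes one tree per line; check the training section linked
above for the exact format OpenNLP expects.

```java
import java.util.ArrayList;
import java.util.List;

/** Hypothetical reader for one Penn Treebank-style bracketed tree; not OpenNLP's Parse class. */
final class BracketTree {

  static final class Node {
    final String label;                          // phrase label (S, NP, ...) or POS tag
    final String word;                           // non-null only for pre-terminals like (NNS Dogs)
    final List<Node> children = new ArrayList<>();

    Node(String label, String word) {
      this.label = label;
      this.word = word;
    }
  }

  private final String[] tokens;
  private int pos;

  private BracketTree(String line) {
    // Pad the brackets so a whitespace split yields "(", ")", labels and words.
    tokens = line.replace("(", " ( ").replace(")", " ) ").trim().split("\\s+");
  }

  /** Parses one tree; throws IllegalArgumentException on unbalanced or trailing input. */
  static Node parse(String line) {
    BracketTree reader = new BracketTree(line);
    Node root = reader.node();
    if (reader.pos != reader.tokens.length) {
      throw new IllegalArgumentException("Trailing input after tree: " + line);
    }
    return root;
  }

  private String peek() {
    if (pos >= tokens.length) {
      throw new IllegalArgumentException("Unexpected end of tree");
    }
    return tokens[pos];
  }

  private void expect(String token) {
    if (!peek().equals(token)) {
      throw new IllegalArgumentException("Expected '" + token + "' at token " + pos);
    }
    pos++;
  }

  private Node node() {
    expect("(");
    // Raw treebank files often wrap each tree in an unlabeled bracket: "( (S ...) )".
    String label = peek().equals("(") ? "" : tokens[pos++];
    Node n;
    if (peek().equals("(")) {
      n = new Node(label, null);
      while (peek().equals("(")) {
        n.children.add(node());
      }
    } else {
      n = new Node(label, tokens[pos++]);       // pre-terminal: (TAG word)
    }
    expect(")");
    return n;
  }

  /** Adds "LABEL[start,end)" for every phrase in post-order; returns the end word index. */
  static int spans(Node n, int start, List<String> out) {
    if (n.word != null) {
      return start + 1;                          // a pre-terminal covers exactly one word
    }
    int end = start;
    for (Node child : n.children) {
      end = spans(child, end, out);
    }
    out.add(n.label + "[" + start + "," + end + ")");
    return end;
  }

  public static void main(String[] args) {
    Node tree = parse("(TOP (S (NP (NNS Dogs)) (VP (VBP bark)) (. .)))");
    List<String> out = new ArrayList<>();
    spans(tree, 0, out);
    System.out.println(out);  // [NP[0,1), VP[1,2), S[0,3), TOP[0,3)]
  }
}
```

These nested, labeled spans are exactly what a CoNLL-U file does not
contain, which is why it cannot simply be fed to the OpenNLP parser trainer.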

There are some converters from constituents (Penn Treebank) to
dependencies, but I have never tried them.
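The usual idea behind such converters is head percolation: each phrase
picks one child as its head according to a head-rule table, and the head
words of the other children become dependents of that head word. A toy
Java sketch of that idea follows. The three-entry head table and all names
are illustrative assumptions (real converters use large, language-specific
rule sets and also assign dependency labels). Note that it goes from
constituents to dependencies, the opposite direction from what training
the OpenNLP parser on Universal Dependencies data would need.

```java
import java.util.Arrays;
import java.util.Collections;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

/** Toy constituency-to-dependency conversion by head percolation; hypothetical, not a real converter. */
final class HeadPercolation {

  static final class Node {
    final String label;   // phrase label (S, NP, ...) or POS tag for a pre-terminal
    final String word;    // non-null only for pre-terminals such as (NNS Dogs)
    final List<Node> kids;
    int wordIndex;        // 1-based word position, set by number() on pre-terminals

    Node(String label, String word, Node... kids) {
      this.label = label;
      this.word = word;
      this.kids = Arrays.asList(kids);
    }
  }

  // Illustrative head table: child labels to prefer as head, in priority order.
  static final Map<String, List<String>> HEAD_RULES = new HashMap<>();
  static {
    HEAD_RULES.put("S", Arrays.asList("VP"));
    HEAD_RULES.put("VP", Arrays.asList("VBP", "VBZ", "VBD", "VB"));
    HEAD_RULES.put("NP", Arrays.asList("NN", "NNS", "NNP"));
  }

  /** Numbers the words left to right starting at next; returns the next free number. */
  static int number(Node n, int next) {
    if (n.word != null) {
      n.wordIndex = next;
      return next + 1;
    }
    for (Node kid : n.kids) {
      next = number(kid, next);
    }
    return next;
  }

  /** Index of the head child: first child matching the highest-priority rule, else child 0. */
  static int headChild(Node n) {
    List<String> preferred = HEAD_RULES.getOrDefault(n.label, Collections.<String>emptyList());
    for (String label : preferred) {
      for (int i = 0; i < n.kids.size(); i++) {
        if (n.kids.get(i).label.equals(label)) {
          return i;
        }
      }
    }
    return 0;
  }

  /** Returns the head word of n and attaches the other children's heads to it in head[]. */
  static int headWord(Node n, int[] head) {
    if (n.word != null) {
      return n.wordIndex;
    }
    int[] kidHeads = new int[n.kids.size()];
    for (int i = 0; i < kidHeads.length; i++) {
      kidHeads[i] = headWord(n.kids.get(i), head);
    }
    int h = headChild(n);
    for (int i = 0; i < kidHeads.length; i++) {
      if (i != h) {
        head[kidHeads[i]] = kidHeads[h];
      }
    }
    return kidHeads[h];
  }

  /** head[i] is the governor of word i (1-based), 0 for the root, as in the CoNLL-U HEAD column. */
  static int[] convert(Node root) {
    int[] head = new int[number(root, 1)];   // slot 0 is unused
    head[headWord(root, head)] = 0;
    return head;
  }

  public static void main(String[] args) {
    Node tree = new Node("S", null,
        new Node("NP", null, new Node("NNS", "Dogs")),
        new Node("VP", null, new Node("VBP", "bark")),
        new Node(".", "."));
    int[] head = convert(tree);
    for (int i = 1; i < head.length; i++) {
      System.out.println(i + " -> " + head[i]);  // prints 1 -> 2, 2 -> 0, 3 -> 2
    }
  }
}
```

Phrases with no matching rule fall back to their first child; that
default is a simplification chosen for brevity, not a standard convention.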

HTH,

R
