Posted to users@opennlp.apache.org by Olivier Binda <ol...@wanadoo.fr> on 2018/03/26 16:03:48 UTC
How to train a parser from conllu data
Hello
I have tried to train a Parser model from the data provided by
http://universaldependencies.org/ in the conllu format,
but I failed.
The documentation for the training process (with the Java API) is
lacking (how do you read from conllu files?),
and it looks like the command line utility expects its input to be in the
OpenNLP format (what is that? Is there some utility that can convert
from the conllu format to the OpenNLP one?)
It is really frustrating not to be able to train a model (I just spent
days trying), especially since using a pre-trained model is a breeze and
makes you believe that parsing is right at your fingertips (if
you manage to create an ObjectStream<Parse> from a conllu file, that
is... which I miserably failed to do).
Can someone please help by providing a working sample of training a
Parser from a conllu file,
or of creating an ObjectStream<Parse> from a conllu treebank (like
Universal Dependencies)?
(The one provided in the OpenNLP documentation
(http://opennlp.apache.org/docs/1.8.4/manual/opennlp.html#tools.parser.parsing.api)
is somewhat lacking, by the way: it doesn't show how to create mlParameters,
etc.)
best regards,
Olivier
Re: How to train a parser from conllu data
Posted by Rodrigo Agerri <ro...@ehu.eus>.
Hello,
http://opennlp.apache.org/docs/1.8.4/manual/opennlp.html#tools.parser.training
Here you can read that the parser requires the Penn Treebank format. It
will not run with the Universal Dependencies format. Also note that
Penn Treebank syntax is constituent syntax, and therefore the OpenNLP parser
is a constituent parser, not a dependency parser.
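
To make the difference between the two formats concrete, here is a minimal
sketch in plain Java (no OpenNLP classes involved). The class name
FormatContrast, the helper names and the two-word sample sentence are made up
for illustration. It only shows what each format stores: a CoNLL-U row gives a
head index and a relation label per word, while a Penn Treebank bracketing
gives nested phrase labels. It is not a converter between the two.

```java
import java.util.ArrayList;
import java.util.List;

class FormatContrast {

    /** One CoNLL-U token row: word form, index of its head (0 = root), relation label. */
    static final class DepToken {
        final int id;
        final String form;
        final int head;
        final String deprel;

        DepToken(int id, String form, int head, String deprel) {
            this.id = id;
            this.form = form;
            this.head = head;
            this.deprel = deprel;
        }
    }

    /** Reads the plain token rows of a single CoNLL-U sentence. */
    static List<DepToken> readConllu(String sentence) {
        List<DepToken> tokens = new ArrayList<>();
        for (String line : sentence.split("\n")) {
            if (line.isEmpty() || line.startsWith("#")) {
                continue; // blank lines and comment lines carry no token
            }
            // CoNLL-U columns: ID FORM LEMMA UPOS XPOS FEATS HEAD DEPREL DEPS MISC
            String[] cols = line.split("\t");
            if (cols.length != 10 || !cols[0].matches("\\d+")) {
                continue; // skip multiword ranges (e.g. 1-2) and empty nodes (e.g. 1.1)
            }
            tokens.add(new DepToken(Integer.parseInt(cols[0]), cols[1],
                    Integer.parseInt(cols[6]), cols[7]));
        }
        return tokens;
    }

    /** Collects the label after each "(" in a bracketing (phrase categories and POS tags). */
    static List<String> bracketLabels(String tree) {
        List<String> labels = new ArrayList<>();
        for (int i = 0; i < tree.length(); i++) {
            if (tree.charAt(i) != '(') {
                continue;
            }
            int j = i + 1;
            while (j < tree.length() && !Character.isWhitespace(tree.charAt(j))
                    && tree.charAt(j) != '(' && tree.charAt(j) != ')') {
                j++;
            }
            labels.add(tree.substring(i + 1, j));
        }
        return labels;
    }

    public static void main(String[] args) {
        // Hand-written sample, not taken from any treebank.
        String conllu = "# text = Dogs bark\n"
                + "1\tDogs\tdog\tNOUN\tNNS\t_\t2\tnsubj\t_\t_\n"
                + "2\tbark\tbark\tVERB\tVBP\t_\t0\troot\t_\t_\n";
        for (DepToken t : readConllu(conllu)) {
            System.out.println(t.id + " " + t.form + " head=" + t.head + " rel=" + t.deprel);
        }

        // The same sentence as a one-line bracketed constituent tree.
        String penn = "(TOP (S (NP (NNS Dogs)) (VP (VBP bark))))";
        System.out.println(bracketLabels(penn));
    }
}
```

Nothing in the CoNLL-U rows says where an NP or a VP starts and ends, which is
the information a constituent parser is trained on.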
There are some converters from constituents (Penn Treebank) to
dependencies, but I have never tried them.
HTH,
R