Posted to users@opennlp.apache.org by Rodrigo Agerri <ag...@gmail.com> on 2013/04/28 20:44:58 UTC

parser training parameters

Hi,

I have tried to change the training parameters in
lang/TrainerParams.txt, but the ParserTrainer does not seem to pick them
up: it still runs the same number of iterations (100) with cutoff 5 (using
the 1.5.3 release).

Any ideas?

Cheers,

Rodrigo

Re: parser training parameters

Posted by Rodrigo Agerri <ag...@gmail.com>.

Another clarification on this, just in case it is useful:

The overall order of training in the chunking parser is "build,
tagger, chunker and check", and each of these steps has to be used as a
prefix in the training parameters file. For example:

Algorithm=MAXENT
build.Iterations=200
tagger.Iterations=200
chunker.Iterations=200
check.Iterations=200
build.Cutoff=4
tagger.Cutoff=4
chunker.Cutoff=4
check.Cutoff=4
build.Threads=4
tagger.Threads=4
chunker.Threads=4
check.Threads=4

Of course, if you insert a better POS model into the chunking-parse
model you can just ignore the tagger parameters, etc.
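
Since the same three settings are repeated for each of the four prefixes, a file like the one above can be generated rather than typed by hand. The following is only a sketch: the prefixes and key names are taken from this thread, while the helper name render_params and its arguments are hypothetical and not part of OpenNLP.

```python
# Sketch: build a parser training-parameters file with one entry per sub-model.
# Prefixes and key names come from this thread; render_params is a hypothetical helper.
PREFIXES = ("build", "tagger", "chunker", "check")


def render_params(iterations, cutoff, threads, algorithm="MAXENT"):
    """Return the text of a parameters file with prefixed settings."""
    lines = [f"Algorithm={algorithm}"]
    settings = (("Iterations", iterations), ("Cutoff", cutoff), ("Threads", threads))
    for key, value in settings:
        for prefix in PREFIXES:
            lines.append(f"{prefix}.{key}={value}")
    return "\n".join(lines) + "\n"


text = render_params(iterations=200, cutoff=4, threads=4)
print(text, end="")
```

The printed text can be saved to a file such as lang/TrainerParams.txt and passed to ParserTrainer with -params.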

Cheers,

Rodrigo

On Thu, May 2, 2013 at 4:18 PM, Rodrigo Agerri <ag...@gmail.com> wrote:
> Thanks Jörn, that worked.
>
> Just in case anyone is wondering about the 4 steps Jörn mentioned, I
> looked at the chunking/Parser.java code again and found the reference
> to the author of the parsing approach used by the chunker parser
> (based on MaxEnt), whose thesis can be found here:
>
> http://www.ircs.upenn.edu/download/techreports/1998/98-15.pdf
>
> As the first two steps (tag and chunk, in this order) are already
> provided by the training data you can configure the other two (build
> and check, in this order) in the lang/TrainerParams.txt as you
> suggested:
>
> build.Cutoff=2
> build.Iterations=200
> build.Threads=4
>
> check.Cutoff=2
> check.Iterations=200
> check.Threads=4
>
> Cheers,
>
> Rodrigo

Re: parser training parameters

Posted by Rodrigo Agerri <ag...@gmail.com>.

Thanks Jörn, that worked.

Just in case anyone is wondering about the 4 steps Jörn mentioned, I
looked at the chunking/Parser.java code again and found the reference
to the author of the parsing approach used by the chunker parser
(based on MaxEnt), whose thesis can be found here:

http://www.ircs.upenn.edu/download/techreports/1998/98-15.pdf

As the first two steps (tag and chunk, in that order) are already
provided by the training data, you can configure the other two (build
and check, in that order) in lang/TrainerParams.txt as you
suggested:

build.Cutoff=2
build.Iterations=200
build.Threads=4

check.Cutoff=2
check.Iterations=200
check.Threads=4
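
Judging from this thread, a key that the trainer does not look up is simply left unused without any error, so a misspelled name can go unnoticed. That behaviour is an assumption here, not something verified against the 1.5.3 source. A small check along the following lines can flag suspicious names before training. The lists of known prefixes and keys contain only the names mentioned in this thread, and find_suspect_keys is a hypothetical helper, not an OpenNLP tool.

```python
# Sketch: flag parameter names that do not match the names used in this thread.
# KNOWN_PREFIXES and KNOWN_KEYS are limited to what the thread mentions (assumption).
KNOWN_PREFIXES = {"build", "check", "tagger", "chunker"}
KNOWN_KEYS = {"Algorithm", "Iterations", "Cutoff", "Threads"}


def find_suspect_keys(text):
    """Return (line_number, name) pairs for names that look misspelled."""
    suspects = []
    for number, line in enumerate(text.splitlines(), start=1):
        line = line.strip()
        if not line or line.startswith("#"):
            continue  # skip blank lines and comments
        name = line.split("=", 1)[0].strip()
        prefix, dot, key = name.rpartition(".")
        bad_prefix = bool(dot) and prefix not in KNOWN_PREFIXES
        if bad_prefix or key not in KNOWN_KEYS:
            suspects.append((number, name))
    return suspects


sample = "build.Cutoff=2\nbuild.Iteration=200\nparser.Threads=4\nAlgorithm=MAXENT\n"
suspects = find_suspect_keys(sample)
for number, name in suspects:
    print(f"line {number}: unrecognised parameter name {name}")
```

The check only looks at spelling. It does not decide whether a given key needs a prefix to take effect.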

Cheers,

Rodrigo

On Tue, Apr 30, 2013 at 9:46 PM, Joern Kottmann <ko...@gmail.com> wrote:
> Short answer from my phone: instead of Cutoff, the parameter name is
> check.Cutoff (for example, check.Cutoff=0). I will have a closer look
> tomorrow and reply on the list. It would be nice to have a sample
> parameter file for the parser checked in.
>
> Cheers Jörn
>
> On Apr 30, 2013 7:50 PM, "Rodrigo Agerri" <ag...@gmail.com> wrote:
>>
>> Hi,
>>
>> Thanks for your answers, I will explain myself better.
>>
>> I edit the lang/TrainerParams.txt file where I specify, for example:
>>
>> Algorithm=MAXENT
>> Iterations=1000
>> Cutoff=0
>> Threads=4
>>
>> Then I run the ParserTrainer from the CLI:
>>
>> bin/opennlp ParserTrainer -headRules
>> /home/ragerri/experiments/parsing/opennlp/es/data/es-head-rules
>> -parserType CHUNKING -params lang/TrainerParams.txt -lang es -model
>> test.bin -encoding UTF-8 -data
>> /home/ragerri/experiments/parsing/ancora-2.0/ancora2.treebank
>>
>> It trains fine, and the model works fine in a system using the Apache
>> OpenNLP API, but it still uses cutoff 5 and 100 iterations, which
>> seem to be the default training parameters for
>> ParserTrainer.
>>
>> I can change these parameters for parser training using the API, that
>> works fine, but I cannot manage to do it from the command line.
>>
>> I did not understand your suggestion, Jörn, could you please provide
>> an example?
>>
>> Thanks,
>>
>> Rodrigo

Re: parser training parameters

Posted by Rodrigo Agerri <ag...@gmail.com>.

Hi,

Thanks for your answers, I will explain myself better.

I edit the lang/TrainerParams.txt file where I specify, for example:

Algorithm=MAXENT
Iterations=1000
Cutoff=0
Threads=4

Then I run the ParserTrainer from the CLI:

bin/opennlp ParserTrainer -headRules
/home/ragerri/experiments/parsing/opennlp/es/data/es-head-rules
-parserType CHUNKING -params lang/TrainerParams.txt -lang es -model
test.bin -encoding UTF-8 -data
/home/ragerri/experiments/parsing/ancora-2.0/ancora2.treebank

It trains fine, and the model works fine in a system using the Apache
OpenNLP API, but it still uses cutoff 5 and 100 iterations, which
seem to be the default training parameters for
ParserTrainer.

I can change these parameters for parser training using the API, which
works fine, but I cannot manage to do it from the command line.

I did not understand your suggestion, Jörn. Could you please provide
an example?

Thanks,

Rodrigo

On Tue, Apr 30, 2013 at 4:21 PM, Jörn Kottmann <ko...@gmail.com> wrote:
> On 04/30/2013 04:03 PM, William Colen wrote:
>>
>> Are you using the command line tool? If yes, you should pass the path to
>> the parameters file in the command line argument -params <file-path>
>>
>
> The parser trains multiple models. To make the parameters work, they
> have to be prefixed; the prefixes for the four models are tagger,
> chunker, check and build. Just put them in front of the usual
> parameter names.
>
> HTH,
> Jörn

Re: parser training parameters

Posted by Jörn Kottmann <ko...@gmail.com>.

On 04/30/2013 04:03 PM, William Colen wrote:
> Are you using the command line tool? If yes, you should pass the path to
> the parameters file in the command line argument -params <file-path>
>

The parser trains multiple models. To make the parameters work, they
have to be prefixed; the prefixes for the four models are tagger,
chunker, check and build. Just put them in front of the usual
parameter names.
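
To illustrate why unprefixed keys appear to have no effect, here is a rough model of a per-model lookup. It is not OpenNLP's actual code: parse_params and settings_for are hypothetical helpers, and the default values (100 iterations, cutoff 5) are the ones reported in this thread.

```python
# Rough model of a prefixed parameter lookup; NOT taken from the OpenNLP source.
# The defaults are the values reported in this thread.
DEFAULTS = {"Iterations": "100", "Cutoff": "5"}


def parse_params(text):
    """Parse key=value lines into a dict, skipping blanks and comments."""
    params = {}
    for line in text.splitlines():
        line = line.strip()
        if not line or line.startswith("#") or "=" not in line:
            continue
        key, value = line.split("=", 1)
        params[key.strip()] = value.strip()
    return params


def settings_for(params, prefix):
    """Collect the settings for one sub-model, falling back to the defaults."""
    settings = dict(DEFAULTS)
    marker = prefix + "."
    for key, value in params.items():
        if key.startswith(marker):
            settings[key[len(marker):]] = value
    return settings


unprefixed = parse_params("Iterations=1000\nCutoff=0\n")
prefixed = parse_params("build.Iterations=1000\nbuild.Cutoff=0\n")

build_unprefixed = settings_for(unprefixed, "build")  # no build.* keys found
build_prefixed = settings_for(prefixed, "build")      # build.* keys override defaults
check_prefixed = settings_for(prefixed, "check")      # other models keep defaults

print(build_unprefixed, build_prefixed, check_prefixed)
```

In this model, a file that contains only Iterations=1000 and Cutoff=0 leaves every sub-model at its defaults, which matches the behaviour described in the original question.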

HTH,
Jörn

Re: parser training parameters

Posted by William Colen <wi...@gmail.com>.

Are you using the command line tool? If so, you should pass the path to
the parameters file with the command line argument -params <file-path>

On Sun, Apr 28, 2013 at 3:44 PM, Rodrigo Agerri <ag...@gmail.com> wrote:

> Hi,
>
> I have tried to change the training parameters in the
> lang/TrainerParams.txt but the ParserTrainer does not seem to be taking
> them, it still does the same number of iterations (100) and cutoff 5 (using
> 1.5.3 release).
>
> Any ideas?
>
> Cheers,
>
> Rodrigo
>