Posted to users@opennlp.apache.org by "Richard Head Jr." <hs...@yahoo.com> on 2013/04/23 02:56:31 UTC
TokenNameFinderTrainer Error: Model not compatible with name finder!
Using 1.5.2. My training data looks like this:
Guacamole Dip: 5 Hass <start:term> Avocados <end>, <start:term> Jalapeno <end> Puree with <start:term> Salt <end> and <start:term> BHT <end> (preservative).
Here's the command I'm using:
opennlp TokenNameFinderTrainer -encoding UTF-8 -lang en -data terms.train -model terms.bin
I found a message on this list acknowledging this as a bug that should have been fixed in 1.5.1: http://www.mail-archive.com/opennlp-users@incubator.apache.org/msg00162.html
I should also note that the docs and the above message say that entities should be marked using the "<START:xxx> <END>" format. When I use uppercase tags I receive the following error:
Computing event counts... java.io.IOException: Found unexpected annotation while handling a name sequence: meal <END>, ###<START:term>### sugar <END>,
Incorporating indexed data for training...
Exception in thread "main" java.lang.NullPointerException
at opennlp.maxent.GISTrainer.trainModel(GISTrainer.java:263)
at opennlp.maxent.GIS.trainModel(GIS.java:256)
at opennlp.model.TrainUtil.train(TrainUtil.java:182)
at opennlp.tools.namefind.NameFinderME.train(NameFinderME.java:360)
at opennlp.tools.namefind.NameFinderME.train(NameFinderME.java:426)
at opennlp.tools.namefind.NameFinderME.train(NameFinderME.java:458)
at opennlp.tools.cmdline.namefind.TokenNameFinderTrainerTool.run(TokenNameFinderTrainerTool.java:201)
at opennlp.tools.cmdline.CLI.main(CLI.java:191)
Re: TokenNameFinderTrainer Error: Model not compatible with name finder!
Posted by James Kosin <ja...@gmail.com>.
On 4/23/2013 2:23 AM, Jörn Kottmann wrote:
> On 04/23/2013 02:56 AM, Richard Head Jr. wrote:
>> I should also note that the docs and the above message say that
>> entities should be marked using the "<START:xxx> <END>" format. When
>> I use uppercase tags I receive the following error:
>
> Only uppercase tags are recognized; lowercase tags are ignored. The
> "Model not compatible" exception occurs because the model is expected
> to contain all outcomes, which is not the case if it only contains
> "other" (a statistical model is also of no use if it can only predict
> one outcome).
>
> Jörn
>
Maybe we could build a series of validation tools?
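One such validation tool could be a pre-training check for tags that would be silently ignored. The sketch below is not part of OpenNLP. It assumes, per Jörn's explanation, that only uppercase <START:type> and <END> tags count as annotations; the name find_ignored_tags and both patterns are made up for illustration.

```python
import re

# Matches <start...> or <end> in any letter case. Illustrative only, not OpenNLP code.
ANY_CASE_TAG = re.compile(r"<(start(?::[^>\s]*)?|end)>", re.IGNORECASE)
# Assumption: the trainer recognizes uppercase tags only.
RECOGNIZED_TAG = re.compile(r"<(START(?::[^>\s]*)?|END)>")

def find_ignored_tags(lines):
    """Return (line_number, tag) pairs for tags that are not uppercase."""
    ignored = []
    for number, line in enumerate(lines, start=1):
        for match in ANY_CASE_TAG.finditer(line):
            tag = match.group(0)
            if not RECOGNIZED_TAG.fullmatch(tag):
                ignored.append((number, tag))
    return ignored

sample = [
    "5 Hass <start:term> Avocados <end> , <START:term> Salt <END> .",
]
for number, tag in find_ignored_tags(sample):
    print(f"line {number}: tag {tag} is not uppercase and would be ignored")
```

Running a check like this over terms.train before training would point at the lowercase tags directly, instead of leaving the user with a model that only contains the "other" outcome.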
Re: TokenNameFinderTrainer Error: Model not compatible with name finder!
Posted by Jörn Kottmann <ko...@gmail.com>.
On 04/23/2013 02:56 AM, Richard Head Jr. wrote:
> I should also note that the docs and the above message say that entities should be marked using the "<START:xxx> <END>" format. When I use uppercase tags I receive the following error:
Only uppercase tags are recognized; lowercase tags are ignored. The
"Model not compatible" exception occurs because the model is expected
to contain all outcomes, which is not the case if it only contains
"other" (a statistical model is also of no use if it can only predict
one outcome).
Jörn
Re: TokenNameFinderTrainer Error: Model not compatible with name finder!
Posted by James Kosin <ja...@gmail.com>.
On 4/26/2013 2:23 AM, Jörn Kottmann wrote:
> On 04/26/2013 01:06 AM, James Kosin wrote:
>> Jorn,
>>
>> Wouldn't this be hard because the training doesn't happen until the
>> sentences have gone through the event streaming process?
>
>
> All the messages which are printed to the console about a sample
> object are hard to track down without an id. To fix this, the id must
> probably come from the sample object itself. It should be possible to
> extend the sample objects with ids and also the training formats.
>
> Jörn
>
>
I see, we could set this ID to be the line number in the original input
stream.
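A rough sketch of that idea: number each line as it is read and carry the number along with the sample, so any later message can name its source line. This is illustrative Python, not OpenNLP's sample stream API; NumberedSample, read_samples, and check_sample are hypothetical names, and the tag-balance check is only a stand-in for whatever the trainer would report.

```python
from dataclasses import dataclass
from typing import Iterable, Iterator, List

@dataclass
class NumberedSample:
    line_number: int   # 1-based position in the original input stream
    tokens: List[str]  # whitespace-separated tokens of that line

def read_samples(lines: Iterable[str]) -> Iterator[NumberedSample]:
    """Attach the input line number to each non-empty training line."""
    for number, line in enumerate(lines, start=1):
        tokens = line.split()
        if tokens:
            yield NumberedSample(number, tokens)

def check_sample(sample: NumberedSample) -> List[str]:
    """Report unbalanced tags, prefixing each message with the line number."""
    problems = []
    open_tag = False
    for token in sample.tokens:
        if token.startswith("<START") and token.endswith(">"):
            if open_tag:
                problems.append(f"line {sample.line_number}: {token} inside an open name")
            open_tag = True
        elif token == "<END>":
            if not open_tag:
                problems.append(f"line {sample.line_number}: <END> without <START>")
            open_tag = False
    if open_tag:
        problems.append(f"line {sample.line_number}: name never closed with <END>")
    return problems

data = ["<START:term> Salt <END> and pepper", "", "meal <END>, <START:term> sugar <END>,"]
for sample in read_samples(data):
    for message in check_sample(sample):
        print(message)
```

Because the number is taken from the raw input before blank lines are skipped, it still matches what a text editor shows for the training file.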
Re: TokenNameFinderTrainer Error: Model not compatible with name finder!
Posted by Jörn Kottmann <ko...@gmail.com>.
On 04/26/2013 01:06 AM, James Kosin wrote:
> Jorn,
>
> Wouldn't this be hard because the training doesn't happen until the
> sentences have gone through the event streaming process?
All the messages which are printed to the console about a sample object
are hard to track down without an id. To fix this, the id must probably
come from the sample object itself. It should be possible to extend the
sample objects with ids and also the training formats.
Jörn
Re: TokenNameFinderTrainer Error: Model not compatible with name finder!
Posted by James Kosin <ja...@gmail.com>.
Jorn,
Wouldn't this be hard because the training doesn't happen until the
sentences have gone through the event streaming process?
James
On 4/25/2013 2:36 AM, Jörn Kottmann wrote:
> On 04/25/2013 06:53 AM, Richard Head Jr. wrote:
>> Thanks for clarifying. Once I fixed this I ran into similar errors
>> with different sentences in the training file. It would be really
>> helpful if a line/column number was included in the message. I had a
>> lot of sentences (relatively speaking) so some of the errors were
>> hard to track down.
>
> Yes, good point, that's a problem I have all the time too. We should
> use a line number (or sentence counter) and an id a user can specify
> for a document. These should be used in the log messages we produce
> during training and evaluation.
>
> Jörn
>
Re: TokenNameFinderTrainer Error: Model not compatible with name finder!
Posted by Jörn Kottmann <ko...@gmail.com>.
On 04/25/2013 06:53 AM, Richard Head Jr. wrote:
> Thanks for clarifying. Once I fixed this I ran into similar errors with different sentences in the training file. It would be really helpful if a line/column number was included in the message. I had a lot of sentences (relatively speaking) so some of the errors were hard to track down.
Yes, good point, that's a problem I have all the time too. We should use
a line number (or sentence counter) and an id a user can specify for a
document. These should be used in the log messages we produce during
training and evaluation.
Jörn
Re: TokenNameFinderTrainer Error: Model not compatible with name finder!
Posted by "Richard Head Jr." <hs...@yahoo.com>.
Thanks for clarifying. Once I fixed this I ran into similar errors with different sentences in the training file. It would be really helpful if a line/column number was included in the message. I had a lot of sentences (relatively speaking) so some of the errors were hard to track down.
--- On Mon, 4/22/13, James Kosin <ja...@gmail.com> wrote:
> From: James Kosin <ja...@gmail.com>
> Subject: Re: TokenNameFinderTrainer Error: Model not compatible with name finder!
> To: users@opennlp.apache.org
> Date: Monday, April 22, 2013, 6:58 PM
> Richard,
>
> The problem is the ',' after the <END> tag. Separate it with a space:
>
> <START:term> Avocados <END> , ....
>
> The error occurs because "<END>," with the ',' butted against it is
> not an <END> token.
>
> Lower case may seem to work, but then you don't have any tagged
> tokens... and thereby no data to train.
>
> James
Re: TokenNameFinderTrainer Error: Model not compatible with name finder!
Posted by James Kosin <ja...@gmail.com>.
Richard,
The problem is the ',' after the <END> tag. Separate it with a space:
<START:term> Avocados <END> , ....
The error occurs because "<END>," with the ',' butted against it is not
an <END> token.
Lower case may seem to work, but then you don't have any tagged
tokens... and thereby no data to train.
James
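The point about the comma can be reproduced outside OpenNLP. The sketch below assumes the training format is split on whitespace and that a token counts as a tag only when it is exactly <START:type> or <END>. The helper separate_tags is hypothetical, not an OpenNLP function; it is one possible way to clean a training file before use.

```python
import re

# Assumption: uppercase tags only, matching the format described in this thread.
TAG = re.compile(r"<START(?::[^>\s]*)?>|<END>")

def separate_tags(line: str) -> str:
    """Surround every tag with spaces, then collapse repeated whitespace."""
    spaced = TAG.sub(lambda match: f" {match.group(0)} ", line)
    return " ".join(spaced.split())

broken = "5 Hass <START:term> Avocados <END>, <START:term> Jalapeno <END> Puree"

# Whitespace splitting leaves the comma attached, so that token is not "<END>".
print("<END>," in broken.split())

fixed = separate_tags(broken)
print(fixed)
```

After the rewrite every tag stands alone as a token, and the comma becomes a separate token following <END>.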