Posted to users@opennlp.apache.org by "Richard Head Jr." <hs...@yahoo.com> on 2013/04/23 02:56:31 UTC

TokenNameFinderTrainer Error: Model not compatible with name finder!

Using 1.5.2. My training data looks like this: 

Guacamole Dip: 5 Hass <start:term> Avocados <end>, <start:term>
Jalapeno <end> Puree with <start:term> Salt <end> and <start:term> BHT <end> (preservative).

Here's the command I'm using: 

opennlp TokenNameFinderTrainer -encoding UTF-8 -lang en -data terms.train -model terms.bin

I found a message on this list acknowledging this as a bug that should have been fixed in 1.5.1: http://www.mail-archive.com/opennlp-users@incubator.apache.org/msg00162.html

I should also note that the docs and the above message say that entities should be marked using the "<START:xxx> <END>" format. When I use uppercase tags I receive the following error: 

Computing event counts...  java.io.IOException: Found unexpected annotation while handling a name sequence: meal <END>, ###<START:term>### sugar <END>,
Incorporating indexed data for training...  
Exception in thread "main" java.lang.NullPointerException
	at opennlp.maxent.GISTrainer.trainModel(GISTrainer.java:263)
	at opennlp.maxent.GIS.trainModel(GIS.java:256)
	at opennlp.model.TrainUtil.train(TrainUtil.java:182)
	at opennlp.tools.namefind.NameFinderME.train(NameFinderME.java:360)
	at opennlp.tools.namefind.NameFinderME.train(NameFinderME.java:426)
	at opennlp.tools.namefind.NameFinderME.train(NameFinderME.java:458)
	at opennlp.tools.cmdline.namefind.TokenNameFinderTrainerTool.run(TokenNameFinderTrainerTool.java:201)
	at opennlp.tools.cmdline.CLI.main(CLI.java:191)

Re: TokenNameFinderTrainer Error: Model not compatible with name finder!

Posted by James Kosin <ja...@gmail.com>.

On 4/23/2013 2:23 AM, Jörn Kottmann wrote:
> On 04/23/2013 02:56 AM, Richard Head Jr. wrote:
>> I should also note that the docs and the above message say that 
>> entities should be marked using the "<START:xxx> <END>" format. When 
>> I use uppercase tags I receive the following error:
>
> Only upper case tags are recognized; lower case tags are ignored. The
> "Model not compatible" exception occurs because the model is expected to
> contain all outcomes, which is not the case if it only contains "other"
> (a statistical model is also of no use if it can only predict one
> outcome).
>
> Jörn
>
Maybe we could build a series of validation tools?

Re: TokenNameFinderTrainer Error: Model not compatible with name finder!

Posted by Jörn Kottmann <ko...@gmail.com>.

On 04/23/2013 02:56 AM, Richard Head Jr. wrote:
> I should also note that the docs and the above message say that entities should be marked using the "<START:xxx> <END>" format. When I use uppercase tags I receive the following error:

Only upper case tags are recognized; lower case tags are ignored. The
"Model not compatible" exception occurs because the model is expected to
contain all outcomes, which is not the case if it only contains "other"
(a statistical model is also of no use if it can only predict one outcome).

Jörn
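
A quick way to spot tags that would be silently ignored is to scan the training file before running the trainer. The following is only a sketch, not part of OpenNLP: it assumes, as described above, that just the upper case forms <START:type> and <END> count as tags, and the function name ignored_tags is made up for illustration.

```python
import re

# Assumption: only upper case <START:type> and <END> are treated as tags.
ANY_CASE_TAG = re.compile(r"<(start(?::[^>\s]*)?|end)>", re.IGNORECASE)
UPPER_TAG = re.compile(r"<(START(?::[^>\s]*)?|END)>")


def ignored_tags(line):
    """Return tag-like substrings whose START/END keyword is not upper case."""
    return [
        match.group(0)
        for match in ANY_CASE_TAG.finditer(line)
        if not UPPER_TAG.fullmatch(match.group(0))
    ]
```

If this returns anything for a line, that line probably contributes no names to the training data, which is how a model can end up with "other" as its only outcome.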

Re: TokenNameFinderTrainer Error: Model not compatible with name finder!

Posted by James Kosin <ja...@gmail.com>.

On 4/26/2013 2:23 AM, Jörn Kottmann wrote:
> On 04/26/2013 01:06 AM, James Kosin wrote:
>> Jorn,
>>
>> Wouldn't this be hard because the training doesn't happen until the 
>> sentences have gone through the event streaming process?
>
>
> All the messages printed to the console about a sample object
> are hard to track down without an id. To fix this, the id must probably
> come from the sample object itself. It should be possible to extend
> the sample objects, and also the training formats, with ids.
>
> Jörn
>
>
I see, we could set this ID to be the line number in the original input 
stream.
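
Until something like that exists in OpenNLP, the line number can be recovered with a small pre-check run over the training file. This is a rough sketch under stated assumptions: one sample per line, tokens split on whitespace, and a tag counted only when a token is exactly <END> or of the form <START:type>. The function name find_span_errors and its messages are hypothetical and are not OpenNLP output.

```python
import re

# Assumption: a tag is a whole whitespace-separated token, in upper case.
START = re.compile(r"<START(?::[^>\s]*)?>")
END = "<END>"


def find_span_errors(lines):
    """Yield (line_number, message) for each sample with unbalanced name spans."""
    for number, line in enumerate(lines, start=1):
        open_span = False
        for token in line.split():
            if START.fullmatch(token):
                if open_span:
                    yield number, f"{token} found inside an open name span"
                    break
                open_span = True
            elif token == END:
                if not open_span:
                    yield number, "<END> without a preceding <START>"
                    break
                open_span = False
        else:
            # Reached only when the token loop finished without reporting.
            if open_span:
                yield number, "name span is never closed"
```

Calling it as find_span_errors(open("terms.train", encoding="utf-8")) and printing each pair gives the line to look at, which is the information missing from the trainer's own message.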

Re: TokenNameFinderTrainer Error: Model not compatible with name finder!

Posted by Jörn Kottmann <ko...@gmail.com>.

On 04/26/2013 01:06 AM, James Kosin wrote:
> Jorn,
>
> Wouldn't this be hard because the training doesn't happen until the 
> sentences have gone through the event streaming process?

All the messages printed to the console about a sample object
are hard to track down without an id. To fix this, the id must probably
come from the sample object itself. It should be possible to extend the
sample objects, and also the training formats, with ids.

Jörn

Re: TokenNameFinderTrainer Error: Model not compatible with name finder!

Posted by James Kosin <ja...@gmail.com>.

Jorn,

Wouldn't this be hard because the training doesn't happen until the 
sentences have gone through the event streaming process?

James

On 4/25/2013 2:36 AM, Jörn Kottmann wrote:
> On 04/25/2013 06:53 AM, Richard Head Jr. wrote:
>> Thanks for clarifying. Once I fixed this I ran into similar errors 
>> with different sentences in the training file. It would be really 
>> helpful if a line/column number was included in the message. I had a 
>> lot of sentences (relatively speaking) so some of the errors were 
>> hard to track down.
>
> Yes, good point, that's a problem I have all the time too. We should
> use a line number (or sentence counter) and an id a user can specify
> for a document. These should be used in the log messages we produce
> during training and evaluation.
>
> Jörn
>

Re: TokenNameFinderTrainer Error: Model not compatible with name finder!

Posted by Jörn Kottmann <ko...@gmail.com>.

On 04/25/2013 06:53 AM, Richard Head Jr. wrote:
> Thanks for clarifying. Once I fixed this I ran into similar errors with different sentences in the training file. It would be really helpful if a line/column number was included in the message. I had a lot of sentences (relatively speaking) so some of the errors were hard to track down.

Yes, good point, that's a problem I have all the time too. We should use
a line number (or sentence counter) and an id a user can specify for a
document. These should be used in the log messages we produce during
training and evaluation.

Jörn

Re: TokenNameFinderTrainer Error: Model not compatible with name finder!

Posted by "Richard Head Jr." <hs...@yahoo.com>.

Thanks for clarifying. Once I fixed this I ran into similar errors with different sentences in the training file. It would be really helpful if a line/column number was included in the message. I had a lot of sentences (relatively speaking) so some of the errors were hard to track down. 

--- On Mon, 4/22/13, James Kosin <ja...@gmail.com> wrote:
> From: James Kosin <ja...@gmail.com>
> Subject: Re: TokenNameFinderTrainer Error: Model not compatible with name finder!
> To: users@opennlp.apache.org
> Date: Monday, April 22, 2013, 6:58 PM
> Richard,
> 
> The problem is the ',' after the <END> tag. It needs to be a separate token:
> 
> <START:term> Avocados <END> , ....
> 
> The error occurs because "<END>," with the ',' butted against it is not
> recognized as an <END> token.
> 
> Lower case may seem to work, but then none of the tags are recognized,
> and thereby there is no data to train on.
> 
> James
> 

Re: TokenNameFinderTrainer Error: Model not compatible with name finder!

Posted by James Kosin <ja...@gmail.com>.

Richard,

The problem is the ',' after the <END> tag. It needs to be a separate token:

<START:term> Avocados <END> , ....

The error occurs because "<END>," with the ',' butted against it is not
recognized as an <END> token.

Lower case may seem to work, but then none of the tags are recognized,
and thereby there is no data to train on.

James
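
For a large training file, this kind of problem can be found and repaired mechanically. The sketch below is not an OpenNLP tool. It assumes the training data is split on whitespace and that a tag only counts when it is a token of its own. The helper names glued_tags and separate_tags are invented for this example.

```python
import re

# Assumption: valid tags are <START>, <START:type> and <END>, in upper case.
TAG = re.compile(r"<START(?::[^>\s]*)?>|<END>")


def glued_tags(line):
    """Return whitespace-separated tokens that hold a tag plus extra characters."""
    return [
        token
        for token in line.split()
        if TAG.search(token) and not TAG.fullmatch(token)
    ]


def separate_tags(line):
    """Put spaces around every tag so that each one becomes its own token."""
    spaced = TAG.sub(lambda match: f" {match.group(0)} ", line)
    return " ".join(spaced.split())
```

Note that separate_tags only moves the tags apart from their neighbours. Punctuation elsewhere in the sentence is left as it is, so the text should still be tokenized the same way it will be at run time.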

On 4/22/2013 8:56 PM, Richard Head Jr. wrote:
> Using 1.5.2. My training data looks like this:
>
> Guacamole Dip: 5 Hass <start:term> Avocados <end>, <start:term>
> Jalapeno <end> Puree with <start:term> Salt <end> and <start:term> BHT <end> (preservative).
>
> Here's the command I'm using:
>
> opennlp TokenNameFinderTrainer -encoding UTF-8 -lang en -data terms.train -model terms.bin
>
> I found a message on this list acknowledging this as a bug that should have been fixed in 1.5.1: http://www.mail-archive.com/opennlp-users@incubator.apache.org/msg00162.html
>
> I should also note that the docs and the above message say that entities should be marked using the "<START:xxx> <END>" format. When I use uppercase tags I receive the following error:
>
> Computing event counts...  java.io.IOException: Found unexpected annotation while handling a name sequence: meal <END>, ###<START:term>### sugar <END>,
> Incorporating indexed data for training...
> Exception in thread "main" java.lang.NullPointerException
> 	at opennlp.maxent.GISTrainer.trainModel(GISTrainer.java:263)
> 	at opennlp.maxent.GIS.trainModel(GIS.java:256)
> 	at opennlp.model.TrainUtil.train(TrainUtil.java:182)
> 	at opennlp.tools.namefind.NameFinderME.train(NameFinderME.java:360)
> 	at opennlp.tools.namefind.NameFinderME.train(NameFinderME.java:426)
> 	at opennlp.tools.namefind.NameFinderME.train(NameFinderME.java:458)
> 	at opennlp.tools.cmdline.namefind.TokenNameFinderTrainerTool.run(TokenNameFinderTrainerTool.java:201)
> 	at opennlp.tools.cmdline.CLI.main(CLI.java:191)