Posted to users@opennlp.apache.org by "A. Allen" <th...@gmail.com> on 2010/12/08 20:24:08 UTC

name finder training tool

Hello,

Has anyone been able to train the name finder? I followed the instructions
in the wiki and used pieces of the sample code, but keep getting the
following:

Indexing events using cutoff of 5

Computing event counts...  done. 29376 events
Indexing...  done.
Sorting and merging events... done. Reduced 29376 events to 8313.
Done indexing.
Incorporating indexed data for training...
done.
Number of Event Tokens: 8313
    Number of Outcomes: 1
  Number of Predicates: 11869
...done.
Computing model parameters...
Performing 100 iterations.
  1:  .. loglikelihood=0.0 1.0
  2:  .. loglikelihood=0.0 1.0
Exception in thread "main" java.lang.IllegalArgumentException: Model not compatible with name finder!
	at opennlp.tools.namefind.TokenNameFinderModel.<init>(TokenNameFinderModel.java:50)
	at opennlp.tools.namefind.NameFinderME.train(NameFinderME.java:350)
	at opennlp.tools.namefind.NameFinderME.train(NameFinderME.java:356)
	at NameTrainer.main(NameTrainer.java:21)

My training data looks like this:
<START:person>Neil Abercrombie<END>
<START:person>Anibal Acevedo-Vila<END>
<START:person>Gary Ackerman<END>
<START:person>Robert Aderholt<END>
<START:person>Daniel Akaka<END>
<START:person>Todd Akin<END>
<START:person>Lamar Alexander<END>
<START:person>Rodney Alexander<END>

I appreciate any help that can be provided. Thank you.

-AA

Re: name finder training tool

Posted by Jörn Kottmann <ko...@gmail.com>.
The default cutoff value is 5, so a token has to occur
at least 5 times to be included. Add a few more "other" tokens
(tokens that are not part of a name) and it should run through.
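To show what a count cutoff does, here is a minimal sketch of count-based filtering, applied to plain strings standing in for the features the trainer indexes ("Indexing events using cutoff of 5" in the original log). It is a hypothetical illustration (`CutoffSketch` and `keepFrequent` are made-up names), not OpenNLP's indexer, and it assumes, as described above, that an item is kept only if it occurs at least `cutoff` times.

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

public class CutoffSketch {

    /** Returns the items that occur at least {@code cutoff} times, in first-seen order. */
    static List<String> keepFrequent(List<String> items, int cutoff) {
        // Count occurrences; LinkedHashMap preserves first-seen order
        Map<String, Integer> counts = new LinkedHashMap<>();
        for (String item : items) {
            counts.merge(item, 1, Integer::sum);
        }
        // Keep only the items whose count reaches the cutoff
        List<String> kept = new ArrayList<>();
        for (Map.Entry<String, Integer> e : counts.entrySet()) {
            if (e.getValue() >= cutoff) {
                kept.add(e.getKey());
            }
        }
        return kept;
    }

    public static void main(String[] args) {
        // Hypothetical stream: "the" occurs 5 times, "Akaka" once, "senator" twice
        List<String> tokens = List.of("the", "the", "the", "the", "the",
                "Akaka", "senator", "senator");
        System.out.println(keepFrequent(tokens, 5)); // [the]
    }
}
```

With a cutoff of 5, anything seen only once or twice, such as most names in a short list, is dropped before training.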

Jörn

(In reply to A. Allen's message of 12/10/10 6:14 PM.)


Re: name finder training tool

Posted by Jörn Kottmann <ko...@gmail.com>.
Hello,

We fixed a bug that we believe causes your problem.
It would be nice if you could test with the current trunk code.

Here is the link to the issue:
https://issues.apache.org/jira/browse/OPENNLP-9

Please report your test results on the issue. If it works for
you now, you can just close it; otherwise, leave us a comment.

Thanks,
Jörn

(In reply to A. Allen's message of 12/10/10 6:14 PM.)


Re: name finder training tool

Posted by James Kosin <ja...@gmail.com>.
Jorn,

Wouldn't the name dictionary be more of what A. Allen is looking for?
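James's suggestion can be illustrated with a minimal sketch of dictionary lookup: given a list of known names, such as the congressional names in the original post, scan a tokenized sentence and mark the longest matching token sequence. This is a hypothetical plain-Java illustration (`DictionaryLookupSketch` and `findNames` are made-up names), not OpenNLP's dictionary-based name finder, and it assumes exact, case-sensitive matching on space-joined tokens.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;
import java.util.Set;

public class DictionaryLookupSketch {

    /** Greedy longest-match lookup; returns [start, end) token spans found in the dictionary. */
    static List<int[]> findNames(String[] tokens, Set<String> dictionary, int maxLen) {
        List<int[]> spans = new ArrayList<>();
        int i = 0;
        while (i < tokens.length) {
            int matchEnd = -1;
            // Try the longest candidate first so "Lamar Alexander" wins over "Alexander"
            for (int end = Math.min(tokens.length, i + maxLen); end > i; end--) {
                String candidate = String.join(" ", Arrays.copyOfRange(tokens, i, end));
                if (dictionary.contains(candidate)) {
                    matchEnd = end;
                    break;
                }
            }
            if (matchEnd != -1) {
                spans.add(new int[] {i, matchEnd});
                i = matchEnd; // resume scanning after the matched name
            } else {
                i++;
            }
        }
        return spans;
    }

    public static void main(String[] args) {
        Set<String> names = Set.of("Daniel Akaka", "Lamar Alexander", "Todd Akin");
        String[] tokens = "Senator Lamar Alexander met Daniel Akaka .".split(" ");
        for (int[] span : findNames(tokens, names, 3)) {
            String text = String.join(" ", Arrays.copyOfRange(tokens, span[0], span[1]));
            System.out.println(span[0] + ".." + span[1] + " " + text);
        }
    }
}
```

This prints `1..3 Lamar Alexander` and `4..6 Daniel Akaka`. No statistical training is involved, so cutoffs and outcome counts do not matter, but a name that is missing from the list is never found.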

James K

(In reply to A. Allen's message of 12/10/2010 12:14 PM.)


Re: name finder training tool

Posted by "A. Allen" <th...@gmail.com>.
Thank you for the response. I changed my training data to include
data that aren't names, using old search-term data, but I received the same
error. A sample of the new training data is listed below.

<START:person>cantor<END>
crs
debt commission
hr 4213
hr3081
hr5297
<START:person>johnny isakson<END>
lame duck session
paycheck fairness act
pigford
unemployment insurance
<START:person>wyden<END>
112th
112th Congress
Dream Act
GAO
HR 5712
Lame Duck
<START:person>boehner<END>

-AA

(In reply to Jörn Kottmann's message of Wed, Dec 8, 2010 at 2:37 PM.)

Re: name finder training tool

Posted by Jörn Kottmann <ko...@gmail.com>.
Hello,

Your training data only contains tokens that are
the beginning or a continuation of a name, but zero "other"
tokens.

If the name finder were trained like this, it would always
estimate that these two are the only valid outcomes. That should
actually be possible (though maybe not useful).

I didn't look at the source code, but I guess the error is caused by
a bug in the outcome validation code. We should add your case
to the unit tests and fix the problem.

To work around the problem, just add a few sentences of normal
plain text without names to your training data.

Please feel free to open a Jira issue.
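The point about outcomes can be made concrete with a minimal sketch that turns annotated lines into per-token labels and collects the distinct labels across a training set. It is a simplified, hypothetical re-implementation (`OutcomeSketch`, `labels` and `outcomes` are made-up names), not OpenNLP's parser or event stream; the labels `start`, `cont` and `other` follow this thread's wording for the beginning of a name, its continuation, and tokens outside names. Like the example in the OpenNLP documentation, it assumes the `<START:type>` and `<END>` tags are separated from the surrounding tokens by whitespace.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Set;
import java.util.TreeSet;

public class OutcomeSketch {

    /** Labels each token of one annotated, whitespace-tokenized line as start, cont or other. */
    static List<String> labels(String line) {
        List<String> result = new ArrayList<>();
        boolean inName = false; // currently inside a <START:...> ... <END> span
        boolean first = false;  // the next token is the first token of a name
        for (String tok : line.trim().split("\\s+")) {
            if (tok.isEmpty()) {
                continue; // blank line
            }
            if (tok.startsWith("<START") && tok.endsWith(">")) {
                inName = true;
                first = true;
            } else if (tok.equals("<END>")) {
                inName = false;
            } else if (inName) {
                result.add(first ? "start" : "cont");
                first = false;
            } else {
                result.add("other");
            }
        }
        return result;
    }

    /** Collects the distinct labels (outcomes) seen across all lines, sorted. */
    static Set<String> outcomes(List<String> lines) {
        Set<String> seen = new TreeSet<>();
        for (String line : lines) {
            seen.addAll(labels(line));
        }
        return seen;
    }

    public static void main(String[] args) {
        List<String> namesOnly = List.of(
                "<START:person> Neil Abercrombie <END>",
                "<START:person> Todd Akin <END>");
        List<String> withPlainText = List.of(
                "<START:person> Neil Abercrombie <END>",
                "The debt commission met on Monday .");
        System.out.println(outcomes(namesOnly));     // [cont, start]
        System.out.println(outcomes(withPlainText)); // [cont, other, start]
    }
}
```

Names-only data never produces `other`, while a single plain-text sentence adds it. Note that with this sketch, lines written without a space after `<START:person>` and before `<END>` yield only `other` labels, because the tags are not separate tokens.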

Thanks,
Jörn

(In reply to A. Allen's message of 12/8/10 8:24 PM.)