Posted to users@opennlp.apache.org by "Jim foo.bar" <ji...@gmail.com> on 2012/12/11 13:07:24 UTC

can someone PLEASE explain the PriorOutcome feature generator?

I've asked a couple of times before but I got no answer! Jorn replied to 
me at some point but his response was very brief and confused me even more!

I'm begging you!!! I'm writing a paper for BMC bioinformatics and I 
cannot explain this feature properly! I'm struggling to find information 
on the web...


PLEASE, PLEASE, PLEASE, devote 5 minutes to explain what this feature 
does and how it works. An example would be awesome...

thanks in advance...

Jim

Re: can someone PLEASE explain the PriorOutcome feature generator?

Posted by "Jim foo.bar" <ji...@gmail.com>.
I found this answer of Jorn's to someone else but I still do not 
understand how the *OutcomePriorFeatureGenerator* is used!!!


TokenFeatureGenerator:
+ lower cased token, with a window of 2

A window of two means that the feature is also generated
for the two previous and the two next words.

TokenClassFeatureGenerator:
+ token class (that contains things like first letter is capital)
+ token class combined with the lower cased token

Both features are generated with a window length of 2

PreviousMapFeatureGenerator:
+ previous decision features, if the word has been seen before
     in the document

BigramNameFeatureGenerator:
+ token bigram feature, with previous word
+ token bigram feature with previous token class
+ token bigram feature with next word
+ token bigram feature with next token class

SentenceFeatureGenerator:
+ Sentence begin feature

OutcomePriorFeatureGenerator:
+ always generates a default feature -> what does that mean???
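
For reference, I believe this list corresponds to the default feature 
generation in NameFinderME. If I read the opennlp.tools.util.featuregen 
classes correctly, it would be assembled roughly like this (a sketch; 
the exact constructor arguments here are my guess):

    import opennlp.tools.util.featuregen.*;

    public class DefaultNameFinderFeatures {

        public static AdaptiveFeatureGenerator create() {
            return new CachedFeatureGenerator(
                // lower cased token, with a window of 2
                new WindowFeatureGenerator(new TokenFeatureGenerator(), 2, 2),
                // token class, plus class combined with the token, window of 2
                new WindowFeatureGenerator(new TokenClassFeatureGenerator(true), 2, 2),
                // the "default feature" this whole thread is about
                new OutcomePriorFeatureGenerator(),
                // previous decisions for words already seen in the document
                new PreviousMapFeatureGenerator(),
                // token bigrams with the previous/next word and token class
                new BigramNameFeatureGenerator(),
                // sentence begin feature only (no sentence end feature)
                new SentenceFeatureGenerator(true, false));
        }
    }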

Jim





On 11/12/12 12:07, Jim foo.bar wrote:
> I've asked a couple of times before but I got no answer! Jorn replied 
> to me at some point but his response was very brief and confused me 
> even more!
>
> I'm begging you!!! I'm writing a paper for BMC bioinformatics and I 
> cannot explain this feature properly! I'm struggling to find 
> information on the web...
>
>
> PLEASE, PLEASE, PLEASE, devote 5 minutes to explain what this feature 
> does and how it works. An example would be awesome...
>
> thanks in advance...
>
> Jim


Re: can someone PLEASE explain the PriorOutcome feature generator?

Posted by Jörn Kottmann <ko...@gmail.com>.
The numbers depend on your training data. The outcomes in the name 
finder are start, cont, and other.
Depending on how you train, they might contain the type, e.g. 
start-person, cont-person, other.

We are speaking here about the outcomes used by the classifier which 
the name finder uses to predict
which tokens belong to an entity and which do not.
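
For example (a made-up sentence, not from any training data), the 
per-token outcomes for a person name finder could line up like this:

    // Hypothetical sentence, not from any corpus: one outcome per token.
    public class OutcomeExample {
        public static void main(String[] args) {
            String[] tokens   = {"Pierre", "Vinken", "joined", "the", "board", "."};
            // "start" opens an entity, "cont" continues it,
            // "other" marks tokens outside any entity
            String[] outcomes = {"start", "cont", "other", "other", "other", "other"};
            for (int i = 0; i < tokens.length; i++) {
                System.out.println(tokens[i] + "\t" + outcomes[i]);
            }
        }
    }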

Jörn

On 12/12/2012 06:39 PM, Jim - FooBar(); wrote:
> Hmmm... this definitely makes more sense... Thank you Jorn. So, to come 
> back to the NameFinder, what would be the outcomes? In a previous 
> response you say:
>
> E.g. for the name it could be:
> start 5%
> cont 10%
> other 85%
>
> Now, from what I understand, your 2 responses contradict each 
> other... Judging by your latest answer, the outcomes for the NameFinder 
> would be the entities we're trying to find plus something like "no" 
> (for tokens that are NOT any of the entities we're looking for). So 
> if we're looking for a single type of entity (e.g. person), then the 
> outcomes would be "Person" & "None" - yes?  I apologise for asking 
> again and again but I need to be clear about what this feature does 
> (in NER context - not tokenizing), if I'm going to include it in my 
> publication...
>
> Thanks again, I really appreciate your time and your responses...
>
> Jim
>
>
> On 12/12/12 16:38, Jörn Kottmann wrote:
>> The default feature (the prior feature) produced by the prior feature 
>> generator
>> is the same for every context and can be used to measure the 
>> distribution of the outcomes in the training data.
>> Some outcomes are usually much more frequent than others, depending 
>> on the task,
>> e.g. in the tokenizer NO_SPLIT is much more common than SPLIT.
>>
>> HTH,
>> Jörn 
>


Re: can someone PLEASE explain the PriorOutcome feature generator?

Posted by "Jim - FooBar();" <ji...@gmail.com>.
Hmmm... this definitely makes more sense... Thank you Jorn. So, to come 
back to the NameFinder, what would be the outcomes? In a previous 
response you say:

E.g. for the name it could be:
start 5%
cont 10%
other 85%

Now, from what I understand, your 2 responses contradict each 
other... Judging by your latest answer, the outcomes for the NameFinder 
would be the entities we're trying to find plus something like "no" (for 
tokens that are NOT any of the entities we're looking for). So if we're 
looking for a single type of entity (e.g. person), then the outcomes 
would be "Person" & "None" - yes?  I apologise for asking again and 
again but I need to be clear about what this feature does (in NER 
context - not tokenizing), if I'm going to include it in my publication...

Thanks again, I really appreciate your time and your responses...

Jim


On 12/12/12 16:38, Jörn Kottmann wrote:
> The default feature (the prior feature) produced by the prior feature 
> generator
> is the same for every context and can be used to measure the 
> distribution of the outcomes in the training data.
> Some outcomes are usually much more frequent than others, depending on 
> the task,
> e.g. in the tokenizer NO_SPLIT is much more common than SPLIT.
>
> HTH,
> Jörn 


Re: can someone PLEASE explain the PriorOutcome feature generator?

Posted by Jörn Kottmann <ko...@gmail.com>.
The default feature (the prior feature) produced by the prior feature 
generator
is the same for every context and can be used to measure the 
distribution of the outcomes in the training data.
Some outcomes are usually much more frequent than others, depending on 
the task,
e.g. in the tokenizer NO_SPLIT is much more common than SPLIT.
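
In code terms the generator is trivial: it adds the same constant string 
to the feature list for every token. A simplified sketch of the idea (my 
paraphrase of opennlp.tools.util.featuregen.OutcomePriorFeatureGenerator, 
not the exact source):

    import java.util.List;
    import opennlp.tools.util.featuregen.FeatureGeneratorAdapter;

    // Adds the constant feature "def" for every token, regardless of
    // context. Because this feature fires everywhere, the weight the
    // classifier learns for it ends up encoding the overall outcome
    // distribution (the prior) of the training data.
    public class PriorSketch extends FeatureGeneratorAdapter {
        public void createFeatures(List<String> features, String[] tokens,
                                   int index, String[] previousOutcomes) {
            features.add("def");
        }
    }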

HTH,
Jörn

On 12/11/2012 01:07 PM, Jim foo.bar wrote:
> I've asked a couple of times before but I got no answer! Jorn replied 
> to me at some point but his response was very brief and confused me 
> even more!
>
> I'm begging you!!! I'm writing a paper for BMC bioinformatics and I 
> cannot explain this feature properly! I'm struggling to find 
> information on the web...
>
>
> PLEASE, PLEASE, PLEASE, devote 5 minutes to explain what this feature 
> does and how it works. An example would be awesome...
>
> thanks in advance...
>
> Jim