Posted to users@opennlp.apache.org by "Jim foo.bar" <ji...@gmail.com> on 2012/12/11 13:07:24 UTC
can someone PLEASE explain the PriorOutcome feature generator?
I've asked a couple of times before but I got no answer! Jorn replied to
me at some point but his response was very brief and confused me even more!
I'm begging you!!! I'm writing a paper for BMC bioinformatics and I
cannot explain this feature properly! I'm struggling to find information
on the web...
PLEASE, PLEASE, PLEASE, devote 5 minutes to explain what this feature
does and how it works. An example would be awesome...
thanks in advance...
Jim
Re: can someone PLEASE explain the PriorOutcome feature generator?
Posted by "Jim foo.bar" <ji...@gmail.com>.
I found this answer of Jorn's to someone else, but I still do not
understand how the *OutcomePriorFeatureGenerator* is used!!!
TokenFeatureGenerator:
+ lower cased token, with a window of 2
A window of two means that the feature is generated
for two previous and two next words also.
TokenClassFeatureGenerator:
+ token class (that contains things like first letter is capital)
+ token class combined with the lower cased token
Both features are generated with a window length of 2
PreviousMapFeatureGenerator:
+ previous decision features, if the word has been seen before
in the document
BigramNameFeatureGenerator:
+ token bigram feature, with previous word
+ token bigram feature with previous token class
+ token bigram feature with next word
+ token bigram feature with next token class
SentenceFeatureGenerator:
+ Sentence begin feature
OutcomePriorFeatureGenerator:
+ always generates a default feature -> what does that mean???

Jim
On 11/12/12 12:07, Jim foo.bar wrote:
> I've asked a couple of times before but I got no answer! Jorn replied
> to me at some point but his response was very brief and confused me
> even more!
>
> I'm begging you!!! I'm writing a paper for BMC bioinformatics and I
> cannot explain this feature properly! I'm struggling to find
> information on the web...
>
>
> PLEASE, PLEASE, PLEASE, devote 5 minutes to explain what this feature
> does and how it works. An example would be awesome...
>
> thanks in advance...
>
> Jim
Re: can someone PLEASE explain the PriorOutcome feature generator?
Posted by Jörn Kottmann <ko...@gmail.com>.
The numbers depend on your training data. The outcomes in the name
finder are start, cont, and other.
Depending on how you train, they might contain the type, e.g.
start-person, cont-person, other.
We are speaking here about the outcomes used by the classifier which
the name finder uses to predict which tokens belong to an entity and
which do not.
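As a sketch of how these start/cont/other outcomes label a token
sequence (a hypothetical example, not OpenNLP code - the tokens and the
helper function here are made up for illustration):

```python
# Hypothetical sentence: each token gets one of the name finder's
# outcomes. "John F. Kennedy" is the entity: its first token is
# labelled start, the following entity tokens cont, everything else other.
tokens = ["Yesterday", "John", "F.", "Kennedy", "visited", "Berlin"]
outcomes = ["other", "start", "cont", "cont", "other", "other"]

def extract_entities(tokens, outcomes):
    """Group start/cont runs back into entity spans."""
    entities, current = [], []
    for tok, out in zip(tokens, outcomes):
        if out == "start":
            if current:
                entities.append(" ".join(current))
            current = [tok]
        elif out == "cont" and current:
            current.append(tok)
        else:
            if current:
                entities.append(" ".join(current))
            current = []
    if current:
        entities.append(" ".join(current))
    return entities

print(extract_entities(tokens, outcomes))  # -> ['John F. Kennedy']
```

So the outcomes are not "Person" vs "None"; they encode the position of
a token inside or outside an entity span.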
Jörn
On 12/12/2012 06:39 PM, Jim - FooBar(); wrote:
> Hmmm... this definitely makes more sense... Thank you, Jorn. So, to
> come back to the NameFinder, what would be the outcomes? In a previous
> response you say:
>
> E.g. for the name it could be:
> start 5%
> cont 10%
> other 85%
>
> Now, from what I understand, your 2 responses contradict each
> other... Judging by your latest answer, the outcomes for the NameFinder
> would be the entities we're trying to find plus something like "no"
> (for tokens that are NOT any of the entities we're looking for). So
> if we're looking for a single type of entity (e.g. person), then the
> outcomes would be "Person" & "None" - yes? I apologise for asking
> again and again, but I need to be clear about what this feature does
> (in NER context - not tokenizing), if I'm going to include it in my
> publication...
>
> Thanks again, I really appreciate your time and your responses...
>
> Jim
>
>
> On 12/12/12 16:38, Jörn Kottmann wrote:
>> The default feature (the prior feature) produced by the prior feature
>> generator
>> is the same for every context and can be used to measure the
>> distribution of the outcomes in the training data.
>> Some outcomes are usually much more frequent than others, depending
>> on the task,
>> e.g. in the tokenizer NO_SPLIT is much more common than SPLIT.
>>
>> HTH,
>> Jörn
>
Re: can someone PLEASE explain the PriorOutcome feature generator?
Posted by "Jim - FooBar();" <ji...@gmail.com>.
Hmmm... this definitely makes more sense... Thank you, Jorn. So, to come
back to the NameFinder, what would be the outcomes? In a previous
response you say:
E.g. for the name it could be:
start 5%
cont 10%
other 85%
Now, from what I understand, your 2 responses contradict each
other... Judging by your latest answer, the outcomes for the NameFinder
would be the entities we're trying to find plus something like "no" (for
tokens that are NOT any of the entities we're looking for). So if we're
looking for a single type of entity (e.g. person), then the outcomes
would be "Person" & "None" - yes? I apologise for asking again and
again, but I need to be clear about what this feature does (in NER
context - not tokenizing), if I'm going to include it in my publication...
Thanks again, I really appreciate your time and your responses...
Jim
On 12/12/12 16:38, Jörn Kottmann wrote:
> The default feature (the prior feature) produced by the prior feature
> generator
> is the same for every context and can be used to measure the
> distribution of the outcomes in the training data.
> Some outcomes are usually much more frequent than others, depending on
> the task,
> e.g. in the tokenizer NO_SPLIT is much more common than SPLIT.
>
> HTH,
> Jörn
Re: can someone PLEASE explain the PriorOutcome feature generator?
Posted by Jörn Kottmann <ko...@gmail.com>.
The default feature (the prior feature) produced by the prior feature
generator
is the same for every context and can be used to measure the
distribution of the outcomes in the training data.
Some outcomes are usually much more frequent than others, depending on
the task,
e.g. in the tokenizer NO_SPLIT is much more common than SPLIT.
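A minimal sketch of why a constant feature captures the outcome prior
(the event data below is hypothetical, and this is not the OpenNLP API;
the feature string "default" is an assumption about what the generator
emits):

```python
from collections import Counter

# Hypothetical training events: (features, outcome) pairs, as a maxent
# trainer would see them. The prior feature generator adds the constant
# feature "default" to EVERY context, regardless of the tokens.
events = [
    (["w=john", "default"], "start"),
    (["w=f.", "default"], "cont"),
    (["w=kennedy", "default"], "cont"),
    (["w=visited", "default"], "other"),
    (["w=berlin", "default"], "other"),
    (["w=yesterday", "default"], "other"),
]

# Because "default" fires in every event, the weight it receives per
# outcome ends up reflecting how often each outcome occurs overall --
# i.e. the prior distribution, counted here directly:
prior = Counter(outcome for _, outcome in events)
total = sum(prior.values())
distribution = {o: c / total for o, c in prior.items()}
print(distribution)
```

With no other evidence in a context, the model then falls back on this
prior instead of treating all outcomes as equally likely.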
HTH,
Jörn
On 12/11/2012 01:07 PM, Jim foo.bar wrote:
> I've asked a couple of times before but I got no answer! Jorn replied
> to me at some point but his response was very brief and confused me
> even more!
>
> I'm begging you!!! I'm writing a paper for BMC bioinformatics and I
> cannot explain this feature properly! I'm struggling to find
> information on the web...
>
>
> PLEASE, PLEASE, PLEASE, devote 5 minutes to explain what this feature
> does and how it works. An example would be awesome...
>
> thanks in advance...
>
> Jim