Posted to users@opennlp.apache.org by "Jim foo.bar" <ji...@gmail.com> on 2012/11/20 13:08:56 UTC
better explanation of some features?
Hi all,
I am trying to properly understand all the built-in features of OpenNLP,
but I'm having trouble with some of them...
The maxent introduction page [1] mentions:
> So, say you want to implement a program which uses maxent to find
> names in a text, such as:
>
> /He succeeds Terrence D. Daniels, formerly a W.R. Grace vice
> chairman, who resigned./
>
> If you are currently looking at the word /Terrence/ and are trying to
> decide if it is a name or not, examples of the kinds of features you
> might use are "previous=succeeds", "current=Terrence", "next=D.", and
> "currentWordIsCapitalized". You might even add a feature that says
> that "Terrence" was seen as a name before.
>
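The features named in the quoted passage can be sketched in plain Java. This is only an illustration of the idea: the class and method names below are hypothetical and are not part of OpenNLP or the maxent package.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical helper for illustration; not an OpenNLP or maxent class.
final class ContextFeatureSketch {

    // Builds the string features from the quoted example for tokens[index].
    static List<String> features(String[] tokens, int index) {
        List<String> feats = new ArrayList<>();
        if (index > 0) {
            feats.add("previous=" + tokens[index - 1]);
        }
        feats.add("current=" + tokens[index]);
        if (index + 1 < tokens.length) {
            feats.add("next=" + tokens[index + 1]);
        }
        // Capitalization check on the first character of the current token.
        if (!tokens[index].isEmpty() && Character.isUpperCase(tokens[index].charAt(0))) {
            feats.add("currentWordIsCapitalized");
        }
        return feats;
    }

    public static void main(String[] args) {
        String[] tokens = {"He", "succeeds", "Terrence", "D.", "Daniels"};
        // Features for the token "Terrence" at index 2.
        System.out.println(features(tokens, 2));
    }
}
```

Each string is simply a name for an observation about the token and its neighbours; the model learns a weight for every (feature, outcome) pair it sees during training.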
I am particularly interested in the last sentence: *"You might even add
a feature that says that "Terrence" was seen as a name before."*
Does this refer to the PreviousMapFeatureGenerator?
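One way to picture a "seen as a name before" feature is a map from each token to the outcome it was given the last time it appeared. The sketch below assumes that reading; the class name, the method names and the "pd=" prefix are illustrative choices here and should not be taken as the actual PreviousMapFeatureGenerator code.

```java
import java.util.HashMap;
import java.util.Map;

// Hypothetical illustration of a "previous outcome" feature; not OpenNLP code.
final class PreviousOutcomeSketch {

    // Remembers the most recent outcome assigned to each token.
    private final Map<String, String> previousOutcome = new HashMap<>();

    // Feature for a token: the outcome it received last time, or "null" if unseen.
    String feature(String token) {
        return "pd=" + token + "," + previousOutcome.get(token);
    }

    // Called after a sentence has been tagged, so later sentences can use the result.
    void update(String[] tokens, String[] outcomes) {
        for (int i = 0; i < tokens.length; i++) {
            previousOutcome.put(tokens[i], outcomes[i]);
        }
    }

    // Forget everything, e.g. at a document boundary.
    void clear() {
        previousOutcome.clear();
    }

    public static void main(String[] args) {
        PreviousOutcomeSketch sketch = new PreviousOutcomeSketch();
        System.out.println(sketch.feature("Terrence"));
        sketch.update(new String[] {"Terrence", "resigned"}, new String[] {"start", "other"});
        System.out.println(sketch.feature("Terrence"));
    }
}
```

The point of such a feature is that a token tagged as a name earlier in a document is more likely to be a name when it shows up again.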
Also, a while back I asked about the *OutcomePriorFeatureGenerator*,
and Jörn replied with this:
_it is there to measure the distribution of the outcome_
E.g. for the name it could be:
start 5%
cont 10%
other 85%
In a name-finding context, what does the above example mean? The outcome
is either TRUE or FALSE, yes? So the name finder either recognizes a name
or it doesn't. If Jörn had not shown this example, I would have assumed
that this feature generator calculates distributions for these two boolean
values. However, Jörn's example shows something different that I don't
understand: what are 'start', 'cont' & 'other'? How are these outcomes, and
how does that help the name finder?
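For what it's worth, the 'start' / 'cont' / 'other' labels make sense if the name finder assigns one outcome per token rather than a yes/no answer: 'start' for the first token of a name, 'cont' for the remaining tokens of the same name, and 'other' for everything else. Under that assumption, the distribution in the example is just the share of tokens carrying each label. A minimal sketch (the class and method names are hypothetical, not OpenNLP API):

```java
import java.util.Arrays;
import java.util.LinkedHashMap;
import java.util.Map;

// Hypothetical illustration of per-token outcomes; not OpenNLP code.
final class OutcomeSketch {

    // nameSpans holds {begin, end} token indices per name, with end exclusive.
    static String[] encode(int tokenCount, int[][] nameSpans) {
        String[] outcomes = new String[tokenCount];
        Arrays.fill(outcomes, "other");
        for (int[] span : nameSpans) {
            outcomes[span[0]] = "start";
            for (int i = span[0] + 1; i < span[1]; i++) {
                outcomes[i] = "cont";
            }
        }
        return outcomes;
    }

    // Counts how often each outcome occurs.
    static Map<String, Integer> counts(String[] outcomes) {
        Map<String, Integer> counts = new LinkedHashMap<>();
        for (String outcome : outcomes) {
            counts.merge(outcome, 1, Integer::sum);
        }
        return counts;
    }

    public static void main(String[] args) {
        // 16 tokens: He succeeds Terrence D. Daniels , formerly a W.R. Grace
        // vice chairman , who resigned .   Only "Terrence D. Daniels" is marked here.
        String[] outcomes = encode(16, new int[][] {{2, 5}});
        for (Map.Entry<String, Integer> e : counts(outcomes).entrySet()) {
            System.out.printf("%s %.1f%%%n", e.getKey(), 100.0 * e.getValue() / outcomes.length);
        }
    }
}
```

A feature that fires identically on every token carries no information about the token itself, so the weights learned for it can only reflect how common each outcome is overall, which is one way to read "measure the distribution of the outcome".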
thanks in advance...
Jim
[1] http://maxent.sourceforge.net/howto.html
Re: better explanation of some features?
Posted by "Jim foo.bar" <ji...@gmail.com>.
Can anyone help me with this? I'm struggling to find extra documentation
regarding feature generation...
Jim
Re: better explanation of some features?
Posted by "Jim foo.bar" <ji...@gmail.com>.
Hi Jorn,
Thanks a lot for replying... I've found the 12 'classes' you're
describing in the 'StringPattern.java' file located in the 'featuregen'
package. This helps me a lot... :-)
How about the other two feature generators that I asked about
(PreviousMapFeatureGenerator & OutcomePriorFeatureGenerator)? Could you
elaborate a bit further? I'm really sorry, but I haven't understood your
example about the *OutcomePriorFeatureGenerator*...
Documentation says:
*"You might even add a feature that says that "Terrence" was seen as a
name before."*
Does that refer to the PreviousMapFeatureGenerator?
thanks a million...
Jim
Re: better explanation of some features?
Posted by Jörn Kottmann <ko...@gmail.com>.
On 11/20/2012 01:33 PM, Jim foo.bar wrote:
> also, the only information that I could find about the
> *TokenClassFeatureGenerator* is this oddly phrased sentence:
>
> _"Generates features for different for the class of the token."_
>
> How does this generator work?
> What 'class' does this refer to in a name-finding context? semantic
> class? If we're looking for genes and drugs, would the classes be
> "gene", "drug" & presumably "none"?
It assigns a category to a token based on the characters used in it, for
example:
- token is initial capital
- token is all upper case
- token is numeric
- token is alphanumeric
...
Have a look at the code to see all the classes and under which conditions
they are assigned.
Jörn
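As a rough illustration of the character-based categories listed above, here is a minimal sketch covering only the four examples given. The class name, the category labels and the order of the checks are assumptions made for this example; the real set of classes and their conditions are in the OpenNLP source.

```java
// Hypothetical illustration of character-based token classes; not OpenNLP code.
final class TokenClassSketch {

    // Returns a coarse category for the token based only on its characters.
    static String tokenClass(String token) {
        if (token.isEmpty()) {
            return "other";
        }
        boolean allAlnum = true;
        boolean allUpper = true;
        boolean hasDigit = false;
        boolean hasLetter = false;
        for (int i = 0; i < token.length(); i++) {
            char c = token.charAt(i);
            if (Character.isDigit(c)) {
                hasDigit = true;
            } else if (Character.isLetter(c)) {
                hasLetter = true;
            } else {
                allAlnum = false;
            }
            if (!Character.isUpperCase(c)) {
                allUpper = false;
            }
        }
        if (allAlnum && hasDigit && !hasLetter) {
            return "numeric";
        }
        if (allAlnum && hasDigit && hasLetter) {
            return "alphanumeric";
        }
        if (allUpper) {
            return "allCaps";
        }
        if (Character.isUpperCase(token.charAt(0))) {
            return "initialCap";
        }
        return "other";
    }

    public static void main(String[] args) {
        for (String token : new String[] {"Terrence", "IBM", "2012", "B2B", "succeeds"}) {
            System.out.println(token + " -> " + tokenClass(token));
        }
    }
}
```

So the 'class' here describes the surface shape of the token, not a semantic type such as "gene" or "drug".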
Re: better explanation of some features?
Posted by "Jim foo.bar" <ji...@gmail.com>.
Also, the only information that I could find about the
*TokenClassFeatureGenerator* is this oddly phrased sentence:
_"Generates features for different for the class of the token."_
How does this generator work?
What 'class' does this refer to in a name-finding context? Semantic
class? If we're looking for genes and drugs, would the classes be
"gene", "drug" & presumably "none"?
I generally think that OpenNLP feature generation should be documented
more clearly... It is rather important...
Again, thanks a lot in advance.
Jim