Posted to users@opennlp.apache.org by "Jim foo.bar" <ji...@gmail.com> on 2012/11/20 13:08:56 UTC

better explanation of some features?

Hi all,

I am trying to properly understand all the built-in feature generators of 
OpenNLP, but I'm having trouble with some of them...

The maxent introduction page [1] mentions:

> So, say you want to implement a program which uses maxent to find 
> names in a text, such as:
>
>     /He succeeds Terrence D. Daniels, formerly a W.R. Grace vice
>     chairman, who resigned./ 
>
> If you are currently looking at the word /Terrence/ and are trying to 
> decide if it is a name or not, examples of the kinds of features you 
> might use are "previous=succeeds", "current=Terrence", "next=D.", and 
> "currentWordIsCapitalized".  You might even add a feature that says 
> that "Terrence" was seen as a name before.
>
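
The feature strings named in the quoted passage can be sketched as below. This is a minimal illustration in Python, not OpenNLP's or maxent's actual API; the function name `window_features` is hypothetical, and only the feature strings themselves come from the quoted text.

```python
def window_features(tokens, i):
    """Return context feature strings for tokens[i] (hypothetical helper)."""
    feats = ["current=" + tokens[i]]
    if i > 0:
        feats.append("previous=" + tokens[i - 1])
    if i + 1 < len(tokens):
        feats.append("next=" + tokens[i + 1])
    # Capitalization is checked on the first character only.
    if tokens[i][:1].isupper():
        feats.append("currentWordIsCapitalized")
    return feats


tokens = "He succeeds Terrence D. Daniels".split()
features = window_features(tokens, 2)  # features for "Terrence"
```

Each string is simply a named binary indicator; the model learns a weight for every (feature, outcome) pair it sees during training.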

I am particularly interested in the last sentence: *" You might even add 
a feature that says that "Terrence" was seen as a name before. "*

Does this refer to the "PreviousMapFeatureGenerator" ?
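
For illustration, a "seen as a name before" feature could be built from a map of earlier decisions, roughly as sketched below. Whether this is what PreviousMapFeatureGenerator actually does is not confirmed anywhere in this thread; the function name, the `seenBefore=` prefix and the outcome label "name" are all hypothetical.

```python
def seen_before_feature(token, previous_outcomes):
    """Emit a feature recording the outcome last assigned to this token.

    previous_outcomes is a dict mapping token -> outcome decided earlier
    in the same document (hypothetical structure for this sketch).
    """
    return "seenBefore=" + previous_outcomes.get(token, "none")


previous_outcomes = {}
before = seen_before_feature("Terrence", previous_outcomes)

# After the token has been labelled once, later occurrences see that decision.
previous_outcomes["Terrence"] = "name"
after = seen_before_feature("Terrence", previous_outcomes)
```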

Also, a while back I asked about the *OutcomePriorFeatureGenerator* 
and Jörn replied with this:

_it is there to measure the distribution of the outcome_

E.g. for the name it could be:
start 5%
cont 10%
other 85%

In a name-finding context, what does the above example mean? The outcome 
is either TRUE or FALSE, yes? So the name-finder either recognizes a name 
or it doesn't. Had Jörn not shown this example, I would have assumed 
that this feature generator calculates the distribution of these two boolean 
values. However, Jörn's example shows something different, which I don't 
understand: what are 'start', 'cont' & 'other'? How are these outcomes, and 
how do they help the name-finder?
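
One possible reading of the start/cont/other example, offered as an assumption rather than something confirmed in this thread: the outcome is assigned per token, not per name, so each token is labelled as the start of a name, the continuation of one, or neither. The sketch below (plain Python, hypothetical helper names, hand-picked name span) shows how such labels and their distribution could be computed.

```python
from collections import Counter


def encode(tokens, name_spans):
    """Label every token as 'start', 'cont' or 'other'.

    name_spans holds (begin, end) token indices, end exclusive.
    """
    outcomes = ["other"] * len(tokens)
    for begin, end in name_spans:
        outcomes[begin] = "start"
        for j in range(begin + 1, end):
            outcomes[j] = "cont"
    return outcomes


def outcome_distribution(outcomes):
    """Return the relative frequency of each outcome label."""
    counts = Counter(outcomes)
    total = len(outcomes)
    return {label: count / total for label, count in counts.items()}


tokens = ["He", "succeeds", "Terrence", "D.", "Daniels", ",",
          "formerly", "a", "W.R.", "Grace", "vice", "chairman"]
spans = [(2, 5)]  # assumption for this sketch: only "Terrence D. Daniels" is annotated
outcomes = encode(tokens, spans)
dist = outcome_distribution(outcomes)
```

Under this reading, a distribution such as "start 5%, cont 10%, other 85%" says that most tokens are not part of any name, which is useful background information for a classifier deciding on a single token.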

thanks in advance...

Jim

[1] http://maxent.sourceforge.net/howto.html

Re: better explanation of some features?

Posted by "Jim foo.bar" <ji...@gmail.com>.

Can anyone help me with this? I'm struggling to find extra documentation 
regarding feature generation...

Jim



Re: better explanation of some features?

Posted by "Jim foo.bar" <ji...@gmail.com>.

Hi Jorn,

thanks a lot for replying...I've found the 12 'classes' you're 
describing in the 'StringPattern.java' file located in the 'featuregen' 
package. This helps me a lot... :-)

How about the other 2 feature generators that I asked about 
(PreviousMapFeatureGenerator & OutcomePriorFeatureGenerator )? Could you 
elaborate a bit further? I'm really sorry but I've not understood your 
example about the *OutcomePriorFeatureGenerator* ...

Documentation says:
*"You might even add a feature that says that "Terrence" was seen as a 
name before."*

Does that refer to the *PreviousMapFeatureGenerator*?

thanks a million...

Jim

On 26/11/12 13:02, Jörn Kottmann wrote:
> On 11/20/2012 01:33 PM, Jim foo.bar wrote:
>> also, the only information that I could find about the 
>> *TokenClassFeatureGenerator* is this oddly phrased sentence:
>>
>> _"Generates features for different for the class of the token."_
>>
>> How does this generator work?
>> What 'class' does this refer to in a name-finding context? semantic 
>> class? If we're looking for genes and drugs, would the classes be 
>> "gene", "drug" & presumably "none"? 
>
> It assigns a category to a token based on the characters used in it, 
> for example:
> - token is initial capital
> - token is all upper case
> - token is numeric
> - token is alphanumeric
> ...
>
> Have a look at the code to see all the classes and on which conditions 
> they are assigned.
>
> Jörn
>

Re: better explanation of some features?

Posted by Jörn Kottmann <ko...@gmail.com>.

On 11/20/2012 01:33 PM, Jim foo.bar wrote:
> also, the only information that I could find about the 
> *TokenClassFeatureGenerator* is this oddly phrased sentence:
>
> _"Generates features for different for the class of the token."_
>
> How does this generator work?
> What 'class' does this refer to in a name-finding context? semantic 
> class? If we're looking for genes and drugs, would the classes be 
> "gene", "drug" & presumably "none"? 

It assigns a category to a token based on the characters used in it, for 
example:
- token is initial capital
- token is all upper case
- token is numeric
- token is alphanumeric
...

Have a look at the code to see all the classes and on which conditions 
they are assigned.
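
A character-based categorisation of this kind could look roughly like the sketch below. It is written in Python for brevity and is not the OpenNLP code; the category names and the order of the checks are assumptions made for illustration, so the real classes and conditions should be taken from the source.

```python
def token_class(token):
    """Assign a coarse shape category based only on the token's characters."""
    if token.isdigit():
        return "numeric"
    if token.isalpha() and token.isupper():
        return "all_upper"
    has_digit = any(c.isdigit() for c in token)
    has_alpha = any(c.isalpha() for c in token)
    if token.isalnum() and has_digit and has_alpha:
        return "alpha_numeric"
    if token[:1].isupper():
        return "initial_capital"
    if token.islower():
        return "lower"
    return "other"


# Hypothetical feature strings built from the category.
examples = ["Terrence", "IBM", "2012", "B2B", "chairman", ","]
features = ["tokenClass=" + token_class(t) for t in examples]
```

The point of such a feature is that it generalises across words: an unseen capitalised token still shares the "initial capital" feature with names seen in training.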

Jörn

Re: better explanation of some features?

Posted by "Jim foo.bar" <ji...@gmail.com>.

also, the only information that I could find about the 
*TokenClassFeatureGenerator* is this oddly phrased sentence:

_"Generates features for different for the class of the token."_

How does this generator work?
What 'class' does this refer to in a name-finding context? semantic 
class? If we're looking for genes and drugs, would the classes be 
"gene", "drug" & presumably "none"?

In general, I think OpenNLP's feature generation should be documented 
more clearly, since it is rather important...

again, thanks a lot in advance

Jim

