Posted to users@opennlp.apache.org by Jean-Philippe Fauconnier <je...@irit.fr> on 2012/12/11 12:08:00 UTC

N-Gram feature

Hello,

I'm new to the OpenNLP community. I use the MaxEnt library for a 
relation extraction task on a corpus of enumerative structures.

For example, consider the following enumerative structure:

"Under the IAU definitions, there are eight planets :
- earth,
- mars,
- etc.
"
This enumerative structure expresses an ontological "IS-A" relation between 
the classifier "planets" and its items.


I use binary features, such as "has_Classifier" or 
"has_Identical_Tokens_In_Items". But intuitively, I think that 
lemma N-gram features could capture more interesting regularities.

For this purpose, I want to manually implement a lemma N-gram feature. My 
question is as follows: if n is 3, how can I create a predicate that 
takes three lemmas into account?
Is it necessary to join the lemmas together?

For example, with this sentence:
"The little boy eats an apple."

Could the predicates be:
"DET_ADJ_N ADJ_N_VER N_VER_DET VER_DET_N DET_N_PONCT myOutcome"
?


Thank you in advance


With regards

J. Fauconnier
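
The trigram predicates in the question above can be built by sliding a window of n items over the sequence and joining each window into one string. The following is only a minimal sketch in Python, not OpenNLP code: the function name ngram_predicates, the "_" separator and the tag sequence are assumptions taken from the example sentence. The same function works on a list of lemmas instead of tags.

```python
def ngram_predicates(tags, n=3, sep="_"):
    """Return one predicate string per window of n consecutive items."""
    if n < 1:
        raise ValueError("n must be at least 1")
    # A sequence shorter than n yields no windows, hence an empty list.
    return [sep.join(tags[i:i + n]) for i in range(len(tags) - n + 1)]


# Tag sequence assumed for "The little boy eats an apple."
tags = ["DET", "ADJ", "N", "VER", "DET", "N", "PONCT"]
predicates = ngram_predicates(tags, n=3)

# One event line: predicates first, outcome last, separated by spaces
# (the layout used in the question).
event_line = " ".join(predicates + ["myOutcome"])
print(event_line)
# DET_ADJ_N ADJ_N_VER N_VER_DET VER_DET_N DET_N_PONCT myOutcome
```

Joining the items matters because the trainer treats each whitespace-separated string as one opaque binary feature: "DET_ADJ_N" is a single predicate, whereas "DET ADJ N" would be three unrelated ones.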


Re: N-Gram feature

Posted by "Jim foo.bar" <ji...@gmail.com>.
On 11/12/12 11:59, Jean-Philippe Fauconnier wrote:
>
> In fact, I really want to understand how it works internally. 

Well, then all you need to do is look at the code of 
NGramFeatureGenerator to see how it encodes this feature, so you can 
implement your own version.

Jim

Re: N-Gram feature

Posted by Jean-Philippe Fauconnier <je...@gmail.com>.
Hello,

Yes and no.

Yes, because it's the simplest way to obtain this feature.
No, because I want to use the identical predicates in another 
MaxEnt classifier (like MegaM).

In fact, I really want to understand how it works internally.

J. Fauconnier
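
One way to keep the predicates identical across tools is to generate them once and only change how each event line is serialized. The sketch below is hypothetical: format_event and its label_first flag are invented names, and it assumes only that one tool wants the outcome last (as in the original question) while another wants the label first. The exact input format of any given trainer, including how labels must be encoded, should be checked in that tool's documentation.

```python
def format_event(predicates, outcome, label_first=False):
    """Serialize one event as a single whitespace-separated line."""
    for token in list(predicates) + [outcome]:
        # A space inside a token would silently split it into two features.
        if not token or any(ch.isspace() for ch in token):
            raise ValueError(f"token must be non-empty and whitespace-free: {token!r}")
    if label_first:
        fields = [outcome, *predicates]
    else:
        fields = [*predicates, outcome]
    return " ".join(fields)


predicates = ["DET_ADJ_N", "ADJ_N_VER"]
print(format_event(predicates, "myOutcome"))
# DET_ADJ_N ADJ_N_VER myOutcome
print(format_event(predicates, "myOutcome", label_first=True))
# myOutcome DET_ADJ_N ADJ_N_VER
```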






On 11/12/2012 12:54, Jim foo.bar wrote:
> OpenNLP has a built-in n-gram feature generator which accepts a 
> window (e.g. 2 previous tokens + 2 next tokens)
>
> Is this what you want?
>
> Jim


Re: N-Gram feature

Posted by "Jim foo.bar" <ji...@gmail.com>.
OpenNLP has a built-in n-gram feature generator which accepts a window 
(e.g. 2 previous tokens + 2 next tokens)

Is this what you want?

Jim
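
The window idea mentioned above (2 previous tokens + 2 next tokens) can be illustrated with a small Python sketch. This is not OpenNLP's implementation or its feature encoding: the function name window_features and the "w[offset]=token" naming scheme are assumptions made for illustration only.

```python
def window_features(tokens, index, previous=2, following=2):
    """Return position-tagged features for the tokens around tokens[index]."""
    features = []
    for offset in range(-previous, following + 1):
        pos = index + offset
        # Offsets that fall outside the sentence are skipped.
        if 0 <= pos < len(tokens):
            features.append(f"w[{offset:+d}]={tokens[pos]}")
    return features


tokens = ["The", "little", "boy", "eats", "an", "apple", "."]
print(window_features(tokens, 2))
# ['w[-2]=The', 'w[-1]=little', 'w[+0]=boy', 'w[+1]=eats', 'w[+2]=an']
```

Unlike a joined n-gram predicate, each feature here records a single token and its relative position, so the two encodings capture different regularities.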

