
Posted to users@opennlp.apache.org by Amal Elmah <am...@hotmail.com> on 2011/08/01 07:22:20 UTC

default feature generation

Hi there,
I am currently using the OpenNLP tools to train a new name-detection model on a domain-specific corpus, and I have managed to get relatively good performance. From the documentation I know that OpenNLP defines a default feature generation, but what are these features? Do they include initial capitalization and lower case, or what are they exactly? And how does OpenNLP use them with maximum entropy to detect names? I would really like to participate in the OpenNLP project, but I am currently busy. Once I finish the work at hand I will contribute to OpenNLP, since I have spent approximately 3 months reading about it and using its tools.
Thanks in advance,
Amal

Re: default feature generation

Posted by Jörn Kottmann <ko...@gmail.com>.

Exactly, the OpenNLP Name Finder defines a default feature generation.
The code that does it can be found in the
NameFinderME.createFeatureGenerator method.

There we instantiate the following feature generators.

TokenFeatureGenerator:
+ lower cased token, with a window of 2

A window of two means that the feature is also generated
for the two previous and the two next words.
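
To illustrate the window idea, here is a minimal sketch in plain Java. It
does not use the OpenNLP classes; the class name WindowSketch, the method
windowFeatures and the feature name prefixes (w=, p1w=, n1w=, ...) are
assumptions made for this example, so the real generator may name and
order its features differently.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Locale;

public class WindowSketch {

    // Emits the lower-cased token at `index`, plus the lower-cased tokens
    // up to `window` positions before and after it. Positions outside the
    // sentence are skipped. Feature names are hypothetical.
    public static List<String> windowFeatures(String[] tokens, int index, int window) {
        List<String> features = new ArrayList<>();
        features.add("w=" + tokens[index].toLowerCase(Locale.ROOT));
        for (int i = 1; i <= window; i++) {
            if (index - i >= 0) {
                features.add("p" + i + "w=" + tokens[index - i].toLowerCase(Locale.ROOT));
            }
            if (index + i < tokens.length) {
                features.add("n" + i + "w=" + tokens[index + i].toLowerCase(Locale.ROOT));
            }
        }
        return features;
    }

    public static void main(String[] args) {
        String[] tokens = {"Mr", ".", "Smith", "visited", "Berlin"};
        // Features for "Smith" with a window of 2
        System.out.println(windowFeatures(tokens, 2, 2));
    }
}
```

With a window of 2, the token "Smith" in this sentence gets one feature
for itself and one for each of its four neighbours; a token at the start
of the sentence only gets features for the words that follow it.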

TokenClassFeatureGenerator:
+ token class (this captures things like whether the first letter is a capital)
+ token class combined with the lower cased token

Both features are generated with a window length of 2
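
As a rough illustration of what a token class feature could look like,
here is a simplified sketch in plain Java. The class labels (lc, ac, sc,
ic, num, other) and the feature names (wc=, w&c=) are assumptions chosen
for this example and cover fewer cases than a real implementation would.

```java
import java.util.Arrays;
import java.util.List;
import java.util.Locale;

public class TokenClassSketch {

    // Maps a token to a coarse, hypothetical class label:
    //   num = all digits, lc = all lower case, ac = all capitals,
    //   sc = single capital letter, ic = initial capital, other = the rest.
    public static String tokenClass(String token) {
        if (token.isEmpty()) {
            return "other";
        }
        if (token.chars().allMatch(Character::isDigit)) {
            return "num";
        }
        if (token.chars().allMatch(Character::isLetter)) {
            if (token.equals(token.toLowerCase(Locale.ROOT))) {
                return "lc";
            }
            if (token.equals(token.toUpperCase(Locale.ROOT))) {
                return token.length() == 1 ? "sc" : "ac";
            }
            if (Character.isUpperCase(token.charAt(0))) {
                return "ic";
            }
        }
        return "other";
    }

    // The two features described above: the class alone, and the class
    // combined with the lower-cased token.
    public static List<String> features(String token) {
        String tc = tokenClass(token);
        return Arrays.asList(
            "wc=" + tc,
            "w&c=" + token.toLowerCase(Locale.ROOT) + "," + tc);
    }

    public static void main(String[] args) {
        System.out.println(features("Smith"));
        System.out.println(features("IBM"));
        System.out.println(features("2011"));
    }
}
```

The combined feature lets the model tell apart, for example, "Bush" at
the start of a name from "bush" as an ordinary noun, while the class
alone generalizes to capitalized words that never occurred in training.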

PreviousMapFeatureGenerator:
+ previous decision features, if the word has been seen before
     in the document
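
The idea behind a previous decision feature can be sketched as a map
from token to the last outcome assigned to it in the current document.
This is a simplified illustration in plain Java, not the OpenNLP code:
the class PreviousMapSketch, its method names, the pd= prefix and the
outcome labels used in main are all assumptions for this example.

```java
import java.util.Collections;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

public class PreviousMapSketch {

    // Last outcome seen for each token in the current document.
    private final Map<String, String> previousOutcomes = new HashMap<>();

    // Returns a feature only if the token was already tagged earlier
    // in the document (a simplification chosen for this sketch).
    public List<String> features(String token) {
        String previous = previousOutcomes.get(token);
        if (previous == null) {
            return Collections.emptyList();
        }
        return Collections.singletonList("pd=" + previous);
    }

    // Called after a sentence has been tagged, to remember its outcomes.
    public void update(String[] tokens, String[] outcomes) {
        for (int i = 0; i < tokens.length; i++) {
            previousOutcomes.put(tokens[i], outcomes[i]);
        }
    }

    // Called at a document boundary, so decisions do not leak across documents.
    public void clear() {
        previousOutcomes.clear();
    }

    public static void main(String[] args) {
        PreviousMapSketch sketch = new PreviousMapSketch();
        System.out.println(sketch.features("Smith")); // not seen yet
        sketch.update(new String[]{"Smith", "visited"}, new String[]{"start", "other"});
        System.out.println(sketch.features("Smith")); // seen in an earlier sentence
    }
}
```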

BigramNameFeatureGenerator:
+ token bigram feature, with previous word
+ token bigram feature with previous token class
+ token bigram feature with next word
+ token bigram feature with next token class
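
The four bigram features listed above can be sketched as follows. This
is plain Java written for illustration, not the OpenNLP source: the
feature name prefixes are assumptions, and the tokenClass helper is a
deliberately crude stand-in that only distinguishes capitalized from
non-capitalized tokens.

```java
import java.util.ArrayList;
import java.util.List;

public class BigramSketch {

    // Crude, hypothetical token class: "cap" if the first character is
    // upper case, otherwise "nocap".
    static String tokenClass(String token) {
        return !token.isEmpty() && Character.isUpperCase(token.charAt(0)) ? "cap" : "nocap";
    }

    // Pairs the current token (and its class) with the previous and the
    // next token (and their classes), where those neighbours exist.
    public static List<String> bigramFeatures(String[] tokens, int index) {
        List<String> features = new ArrayList<>();
        String w = tokens[index];
        String wc = tokenClass(w);
        if (index > 0) {
            String pw = tokens[index - 1];
            features.add("pw,w=" + pw + "," + w);
            features.add("pwc,wc=" + tokenClass(pw) + "," + wc);
        }
        if (index + 1 < tokens.length) {
            String nw = tokens[index + 1];
            features.add("w,nw=" + w + "," + nw);
            features.add("wc,nwc=" + wc + "," + tokenClass(nw));
        }
        return features;
    }

    public static void main(String[] args) {
        String[] tokens = {"visited", "Berlin", "yesterday"};
        System.out.println(bigramFeatures(tokens, 1));
    }
}
```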

SentenceFeatureGenerator:
+ Sentence begin feature

OutcomePriorFeatureGenerator:
+ always generates a default feature

Hope this helps,
Jörn
