Posted to users@opennlp.apache.org by Amal Elmah <am...@hotmail.com> on 2011/08/01 07:22:20 UTC
default feature generation
Hi there,
I am currently using the OpenNLP tools to train a new name detection model on a domain-specific corpus, and I have managed to get relatively good performance. From the documentation I know that OpenNLP defines a default feature generation, but what are these features? Do they include initial capitalization and lower case, or what are they exactly? And how does OpenNLP use them with maximum entropy to detect names? I would really like to participate in the OpenNLP project, but I am currently busy. Once I finish the work at hand I will contribute to OpenNLP, since I have spent approximately 3 months reading about it and using its tools.
Thanks in advance,
Amal
Re: default feature generation
Posted by Jörn Kottmann <ko...@gmail.com>.
Exactly, the OpenNLP Name Finder defines a default feature generation.
The code that does it can be found in the NameFinderME.createFeatureGenerator method.
There we instantiate the following feature generators.
TokenFeatureGenerator:
+ lower-cased token, with a window of 2
A window of two means that the feature is also generated
for the two previous and the two next words.
TokenClassFeatureGenerator:
+ token class (which encodes things like "first letter is capital")
+ token class combined with the lower-cased token
Both features are generated with a window length of 2
PreviousMapFeatureGenerator:
+ previous decision features, if the word has been seen before
in the document
BigramNameFeatureGenerator:
+ token bigram feature, with previous word
+ token bigram feature with previous token class
+ token bigram feature with next word
+ token bigram feature with next token class
SentenceFeatureGenerator:
+ Sentence begin feature
OutcomePriorFeatureGenerator:
+ always generates a default feature
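
To make the list above more concrete, here is a minimal, self-contained Java sketch of what such features could look like for a single token. It does not call OpenNLP. The class name DefaultFeatureSketch, the simplified tokenClass rules, and the feature strings (such as "p1w=" or "pw,w=") are assumptions chosen for illustration and may differ from what NameFinderME actually emits. The previous decision features are left out because they depend on the outcomes already assigned earlier in the document.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Locale;

// Illustrative only: mimics the kinds of features listed above, not OpenNLP's code.
final class DefaultFeatureSketch {

    // Simplified token class (assumption): OpenNLP distinguishes more classes.
    static String tokenClass(String token) {
        if (token.isEmpty()) {
            return "other";
        }
        boolean allDigits = true;
        boolean allLower = true;
        for (char c : token.toCharArray()) {
            if (!Character.isDigit(c)) allDigits = false;
            if (!Character.isLowerCase(c)) allLower = false;
        }
        if (allDigits) return "num";                              // e.g. "2011"
        if (allLower) return "lc";                                // e.g. "visited"
        if (Character.isUpperCase(token.charAt(0))) return "ic";  // initial capital
        return "other";
    }

    static String lower(String token) {
        return token.toLowerCase(Locale.ROOT);
    }

    // Collects the features for the token at position i of one sentence.
    static List<String> features(String[] tokens, int i) {
        List<String> f = new ArrayList<>();

        // Default feature that is always present.
        f.add("def");

        // Token and token class features with a window of 2:
        // offsets -2..-1 are previous tokens, 0 is the current one, 1..2 are next tokens.
        for (int off = -2; off <= 2; off++) {
            int j = i + off;
            if (j < 0 || j >= tokens.length) {
                continue; // window position falls outside the sentence
            }
            String prefix = off < 0 ? "p" + (-off) : off > 0 ? "n" + off : "";
            String w = lower(tokens[j]);
            String c = tokenClass(tokens[j]);
            f.add(prefix + "w=" + w);             // lower-cased token
            f.add(prefix + "wc=" + c);            // token class
            f.add(prefix + "w&c=" + w + "," + c); // token class combined with token
        }

        // Bigram features with the previous and the next token.
        String w = lower(tokens[i]);
        String c = tokenClass(tokens[i]);
        if (i > 0) {
            f.add("pw,w=" + lower(tokens[i - 1]) + "," + w);
            f.add("pwc,wc=" + tokenClass(tokens[i - 1]) + "," + c);
        }
        if (i + 1 < tokens.length) {
            f.add("w,nw=" + w + "," + lower(tokens[i + 1]));
            f.add("wc,nc=" + c + "," + tokenClass(tokens[i + 1]));
        }

        // Sentence begin feature.
        if (i == 0) {
            f.add("S=begin");
        }
        return f;
    }

    public static void main(String[] args) {
        String[] sentence = {"Mr", ".", "John", "Smith", "visited", "Berlin"};
        // Print the features generated for the token "Smith".
        for (String feature : features(sentence, 3)) {
            System.out.println(feature);
        }
    }
}
```

Each string is one binary feature handed to the classifier for the current token. Note that capitalization is not a separate flag here: it enters through the token class, while the token itself is always lower-cased.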
Hope this helps,
Jörn