
Posted to users@opennlp.apache.org by Thomas Gilbert <th...@hotmail.co.uk> on 2011/12/18 21:41:09 UTC

Feature definition & NER sensitivity

Hello there,
A couple of questions (please feel free to point me to any resources that explain this and that I might have missed):
1/ With regard to the feature cut-off parameter for name finder model training, what is the definition of a 'feature'? Is it the entire tagged string, including all of its tokens (i.e. in the tag <START> John Smith <END>, the entity 'John Smith' is the feature)? Is it each token inside a tag (i.e. both 'John' and 'Smith' are features)? Or neither?!
2/ Is there a way to adjust the sensitivity of the named entity recognition to favour precision over recall, or vice versa? Does the algorithm automatically adjust its sensitivity to maximise the F-measure?
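To make the distinction in question 1/ concrete, here is a minimal, library-agnostic sketch. It assumes the name finder behaves like a typical maximum-entropy token tagger: each token gets its own set of contextual predicates (features), and a cut-off of N means a predicate must be seen at least N times in the training data to be kept. The predicate names (`w=`, `prev=`, `cap`) and the helper functions are hypothetical illustrations, not OpenNLP's actual feature generators or API.

```python
from collections import Counter

def token_features(tokens, i):
    # Hypothetical per-token predicates: the lowercased word, the previous
    # word, and whether the token is capitalised.
    word = tokens[i]
    prev = tokens[i - 1] if i > 0 else "*BOS*"
    feats = ["w=" + word.lower(), "prev=" + prev.lower()]
    if word[:1].isupper():
        feats.append("cap")
    return feats

def count_features(sentences):
    # Count every predicate across every token of every training sentence.
    counts = Counter()
    for tokens in sentences:
        for i in range(len(tokens)):
            counts.update(token_features(tokens, i))
    return counts

def apply_cutoff(counts, cutoff):
    # Keep only predicates seen at least `cutoff` times.
    return {f for f, n in counts.items() if n >= cutoff}

sentences = [
    ["John", "Smith", "said", "hello"],
    ["John", "Brown", "left"],
    ["Mary", "Smith", "said", "nothing"],
]
counts = count_features(sentences)
kept = apply_cutoff(counts, 2)
print(sorted(kept))
```

Under these assumptions, "John Smith" is never a single feature: `w=john` and `w=smith` are counted separately (twice each, so both survive a cut-off of 2), while rarer predicates such as `w=brown` and `w=mary` are dropped.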
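For question 2/, one common, library-agnostic way to trade precision against recall is to keep only predicted entities whose confidence score reaches a threshold, then pick the threshold on held-out data (for example, the one with the best F-measure). The sketch below uses made-up spans, scores and gold annotations, and the function names are hypothetical. It does not use any OpenNLP API and says nothing about whether OpenNLP does this automatically.

```python
# Candidate entities as (start, end, label, confidence); all values hypothetical.
candidates = [
    (0, 2, "person", 0.95),
    (5, 6, "person", 0.80),
    (9, 11, "person", 0.55),
    (14, 15, "person", 0.30),
]
# Hypothetical gold-standard spans for the same text.
gold = {(0, 2, "person"), (5, 6, "person"), (20, 22, "person")}

def precision_recall(threshold):
    # Keep only candidates whose confidence is at least `threshold`.
    predicted = {(s, e, l) for s, e, l, p in candidates if p >= threshold}
    tp = len(predicted & gold)
    # Convention: precision is 1.0 when nothing is predicted.
    precision = tp / len(predicted) if predicted else 1.0
    recall = tp / len(gold)
    return precision, recall

def f1(p, r):
    # Harmonic mean of precision and recall.
    return 2 * p * r / (p + r) if p + r else 0.0

thresholds = (0.2, 0.5, 0.9)
for t in thresholds:
    p, r = precision_recall(t)
    print(f"threshold={t:.1f} precision={p:.2f} recall={r:.2f} f1={f1(p, r):.2f}")

# Choose the threshold that maximises F-measure on this (toy) held-out set.
best = max(thresholds, key=lambda t: f1(*precision_recall(t)))
print("best threshold:", best)
```

Raising the threshold here increases precision (0.50, then 0.67, then 1.00) at the cost of recall; a precision-oriented application would simply choose a higher threshold than the F-measure-optimal one.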
Again, if I've missed something and you know of reading material which might explain this, please point me to it!
Thanks for your time,
Tom