Posted to users@opennlp.apache.org by Amir Pouya Agha Sadeghi <am...@gmail.com> on 2014/02/10 11:56:22 UTC

OpenNLP Name Finder Custom Feature Generation

Hello guys, I'm new to OpenNLP. I'm currently developing a Persian NER system
with it, but unfortunately my recall is too low, so I decided to work with a
custom feature generator. I need to add some features, like word suffix and
word prefix, or pass the word's POS tag to it. I cannot find any
understandable manual for this, and because of that I'm literally lost. Can
anyone help me?

Re: OpenNLP Name Finder Custom Feature Generation

Posted by Jörn Kottmann <ko...@gmail.com>.
On 02/10/2014 11:56 AM, Amir Pouya Agha Sadeghi wrote:
> Hello guys, I'm new to OpenNLP. I'm currently developing a Persian NER system
> with it, but unfortunately my recall is too low, so I decided to work with a
> custom feature generator. I need to add some features, like word suffix and
> word prefix, or pass the word's POS tag to it. I cannot find any
> understandable manual for this, and because of that I'm literally lost. Can
> anyone help me?
>

You should first create a custom feature generator descriptor and train 
with it.
Have a look here:
http://opennlp.apache.org/documentation/1.5.3/manual/opennlp.html#tools.namefind.training.featuregen

To get started, try the default one, which is:
<generators>
   <cache>
     <generators>
       <window prevLength = "2" nextLength = "2">
         <tokenclass/>
       </window>
       <window prevLength = "2" nextLength = "2">
         <token/>
       </window>
       <definition/>
       <prevmap/>
       <bigram/>
       <sentence begin="true" end="false"/>
     </generators>
   </cache>
</generators>

You need to copy that into a file and then use the -featuregen option of the
training tool to point to it.

With a custom feature generator added, the descriptor could look like this:
<generators>
   <cache>
     <generators>
       <window prevLength = "2" nextLength = "2">
         <tokenclass/>
       </window>
       <window prevLength = "2" nextLength = "2">
         <token/>
       </window>
       <definition/>
       <prevmap/>
       <bigram/>
       <sentence begin="true" end="false"/>
       <custom class="com.mydomain.PersonSuffixFeatureGenerator"/>
     </generators>
   </cache>
</generators>

Once training with the default descriptor works, you can extend the generator
description with the custom tag, as in the second descriptor above, and point
it to a feature generator you implemented. OpenNLP contains many good samples
in the opennlp.tools.util.featuregen package, e.g. SuffixFeatureGenerator.
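
As a rough illustration of what such a class might compute, here is a minimal
sketch of a prefix/suffix feature generator. It is not taken from OpenNLP and
rests on a few assumptions: the class name is borrowed from the descriptor
above, the "pre="/"suf=" feature names and the maximum affix length of 4 are
made up for the example, and the package declaration and the
"extends FeatureGeneratorAdapter" clause are left out so that the snippet
compiles without OpenNLP on the classpath. The createFeatures parameter list
follows the AdaptiveFeatureGenerator interface.

```java
import java.util.ArrayList;
import java.util.List;

// Sketch only. In a real project this class would most likely be declared
// public, live in the package named in the descriptor (com.mydomain), and
// extend opennlp.tools.util.featuregen.FeatureGeneratorAdapter. Those parts
// are omitted here so the snippet has no dependency on OpenNLP.
class PersonSuffixFeatureGenerator {

    // Longest prefix/suffix to emit; 4 is an arbitrary value for illustration.
    private static final int MAX_AFFIX_LENGTH = 4;

    // Same parameter list as AdaptiveFeatureGenerator.createFeatures:
    // appends string features for tokens[index] to the features list.
    public void createFeatures(List<String> features, String[] tokens,
                               int index, String[] previousOutcomes) {
        String token = tokens[index];
        int limit = Math.min(MAX_AFFIX_LENGTH, token.length());
        for (int len = 1; len <= limit; len++) {
            // Hypothetical feature names: "pre=" for prefixes, "suf=" for suffixes.
            features.add("pre=" + token.substring(0, len));
            features.add("suf=" + token.substring(token.length() - len));
        }
    }

    public static void main(String[] args) {
        List<String> features = new ArrayList<>();
        new PersonSuffixFeatureGenerator()
            .createFeatures(features, new String[] {"Mr.", "Sadeghi"}, 1, new String[0]);
        // Prints the prefix and suffix features generated for "Sadeghi".
        System.out.println(features);
    }
}
```

Passing POS tags would need a different approach, since createFeatures only
receives the tokens and the previous outcomes.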

The documentation contains a table explaining what the different feature
generators do. Some of them, like the window feature generator, can be
chained.

The custom feature generator needs to be on the classpath, otherwise OpenNLP
can't load it. One way to achieve this is to start
opennlp.tools.cmdline.CLI.main from your IDE, in the project that contains
your feature generator class.

HTH,
Jörn