Posted to users@opennlp.apache.org by Michael Rüegg <ru...@gmail.com> on 2014/07/23 18:36:30 UTC

Extend OpenNLP POS tagger model

Hi,

I'm currently using OpenNLP's POS tagger with the model
'en-pos-maxent.bin'. I would like to extend this model because it is
missing a few words I need to have tagged as proper nouns (NNP). How
can I achieve this?

I was not able to find the source of the 'en-pos-maxent' model, only
its binary. I don't want to create my own model, just extend the
existing one.

Any help would be much appreciated.

Thanks,
Michael

Re: Extend OpenNLP POS tagger model

Posted by Michael Rüegg <ru...@gmail.com>.
Hi Jörn,

Thanks for your reply. But how can I then retrain the existing model?
This is what I currently use to tag words:

File posModelFile = new File(getModelDir(), "en-pos-maxent.bin");
POSModel model;
// Load the pre-trained model; try-with-resources closes the file afterwards
try (FileInputStream posModelStream = new FileInputStream(posModelFile)) {
    model = new POSModel(posModelStream);
}
POSTaggerME tagger = new POSTaggerME(model);
String[] words = SimpleTokenizer.INSTANCE.tokenize("my text to classify.");
String[] result = tagger.tag(words); // result[i] is the tag for words[i]

I already figured out how to create my own model:

// Training file: one sentence per line, each token written as word_TAG
try (FileInputStream dataIn = new FileInputStream("my-en-pos.train.txt")) {
    ObjectStream<String> os = new PlainTextByLineStream(dataIn, "UTF-8");
    ObjectStream<POSSample> sampleStream = new WordTagSampleStream(os);
    // The two nulls: no tag dictionary and no n-gram dictionary
    POSModel model = POSTaggerME.train("en", sampleStream,
            TrainingParameters.defaultParams(), null, null);
    save(model);
}
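
WordTagSampleStream expects one sentence per line, with each token written as word_TAG and tokens separated by whitespace. Below is a minimal sketch of producing such lines, for instance to add sentences containing the missing proper nouns to the training file. WordTagLines and toLine are hypothetical names, not part of OpenNLP, and the check on underscores in tags assumes (as far as I can tell from the parsing code) that OpenNLP splits each token at its last underscore.

```java
// Hypothetical helper, not part of OpenNLP: formats one tagged sentence
// as a line of word_TAG tokens for WordTagSampleStream.
final class WordTagLines {

    private static boolean hasWhitespace(String s) {
        return s.chars().anyMatch(Character::isWhitespace);
    }

    static String toLine(String[] tokens, String[] tags) {
        if (tokens.length != tags.length) {
            throw new IllegalArgumentException("need exactly one tag per token");
        }
        StringBuilder line = new StringBuilder();
        for (int i = 0; i < tokens.length; i++) {
            // Tokens are whitespace-separated in the file, so they must not
            // contain whitespace themselves.
            if (tokens[i].isEmpty() || hasWhitespace(tokens[i])) {
                throw new IllegalArgumentException("bad token: '" + tokens[i] + "'");
            }
            // Assumed: word and tag are split at the last underscore, so the
            // tag itself must not contain one.
            if (tags[i].isEmpty() || tags[i].contains("_") || hasWhitespace(tags[i])) {
                throw new IllegalArgumentException("bad tag: '" + tags[i] + "'");
            }
            if (i > 0) {
                line.append(' ');
            }
            line.append(tokens[i]).append('_').append(tags[i]);
        }
        return line.toString();
    }

    public static void main(String[] args) {
        String[] tokens = {"Michael", "uses", "OpenNLP", "."};
        String[] tags = {"NNP", "VBZ", "NNP", "."};
        // Prints: Michael_NNP uses_VBZ OpenNLP_NNP ._.
        System.out.println(toLine(tokens, tags));
    }
}
```

Underscores inside the word itself are left alone, since only the last one would separate word from tag.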


But how can I take an existing source model as a basis (e.g., "en-pos-maxent")?


Thanks in advance,
Michael

2014-07-24 14:08 GMT+02:00 Jörn Kottmann <ko...@gmail.com>:
> On 07/23/2014 06:36 PM, Michael Rüegg wrote:
>>
>> I was not able to find the source of the 'en-pos-maxent' model, only
>> its binary. I don't want to create my own model, just extend the
>> existing one.
>
>
> The OpenNLP models can't be extended. You will have to retrain to be able to
> change the tag set.
>
> Jörn

Re: Extend OpenNLP POS tagger model

Posted by Jörn Kottmann <ko...@gmail.com>.
On 07/23/2014 06:36 PM, Michael Rüegg wrote:
> I was not able to find the source of the 'en-pos-maxent' model, only
> its binary. I don't want to create my own model, just extend the
> existing one.

The OpenNLP models can't be extended. You will have to retrain to be able to
change the tag set.

Jörn
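
Since only the binary of en-pos-maxent is available (as noted at the start of the thread), one possible workaround, a bootstrapping approach that is not suggested anywhere in this thread, is to tag unannotated text from your own domain with the existing model, overwrite the tags of the missing words with NNP, and retrain on the result. Any mistakes the existing model makes on that text are carried into the new model, so the output should be reviewed by hand. The sketch below covers only the correction step; ForceProperNouns and forceTag are hypothetical names, and the hard-coded tag array merely stands in for the output of tagger.tag(words).

```java
import java.util.Arrays;
import java.util.Set;

// Hypothetical helper, not part of OpenNLP: overrides the tags that the
// existing model assigned to a chosen set of words.
final class ForceProperNouns {

    static String[] forceTag(String[] tokens, String[] tags,
                             Set<String> words, String forcedTag) {
        if (tokens.length != tags.length) {
            throw new IllegalArgumentException("need exactly one tag per token");
        }
        // Copy so the tagger's original output is left untouched.
        String[] fixed = Arrays.copyOf(tags, tags.length);
        for (int i = 0; i < tokens.length; i++) {
            // Exact, case-sensitive match on the token text.
            if (words.contains(tokens[i])) {
                fixed[i] = forcedTag;
            }
        }
        return fixed;
    }

    public static void main(String[] args) {
        String[] tokens = {"We", "deployed", "Zorblat", "today", "."};
        // Stand-in for tagger.tag(tokens); real tags depend on the model.
        String[] tags = {"PRP", "VBD", "JJ", "NN", "."};
        String[] fixed = forceTag(tokens, tags, Set.of("Zorblat"), "NNP");
        // Prints: [PRP, VBD, NNP, NN, .]
        System.out.println(Arrays.toString(fixed));
    }
}
```

The corrected sentences can then be written one per line in word_TAG format and fed through the training code from the earlier message.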