Posted to users@opennlp.apache.org by David Young <dy...@gmail.com> on 2012/09/03 01:45:04 UTC

OpenNLP Maxent Data Format

Hi all, newbie question and it might just be that I do not understand
Maxent.

It seems that the theory behind it is to use as much information as
possible to make predictions (Maximum Entropy).

In the example given on SourceForge, the features are the words surrounding
a given word, for example:
"previous=succeeds", "current=Terrence", "next=D"

I have a dataset just like this, except that it contains 6 surrounding words.
When a position does not exist, for example when the current word is at the
beginning of a sentence and there is no previous3, I use a placeholder such
as "previous3=None".
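
To make the feature layout concrete, here is a minimal Python sketch of how
such context strings could be built. It is an illustration only: the function
name, the numbered feature names (previous1..previous3, next1..next3) and the
token list are assumptions, not part of the SourceForge example or of
SharpEntropy.

```python
def context_features(tokens, index, window=3, pad="None"):
    """Build 'name=value' feature strings for tokens[index].

    Positions that fall outside the sentence get the padding value,
    for example 'previous3=None' near the start of a sentence.
    """
    features = [f"current={tokens[index]}"]
    for offset in range(1, window + 1):
        before = index - offset
        after = index + offset
        # Use the padding value when the neighbour does not exist.
        prev_word = tokens[before] if before >= 0 else pad
        next_word = tokens[after] if after < len(tokens) else pad
        features.append(f"previous{offset}={prev_word}")
        features.append(f"next{offset}={next_word}")
    return features


# Hypothetical token list, used only to exercise the function.
sentence = ["He", "succeeds", "Terrence", "D"]
print(context_features(sentence, 2))
```

With window=3 this yields the current word plus 6 surrounding words, so every
context has the same 7 feature slots regardless of sentence position.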

My question is: what happens when I want to use something like
"next=WordNotInModel", a word that does not exist in the training data, and
still want to get a prediction using the rest of the surrounding context?
Even if I use "next=Unknown", "next=null" or Null, I get the error
"predicateLabel KeyNotFoundException was unhandled", because
"next=WordNotInModel" is not a known key.

Thanks for your time.

Re: OpenNLP Maxent Data Format

Posted by Jörn Kottmann <ko...@gmail.com>.
Hello,

SharpNLP is a C# clone of OpenNLP. I have never worked
with it and do not know much about it, sorry.

If you need to work with .NET and want to use OpenNLP,
you can try this:
https://cwiki.apache.org/OPENNLP/a-quick-guide-to-using-opennlp-from-net.html

It should not be a problem to pass a non-existing feature to a model.
In OpenNLP this is done all the time, e.g. when there is a word which was
not seen in the training data before.
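
As a rough illustration of why an unseen feature need not be fatal, the
Python sketch below evaluates a toy maxent-style model in which a feature
without learned weights simply contributes nothing to the scores. This is
not OpenNLP's or SharpEntropy's code; the weights, outcome names and
function name are made up for the example.

```python
import math

# Made-up weights for illustration: predicate -> {outcome: weight}.
WEIGHTS = {
    "previous=succeeds": {"person": 1.2, "other": -0.3},
    "current=Terrence": {"person": 0.8, "other": 0.1},
}
OUTCOMES = ["person", "other"]


def evaluate(context):
    """Return outcome probabilities, skipping predicates with no weights."""
    scores = {outcome: 0.0 for outcome in OUTCOMES}
    for predicate in context:
        weights = WEIGHTS.get(predicate)
        if weights is None:
            continue  # unseen in training: no contribution, no error
        for outcome, weight in weights.items():
            scores[outcome] += weight
    # Normalise the exponentiated scores so they sum to 1.
    total = sum(math.exp(score) for score in scores.values())
    return {o: math.exp(score) / total for o, score in scores.items()}


with_unknown = evaluate(
    ["previous=succeeds", "current=Terrence", "next=WordNotInModel"]
)
print(with_unknown)
```

Under this scheme the prediction is driven entirely by the features the
model does know, which is the behaviour asked for in the original question.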

HTH,
Jörn



Re: OpenNLP Maxent Data Format

Posted by David Young <dy...@gmail.com>.
Hi, thanks for the reply. I am not as familiar with Java, so I thought I'd
produce a model first with SharpEntropy.
I have not really modified the simple example, so it is only several lines
of basic code.

This is how it works:
http://pastebin.com/LK9tNsrj

The training data is as follows:
http://pastebin.com/3icni8Jc

This example works fine, but the problem comes when I try to use any words
that are not in the training data.
For example:
            context.Add("oWord=someNewWord")...

This gives an unknown key error because the feature is not recognised. But I
want to make predictions using what is known: the surrounding context.
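
One possible workaround, if the library at hand rejects unknown features, is
to drop them before asking for a prediction. The Python sketch below shows
the idea only. It assumes a training file layout of space-separated features
followed by the outcome as the last token on each line, and the feature
names other than "oWord=someNewWord" are hypothetical. It does not use any
SharpEntropy API.

```python
def collect_predicates(training_lines):
    """Collect every feature string seen in the training data.

    Assumes each line is 'feature feature ... outcome', so the last
    token is treated as the outcome and left out.
    """
    known = set()
    for line in training_lines:
        tokens = line.split()
        known.update(tokens[:-1])
    return known


def known_only(context, known_predicates):
    """Keep only the features the model was trained on, in order."""
    return [feature for feature in context if feature in known_predicates]


# Hypothetical training lines and context, for illustration only.
training_lines = [
    "oWord=the oPos=DT other",
    "oWord=bank oPos=NN target",
]
known = collect_predicates(training_lines)
context = ["oWord=someNewWord", "oPos=NN"]
print(known_only(context, known))
```

The filtered list can then be passed to the model in place of the original
context, so the prediction rests on the remaining known features.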

My training data contains many words, and since this is a maximum entropy
model they should be taken into account when available, in addition to each
word's POS. But in the real data I want to evaluate, I have the POS for each
word and some words that are in the training data, while the context may
also contain words that are not in the training data. How do I still get
a prediction in this case using the rest of the context?

Thanks for your time.

Re: OpenNLP Maxent Data Format

Posted by Jörn Kottmann <ko...@gmail.com>.
On 09/03/2012 01:45 AM, David Young wrote:
> My question is: what happens when I want to use something like
> "next=WordNotInModel", a word that does not exist in the training data, and
> still want to get a prediction using the rest of the surrounding context?
> Even if I use "next=Unknown", "next=null" or Null, I get the error
> "predicateLabel KeyNotFoundException was unhandled", because
> "next=WordNotInModel" is not a known key.

Usually maxent is used as an API. Can you post some code here
so we can see what you are doing? Or do you use one of the command
line utils?

Thanks,
Jörn