Posted to users@opennlp.apache.org by "Vashishth, Rahul" <ra...@optum.com> on 2015/05/14 08:37:48 UTC

Custom model creation openNLP

Hi,
My requirement is to analyze sentences like "What is health insurance." or "What is mortgage loan."
For this I need to create custom models that find the business words in a given array of tokens, so that later on
I can build a query based on the given sentence.
Just as there are models for person names and location names, I need a model for business terms (e.g. Loan, Insurance),
user actions (e.g. Download, Define) and English grammar words (e.g. What, How).
Please let me know how I can achieve this, or whether there is any other way to analyze sentences like these.

Many Regards,
Rahul Vashishth

This e-mail, including attachments, may include confidential and/or
proprietary information, and may be used only by the person or entity
to which it is addressed. If the reader of this e-mail is not the intended
recipient or his or her authorized agent, the reader is hereby notified
that any dissemination, distribution or copying of this e-mail is
prohibited. If you have received this e-mail in error, please notify the
sender by replying to this message and delete this e-mail immediately.

Re: Custom model creation openNLP

Posted by Rodrigo Agerri <ra...@apache.org>.

Hi Rahul,

This has been discussed a number of times already. A quick Google
search turns up several answers to this same question.

You need to train with whole sentences annotated with the sequences
you need. One sentence per line of tokenized text. In the link I sent
you in the previous email

http://opennlp.apache.org/documentation/1.5.3/manual/opennlp.html#tools.namefind

you can see the kind of annotations required to train a model in OpenNLP:

<START:person> Pierre Vinken <END> , 61 years old , will join the board as a nonexecutive director Nov. 29 .
Mr . <START:person> Vinken <END> is chairman of Elsevier N.V. , the Dutch publishing group .
<START:person> Rudolph Agnew <END> , 55 years old and former chairman of Consolidated Gold Fields PLC , was named a director of this British industrial conglomerate .

If you only need one class, say "business", then the tags will be
<START:business> and so on.
If you are getting a span over the whole token array, it could be
because you trained only on a list of words or because you did not
use enough data, among other reasons.
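
To make the format concrete, here is a minimal Java sketch (plain JDK, not part of OpenNLP) that writes training lines in this style by tagging tokens found in a small term list. The class name, the `annotate` method and the term list are assumptions made up for illustration. Tagging from a list only bootstraps a training file: the sentences would still need to be reviewed by hand and be numerous enough to train on.

```java
import java.util.Locale;
import java.util.Set;

public class BusinessAnnotator {

    /**
     * Joins the tokens with single spaces and wraps each run of consecutive
     * tokens found in `terms` (lower-cased comparison) in
     * <START:type> ... <END>, one sentence per returned line.
     */
    static String annotate(String[] tokens, Set<String> terms, String type) {
        StringBuilder sb = new StringBuilder();
        boolean inside = false; // true while we are within an open <START:...> tag
        for (String token : tokens) {
            boolean match = terms.contains(token.toLowerCase(Locale.ROOT));
            if (match && !inside) {
                sb.append("<START:").append(type).append("> ");
            }
            if (!match && inside) {
                sb.append("<END> ");
            }
            sb.append(token).append(' ');
            inside = match;
        }
        if (inside) {
            sb.append("<END>"); // close a span that runs to the end of the sentence
        }
        return sb.toString().trim();
    }

    public static void main(String[] args) {
        // Hypothetical term list and sentences, for illustration only.
        Set<String> terms = Set.of("loan", "insurance", "mortgage");
        String[][] sentences = {
            {"What", "is", "health", "insurance", "."},
            {"What", "is", "mortgage", "loan", "."},
            {"Download", "the", "loan", "form", "."}
        };
        for (String[] sentence : sentences) {
            System.out.println(annotate(sentence, terms, "business"));
        }
    }
}
```

Adjacent matches are merged into one span here, so "mortgage loan" becomes a single annotated sequence; whether that is the right granularity depends on the terms you want the model to return.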

Cheers,

Rodrigo



RE: Custom model creation openNLP

Posted by "Vashishth, Rahul" <ra...@optum.com>.

Hi Rodrigo,

I followed the link below to create a custom model, but it isn't working for me.
https://gist.github.com/johnmiedema/4020deea875ce306971e
Test file:
https://gist.github.com/johnmiedema/4020deea875ce306971e/download#
Training file (I couldn't find any documentation on how to design a training file, though):
<START> Loan <END>
<START> Insurance <END>
<START> Mortgage <END>

I successfully created the model bin file. But when I use this file with TokenNameFinderModel,
instead of returning the keyword span it returns a span for the whole token array. Can you suggest
any possible issue with the above code?

Thanks,
Rahul Vashishth



Re: Custom model creation openNLP

Posted by Rodrigo Agerri <ra...@apache.org>.

Hello,

The best thing to do would be to manually annotate some data of the
same type you want to analyze. In this case, you will be annotating
"loan", "insurance" and so on. Then you can train a model to recognize
such sequences.

http://opennlp.apache.org/documentation/1.5.3/manual/opennlp.html#tools.namefind

If you only have a limited list of words to find, you could get away
with lookups, but for open-ended recognition of terms of those types
you will need to generate training data.
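
For the limited-list case, a lookup can be as small as the following Java sketch (plain JDK). The `Hit` class is a made-up stand-in for a labelled token range, not OpenNLP's own span type, and the lexicon entries and label names are assumptions taken from the examples in the question.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Locale;
import java.util.Map;

public class TermLookup {

    /** A labelled token range; `end` is exclusive. Hypothetical helper type. */
    static final class Hit {
        final int start;
        final int end;
        final String type;

        Hit(int start, int end, String type) {
            this.start = start;
            this.end = end;
            this.type = type;
        }

        @Override
        public String toString() {
            return type + "[" + start + ".." + end + ")";
        }
    }

    /** Returns one Hit per token whose lower-cased form is a key of the lexicon. */
    static List<Hit> find(String[] tokens, Map<String, String> lexicon) {
        List<Hit> hits = new ArrayList<>();
        for (int i = 0; i < tokens.length; i++) {
            String type = lexicon.get(tokens[i].toLowerCase(Locale.ROOT));
            if (type != null) {
                hits.add(new Hit(i, i + 1, type));
            }
        }
        return hits;
    }

    public static void main(String[] args) {
        // Hypothetical lexicon: word -> label.
        Map<String, String> lexicon = Map.of(
            "loan", "business",
            "insurance", "business",
            "mortgage", "business",
            "download", "action",
            "define", "action",
            "what", "grammar",
            "how", "grammar");
        String[] tokens = {"What", "is", "mortgage", "loan", "."};
        System.out.println(find(tokens, lexicon));
    }
}
```

A lookup like this only finds words that are already in the list, which is why open-ended terms call for a trained model instead.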

Brat is a nice tool for this kind of annotation:

http://brat.nlplab.org/

HTH,

R
