Posted to users@opennlp.apache.org by Vivekanand Ittigi <vi...@biginfolabs.com> on 2014/06/24 09:44:59 UTC

Writing our own models in openNLP.

Hi,

If I run a command like this on the command line:

./opennlp TokenNameFinder en-ner-person.bin < input.txt > output.txt

I get person names marked in output.txt, but I want to write my own models
so that my own entities are found.

E.g.

1. What is the risk value on icm2500.
2. Delivery of prd_234 will arrive late.
3. Watson is handling router_34.

If I pass in these lines, it should parse them and extract the product entities.
icm2500, prd_234, router_34, etc. are all products (we can save this
information in a file and use it as a kind of lookup for the models or
OpenNLP).

Can anyone please tell me how to do this?

Fwd: Writing our own models in openNLP.

Posted by Vivekanand Ittigi <vi...@biginfolabs.com>.

Hi Jorn,

I read the documentation at
http://opennlp.apache.org/documentation/manual/opennlp.html#tools.namefind.recognition.cmdline
but I felt I needed more information to put it into code.

I understand that we need to train the model, but I could not work out how.
Could you please explain it so that I can start implementing it?

Thanks,
Vivek


On Tue, Jun 24, 2014 at 3:28 PM, Jörn Kottmann <ko...@gmail.com> wrote:

> You need to train your own model. To do that, you have to collect some of
> the texts and annotate them with the entities you wish to detect.
>
> Have a look at the documentation about the name finder. It explains how
> the training works.
>
> HTH,
> Jörn
>
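
A rough sketch of how the annotation step could be bootstrapped from a product lookup list, in Java. Everything here is illustrative rather than part of OpenNLP: the class and method names are made up, the type name "product" is an assumption, and sentences are assumed to arrive one per line. It wraps each known product in the `<START:type> ... <END>` markers used in the name finder documentation, with spaces around the markers, and keeps the rest of the sentence as context.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Set;

// Hypothetical helper, not an OpenNLP class.
final class TrainingLineBuilder {

    /** Wraps each known product token in <START:type> ... <END>; other tokens stay as context. */
    static String annotate(String sentence, Set<String> products, String type) {
        List<String> out = new ArrayList<>();
        for (String raw : sentence.trim().split("\\s+")) {
            // Split one trailing punctuation mark off so "router_34." still matches "router_34".
            String word = raw;
            String tail = "";
            if (raw.length() > 1 && ".,;:!?".indexOf(raw.charAt(raw.length() - 1)) >= 0) {
                word = raw.substring(0, raw.length() - 1);
                tail = raw.substring(raw.length() - 1);
            }
            if (products.contains(word)) {
                out.add("<START:" + type + "> " + word + " <END>");
            } else {
                out.add(word);
            }
            if (!tail.isEmpty()) {
                out.add(tail);
            }
        }
        return String.join(" ", out);
    }

    public static void main(String[] args) {
        Set<String> products = Set.of("icm2500", "prd_234", "router_34");
        String[] sentences = {
            "What is the risk value on icm2500.",
            "Delivery of prd_234 will arrive late.",
            "Watson is handling router_34."
        };
        for (String s : sentences) {
            System.out.println(annotate(s, products, "product"));
        }
    }
}
```

The printed lines could be collected into the file passed to the trainer with -data. A handful of generated lines is still far too little to train on. If plain lookup is all that is needed, the dictionary-based name finder that ships with OpenNLP (DictionaryNameFinder) may be worth a look instead of a trained model.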

Fwd: Writing our own models in openNLP.

Posted by Vivekanand Ittigi <vi...@biginfolabs.com>.

Hi Jorn,

Let me try training a model, then.

Here is what I've done so far:

1. I wrote the following text into a file called test.train:
<START:Product_entities>icm2500<END>
<START:Product_entities>prd_234<END>
.
.
.

2. I ran the following:

./opennlp TokenNameFinderTrainer -encoding UTF-8 -lang en -data test.train -model en-ner-person.bin

3. I added the line below to "sometext.txt":

What is the risk value on icm2500. Delivery of prd_234 will arrive late. Watson is handling router_34.

4. I ran the command:

./opennlp TokenNameFinder en-ner-person.bin < sometext.txt > output/output4.txt

Result: it returned the same line unchanged, instead of "What is the risk
value on <START:Product_entities>icm2500<END> Delivery of
<START:Product_entities>prd_234<END> will arrive late..."

Please tell me what I am doing wrong.

Thanks,
Vivek





On Tue, Jun 24, 2014 at 5:06 PM, Jörn Kottmann <ko...@gmail.com> wrote:

> For the training you need to produce annotated texts like the sample in
> the documentation. If you have a training data file in that format, you
> can use the command line interface to actually train a model.
>
> The latest trunk version of OpenNLP can also be trained on files in the
> brat data format, which can be easily created with brat.
>
> Have a look here:
> http://brat.nlplab.org/index.html
>
> In my experience brat works quite well in the latest trunk version.
>
> To train with brat you need to suffix the training command like this:
> bin/opennlp TokenNameFinderTrainer.brat
> That command will print a help message explaining the inputs it needs.
>
> There is no need to write code to train a name finder model.
>
> Jörn

Re: Writing our own models in openNLP.

Posted by Vivekanand Ittigi <vi...@biginfolabs.com>.

Hi Jorn,

I've already mentioned it in the mail: I used just three lines to train
the model, and I gave similar input sentences to detect the entities.

I have two questions:
1. Although I'm passing input with a similar or almost identical sentence
pattern (please compare the training data and the input sentences), why were
the entities not recognized?

2. What kind of sentences should I pass in to detect these entities in the
sentences I mentioned?

Thanks,
Vivek

Re: Writing our own models in openNLP.

Posted by Jörn Kottmann <ko...@gmail.com>.

On 07/01/2014 02:04 PM, Vivekanand Ittigi wrote:
> And these are the inputs i'm giving:
> static String sentence = "phone is a product in our system";
> static String sentence = "what is the risk on phone";
> static String sentence = "who is working on switch today";
>
> I should get these entities from respective lines "phone" "phone" and
> "switch".
>
> But i got nothing.? I know i'm doing something wrong in training data. I'm
> new to this field. can you please guide me what can i put in training data
> to process all these sentences.

You need much more content to train a useful model. I don't know your
data or how much of it you use, but if you have just a few dozen sentences
it will not work.

How many sentences do you train on?

Jörn
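
To answer the "how many sentences" question for a given file, a small Java sketch like the following could be used. It is only a counting aid under stated assumptions: one sentence per line, tags separated from words by spaces, and blank lines skipped. The class name is hypothetical. As far as I recall, the name finder section of the OpenNLP manual suggests on the order of 15,000 sentences for a model that performs well, so a count in the single or double digits is a strong hint that more data is needed.

```java
import java.io.IOException;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Paths;
import java.util.List;

// Hypothetical helper, not an OpenNLP class.
final class TrainingSetStats {

    /** Counts non-blank lines, treating each one as a sentence. */
    static int countSentences(List<String> lines) {
        int n = 0;
        for (String line : lines) {
            if (!line.trim().isEmpty()) {
                n++;
            }
        }
        return n;
    }

    /** Counts stand-alone <END> tokens, i.e. one per annotated entity. */
    static int countEntities(List<String> lines) {
        int n = 0;
        for (String line : lines) {
            for (String token : line.trim().split("\\s+")) {
                if (token.equals("<END>")) {
                    n++;
                }
            }
        }
        return n;
    }

    public static void main(String[] args) throws IOException {
        // Reads the file named on the command line, or falls back to a tiny built-in sample.
        List<String> lines = args.length > 0
            ? Files.readAllLines(Paths.get(args[0]), StandardCharsets.UTF_8)
            : List.of(
                "Who is working on <START:product> phone <END> ?",
                "",
                "<START:product> mobile <END> is a product in our system .");
        System.out.println("sentences: " + countSentences(lines));
        System.out.println("annotated entities: " + countEntities(lines));
    }
}
```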

Re: Writing our own models in openNLP.

Posted by John Miedema <jo...@gmail.com>.

Hi Ittigi, off the top of my head: how many lines are in your training set? I
assume you have compiled the model and are pointing to it as in the sample
code. Sorry, just confirming the most obvious things first.

-- 
_________________________________________
johnmiedema.com

Re: Writing our own models in openNLP.

Posted by Vivekanand Ittigi <vi...@biginfolabs.com>.

Hi John,

I went through your post. It was very impressive, and I started implementing
it as you described.

This is my training data:
who is working on <START> phone <END>.
<START> mobile <END> is a product in our system.


These are the inputs I'm giving:
static String sentence = "phone is a product in our system";
static String sentence = "what is the risk on phone";
static String sentence = "who is working on switch today";

I should get the entities "phone", "phone" and "switch" from the respective
lines.

But I got nothing. I know I'm doing something wrong in the training data. I'm
new to this field. Can you please guide me on what to put in the training data
to process all these sentences?

If I get a little more knowledge about this, I can implement it in our base.

Please help me!

Thanks,
Vivek






Re: Writing our own models in openNLP.

Posted by John Miedema <jo...@gmail.com>.

I recently wrote up a post on doing this in Java rather than from the command
line. Maybe it will help; it includes Java code samples.
http://johnmiedema.com/?p=744



Re: Writing our own models in openNLP.

Posted by Vivekanand Ittigi <vi...@biginfolabs.com>.

Does that mean you want me to write a small story that includes these entities?



Re: Writing our own models in openNLP.

Posted by Mark G <gi...@gmail.com>.

Hello, you need to annotate the entity within some of the sentences it occurs in. The name finder needs context. It's giving you the same sentence back because it was trained to find any token anywhere.
Mg
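
To make the "needs context" point concrete, here is a small Java sketch of a sanity check for training lines. It is not OpenNLP code; the class name and the messages are invented for illustration. It assumes the format shown in the name finder documentation, where words and the `<START:type>` / `<END>` markers are separated by whitespace. It flags two things seen in this thread: markers glued to other text (as in `<START:Product_entities>icm2500<END>` or `<END>.`), and lines that contain an entity but no surrounding words.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Pattern;

// Hypothetical helper, not an OpenNLP class.
final class TrainingLineCheck {

    // A start tag standing alone as a token, with an optional type: <START> or <START:product>.
    private static final Pattern START_TAG = Pattern.compile("<START(:[^:>\\s]+)?>");
    private static final String END_TAG = "<END>";

    /** Returns the problems found in one training line; an empty list means none were found. */
    static List<String> problems(String line) {
        List<String> found = new ArrayList<>();
        boolean inSpan = false;
        int contextTokens = 0;
        int spans = 0;
        for (String token : line.trim().split("\\s+")) {
            if (token.isEmpty()) {
                continue;
            }
            boolean isStart = START_TAG.matcher(token).matches();
            boolean isEnd = token.equals(END_TAG);
            if (!isStart && !isEnd && (token.contains("<START") || token.contains(END_TAG))) {
                // The marker is fused with a word or punctuation mark.
                found.add("tag not separated by whitespace: " + token);
            } else if (isStart) {
                inSpan = true;
                spans++;
            } else if (isEnd) {
                inSpan = false;
            } else if (!inSpan) {
                contextTokens++; // a word outside any annotated span
            }
        }
        if (spans > 0 && contextTokens == 0) {
            found.add("entity has no surrounding context");
        }
        return found;
    }

    public static void main(String[] args) {
        String[] lines = {
            "<START:Product_entities>icm2500<END>",
            "<START:product> icm2500 <END>",
            "who is working on <START> phone <END>.",
            "What is the risk value on <START:product> icm2500 <END> ."
        };
        for (String line : lines) {
            System.out.println(line + " -> " + problems(line));
        }
    }
}
```

Only the last sample line passes both checks. Passing them says nothing about whether there is enough data, only that each line is shaped the way the documentation's sample is.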



Re: Writing our own models in openNLP.

Posted by Vivekanand Ittigi <vi...@biginfolabs.com>.

Hi Jorn,

I'll go with training a model.

Here is what I've done so far:

1. I wrote the following text into a file called test.train:
<START:Product_entities>icm2500<END>
<START:Product_entities>prd_234<END>
.
.
.

2. I ran the following:

./opennlp TokenNameFinderTrainer -encoding UTF-8 -lang en -data test.train -model en-ner-person.bin

3. I added the line below to "sometext.txt":

What is the risk value on icm2500. Delivery of prd_234 will be arrived
late. Watson is handling router_34.

4. I ran the command:

./opennlp TokenNameFinder en-ner-person.bin <sometext.txt> output/output4.txt

Result: it gave me back the same line instead of
What is the risk value on <START:Product_entities>icm2500<END> Delivery of
<START:Product_entities>prd_234<END> will be arrived late.......

Please tell me what I am doing wrong.

Thanks,
Vivek





On Tue, Jun 24, 2014 at 5:06 PM, Jörn Kottmann <ko...@gmail.com> wrote:

> For the training you need to produce annotated texts like the sample in
> the documentation.
> If you have a training data file in that format you can use the command
> line interface to actually train a model.
>
> The latest trunk version of OpenNLP can also be trained on files in the
> brat data format; those files can be easily created with brat.
>
> Have a look here:
> http://brat.nlplab.org/index.html
>
> In my experience brat works quite well in the latest trunk version.
>
> To train with brat, add the .brat suffix to the training command, like this:
> bin/opennlp TokenNameFinderTrainer.brat
> That command will print a help message explaining the inputs it needs.
>
> There is no need to write code to train a name finder model.
>
> Jörn
>
>
>
>
>

Re: Writing our own models in openNLP.

Posted by Jörn Kottmann <ko...@gmail.com>.

On 06/24/2014 01:10 PM, Vivekanand Ittigi wrote:
> Hi Jorn,
>
> I read the documentation at
> http://opennlp.apache.org/documentation/manual/opennlp.html#tools.namefind.recognition.cmdline
> but I felt I needed more information to put it into code.
>
> I understand that we need to train the model, but I could not work out how.
> Could you please explain it so that I can start implementing it?
>
> Thanks,
> Vivek

For the training you need to produce annotated texts like the sample in
the documentation.
If you have a training data file in that format you can use the command
line interface to actually train a model.
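
A minimal sketch of that command-line workflow, reusing the trainer and name finder invocations that appear elsewhere in this thread. The names product.train, en-ner-product.bin, sometext.txt and output.txt are assumptions, and the block only calls opennlp if it is on the PATH and the training file exists. The name finder is assumed to read one whitespace-tokenized sentence per line, so the input is written that way.

```shell
# Input for the name finder: one sentence per line, tokens separated by spaces.
cat > sometext.txt <<'EOF'
What is the risk value on icm2500 .
Delivery of prd_234 will be arrived late .
Watson is handling router_34 .
EOF

# Assumed names: the annotated training file and the model file to create.
TRAIN=product.train
MODEL=en-ner-product.bin

if command -v opennlp >/dev/null 2>&1 && [ -f "$TRAIN" ]; then
    # Train a model from the annotated file, then tag the new text with it.
    # Input is read from stdin and the tagged sentences go to stdout.
    opennlp TokenNameFinderTrainer -encoding UTF-8 -lang en -data "$TRAIN" -model "$MODEL" &&
        opennlp TokenNameFinder "$MODEL" < sometext.txt > output.txt
else
    echo "opennlp not on PATH or $TRAIN not found; skipping training"
fi
```

With only a handful of training sentences the trainer may produce a model that finds nothing, so treat this as a smoke test of the commands rather than a working setup.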

The latest trunk version of OpenNLP can also be trained on files in the
brat data format; those files can be easily created with brat.

Have a look here:
http://brat.nlplab.org/index.html

In my experience brat works quite well in the latest trunk version.

To train with brat, add the .brat suffix to the training command, like this:
bin/opennlp TokenNameFinderTrainer.brat
That command will print a help message explaining the inputs it needs.

There is no need to write code to train a name finder model.

Jörn

Re: Writing our own models in openNLP.

Posted by Vivekanand Ittigi <vi...@biginfolabs.com>.

Hi Jorn,

I read the documentation at
http://opennlp.apache.org/documentation/manual/opennlp.html#tools.namefind.recognition.cmdline
but I felt I needed more information to put it into code.

I understand that we need to train the model, but I could not work out how.
Could you please explain it so that I can start implementing it?

Thanks,
Vivek


On Tue, Jun 24, 2014 at 3:28 PM, Jörn Kottmann <ko...@gmail.com> wrote:

> You need to train your own model. To do that you have to collect some of
> the texts and annotate them with the entities you wish to detect.
>
> Have a look at the documentation about the name finder. It explains how
> the training works.
>
> HTH,
> Jörn
>

Re: Writing our own models in openNLP.

Posted by Jörn Kottmann <ko...@gmail.com>.

On 06/24/2014 09:44 AM, Vivekanand Ittigi wrote:
> Hi,
>
> If I run a command like this on the command line
>
> ./opennlp TokenNameFinder en-ner-person.bin <input.txt> <output.txt>
>
> I get person names printed in output.txt, but I want to build my own models
> so that I can extract my own entities.
>
> E.g.
>
> 1. what is the risk value on icm2500.
> 2. Delivery of prd_234 will be arrived late.
> 3. Watson is handling router_34.
>
> If I pass in these lines, it should parse them and extract product_entities.
> icm2500, prd_234, router_34, etc. are all products (we can save this
> information in a file and use it as a kind of lookup for the models or
> OpenNLP).
>
> Can anyone please tell me how to do this?
>

You need to train your own model. To do that you have to collect some of
the texts and annotate them with the entities you wish to detect.
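
Since the product names are already known, one way to get a first draft of the annotations is to wrap every known name in the training tags and then review the result by hand. The sketch below does that with awk; products.txt, sentences.txt and product-draft.train are hypothetical file names, the type name product is a placeholder, and it only handles single-token names in whitespace-tokenized sentences.

```shell
# Hypothetical lookup file: one known product name per line.
cat > products.txt <<'EOF'
icm2500
prd_234
router_34
EOF

# Hypothetical collected text: one sentence per line, tokens separated by spaces.
cat > sentences.txt <<'EOF'
What is the risk value on icm2500 .
Delivery of prd_234 will be arrived late .
Watson is handling router_34 .
EOF

# First pass (products.txt): remember every known name.
# Second pass (sentences.txt): wrap each token that is a known name in
# <START:product> ... <END>, keeping a space on each side of the tags.
awk 'NR == FNR { known[$1] = 1; next }
     {
         for (i = 1; i <= NF; i++)
             if ($i in known) $i = "<START:product> " $i " <END>"
         print
     }' products.txt sentences.txt > product-draft.train

cat product-draft.train
```

A lookup like this only marks names that are already in the list, so the draft is a starting point for manual annotation rather than a replacement for it.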

Have a look at the documentation about the name finder. It explains how
the training works.

HTH,
Jörn