Posted to users@opennlp.apache.org by Amal Elmah <am...@hotmail.com> on 2011/06/20 03:47:39 UTC

OpenNLP tool for NameFinder

Hi OpenNLP team,
 
I used the command-line training tool for the NameFinder, with the following command:
$ bin/opennlp TokenNameFinderTrainer -encoding UTF-8 -lang en -data en-ner-person.train -model en-ner-person.bin
 
I do not know where I can get en-ner-person.train, so I made a training file (training.txt) and added training data as follows:
 
<START:person> Pierre Vinken <END> , 61 years old , will join the board as a nonexecutive director Nov. 29 .
Mr . <START:person> Vinken <END> is chairman of Elsevier N.V. , the Dutch publishing group .
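
For readers checking a hand-made training file like the one above, a minimal sketch of a format check follows. It is not OpenNLP's own parser: the class name AnnotatedLine and its fields are hypothetical, it assumes that tokens and tags are separated by whitespace, and it labels an untyped <START> as "default" purely as a choice of this sketch.

```java
import java.util.ArrayList;
import java.util.List;

/** Hypothetical helper, not part of OpenNLP: parses one annotated training line. */
final class AnnotatedLine {

    /** A name covering token indices [start, end) with its entity type. */
    static final class Span {
        final int start;
        final int end;
        final String type;

        Span(int start, int end, String type) {
            this.start = start;
            this.end = end;
            this.type = type;
        }
    }

    final List<String> tokens = new ArrayList<>();
    final List<Span> spans = new ArrayList<>();

    static AnnotatedLine parse(String line) {
        AnnotatedLine result = new AnnotatedLine();
        int openStart = -1;      // token index where the open span began, -1 if none
        String openType = null;
        for (String part : line.trim().split("\\s+")) {
            if (part.isEmpty()) {
                continue;        // only happens for an empty line
            }
            boolean isStart = part.equals("<START>")
                    || (part.startsWith("<START:") && part.endsWith(">"));
            if (isStart) {
                if (openStart >= 0) {
                    throw new IllegalArgumentException("<START> inside an open span");
                }
                // "<START:person>" yields "person"; a bare "<START>" yields "default"
                openType = part.equals("<START>")
                        ? "default"
                        : part.substring("<START:".length(), part.length() - 1);
                openStart = result.tokens.size();
            } else if (part.equals("<END>")) {
                if (openStart < 0) {
                    throw new IllegalArgumentException("<END> without <START>");
                }
                result.spans.add(new Span(openStart, result.tokens.size(), openType));
                openStart = -1;
            } else {
                result.tokens.add(part);   // ordinary token
            }
        }
        if (openStart >= 0) {
            throw new IllegalArgumentException("<START> without <END>");
        }
        return result;
    }
}
```

In this sketch a tag glued to a token, as in <START>Pierre, is treated as an ordinary token, so the following <END> is reported as unmatched. That makes missing spaces around the tags easy to spot before training.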
 
My questions are:
1- How can I add features if I want to use the command-line training tool rather than the API? Can you please give me an example, if this is possible?

2- Can we add features to the training data itself, I mean with an annotation such as <START:person feature=value>?

3- Does the OpenNLP tool have a way to generate these features automatically from the training data?

Thanks a lot,
Amal

RE: OpenNLP tool for NameFinder

Posted by Amal Elmah <am...@hotmail.com>.
Is there any way to learn how to build a new feature extractor, just in case I need to do that?
What does AFAIK mean?
And can you explain what you mean by "source tree"? I am sorry if this question is silly, but I am new to the NLP field.

Thanks a lot

 

> From: olivier.grisel@ensta.org
> Date: Mon, 20 Jun 2011 17:40:08 +0200
> Subject: Re: OpenNLP tool for NameFinder
> To: opennlp-users@incubator.apache.org
> 
> 2011/6/20 Amal Elmah <am...@hotmail.com>:
> >
> > Thanks for replying.
> >
> > What I need to do is build a new model that can extract the names of recipes from a specific cooking website.
> > Could you please correct me if I have made any mistakes:
> >
> > - First, I made a training file (training.txt). In this file I put many sentences that contain a recipe name, one sentence per line, for example:
> > <START> Shortbread <END> is an easy buttery biscuits as homemade Christmas presents .
> > ... etc
> >
> > - Then I use the command-line training tool to generate the new model.
> > - After that I will use this model in my application to process any new page from this cooking website.
> > - The features will be extracted automatically by OpenNLP, so I do not need to specify them; I just need to provide as much training data as I can (this is what I understood).
> >
> > Are all my steps right?
> 
> Yes but I am not sure that the name finder will be able to find good
> models for this problem.
> 
> > Do I need to do anything to make the results more accurate?
> 
> Probably more annotated data :)
> 
> You could also build your own feature extractor with a list of
> well-known recipe names coming from a thesaurus (a.k.a. a gazetteer) but
> this would require a bit of programming with the OpenNLP API (AFAIK
> there is no such Gazetteer feature extractor implemented in the source
> tree so far).
> 
> -- 
> Olivier
> http://twitter.com/ogrisel - http://github.com/ogrisel
 		 	   		  

RE: OpenNLP tool for NameFinder

Posted by Amal Elmah <am...@hotmail.com>.
Thanks.

Can you please give me a simple example of making a dictionary and a feature generator? What format do they use (XML or something else)?
Is there any tutorial on making or using them inside an application?
 
   
 

> Date: Wed, 22 Jun 2011 08:13:44 +0200
> From: kottmann@gmail.com
> To: opennlp-users@incubator.apache.org
> Subject: Re: OpenNLP tool for NameFinder
> 
> On 6/20/11 5:40 PM, Olivier Grisel wrote:
> > (AFAIK
> > there is no such Gazetteer feature extractor implemented in the source
> > tree so far)
> The Dictionary Feature Generator can generate features based on a name
> list, but the feature generation might need to be improved. I actually
> worked a little on that, but have not checked in my changes yet.
>
> The new custom feature generation can even be used to define multiple
> dictionary feature generators, and the dictionaries will be placed in
> the model package.
> 
> Jörn
 		 	   		  

Re: OpenNLP tool for NameFinder

Posted by Jörn Kottmann <ko...@gmail.com>.
On 6/20/11 5:40 PM, Olivier Grisel wrote:
> (AFAIK
> there is no such Gazetteer feature extractor implemented in the source
> tree so far)
The Dictionary Feature Generator can generate features based on a name
list, but the feature generation might need to be improved. I actually
worked a little on that, but have not checked in my changes yet.

The new custom feature generation can even be used to define multiple
dictionary feature generators, and the dictionaries will be placed in
the model package.

Jörn

Re: OpenNLP tool for NameFinder

Posted by Olivier Grisel <ol...@ensta.org>.
2011/6/20 Amal Elmah <am...@hotmail.com>:
>
> Thanks for replying.
>
> What I need to do is build a new model that can extract the names of recipes from a specific cooking website.
> Could you please correct me if I have made any mistakes:
>
> - First, I made a training file (training.txt). In this file I put many sentences that contain a recipe name, one sentence per line, for example:
> <START> Shortbread <END> is an easy buttery biscuits as homemade Christmas presents .
> ... etc
>
> - Then I use the command-line training tool to generate the new model.
> - After that I will use this model in my application to process any new page from this cooking website.
> - The features will be extracted automatically by OpenNLP, so I do not need to specify them; I just need to provide as much training data as I can (this is what I understood).
>
> Are all my steps right?

Yes but I am not sure that the name finder will be able to find good
models for this problem.

> Do I need to do anything to make the results more accurate?

Probably more annotated data :)

You could also build your own feature extractor with a list of
well-known recipe names coming from a thesaurus (a.k.a. a gazetteer), but
this would require a bit of programming with the OpenNLP API (AFAIK
there is no such gazetteer feature extractor implemented in the source
tree so far).
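
A gazetteer feature of the kind suggested here could look roughly like the sketch below: for each token, it checks whether the token lies inside any entry of a name list and, if so, emits a feature. The class GazetteerFeatures, the feature string "gaz=in" and the method signature are hypothetical. The sketch does not implement any OpenNLP interface, and wiring it into the name finder through the OpenNLP API is left out.

```java
import java.util.Arrays;
import java.util.HashSet;
import java.util.List;
import java.util.Locale;
import java.util.Set;

/** Hypothetical gazetteer feature generator; illustrative only, not an OpenNLP class. */
final class GazetteerFeatures {

    private final Set<String> names = new HashSet<>(); // lower-cased, joined by single spaces
    private int maxLength = 0;                         // longest entry, in tokens

    GazetteerFeatures(List<String> recipeNames) {
        for (String name : recipeNames) {
            String[] parts = name.trim().toLowerCase(Locale.ROOT).split("\\s+");
            names.add(String.join(" ", parts));
            maxLength = Math.max(maxLength, parts.length);
        }
    }

    /** Appends "gaz=in" to features if tokens[index] lies inside any listed name. */
    void createFeatures(List<String> features, String[] tokens, int index) {
        // Try every window [start, end) of at most maxLength tokens that covers index.
        for (int start = Math.max(0, index - maxLength + 1); start <= index; start++) {
            int lastEnd = Math.min(tokens.length, start + maxLength);
            for (int end = index + 1; end <= lastEnd; end++) {
                String candidate = String.join(" ", Arrays.copyOfRange(tokens, start, end))
                        .toLowerCase(Locale.ROOT);
                if (names.contains(candidate)) {
                    features.add("gaz=in");
                    return;
                }
            }
        }
    }
}
```

With the list ("Shortbread", "Apple Pie") and the tokens "Classic apple pie recipe", the tokens "apple" and "pie" receive the feature and the other two do not. Matching is case-insensitive here, which is a design choice of the sketch rather than something the thread specifies.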

-- 
Olivier
http://twitter.com/ogrisel - http://github.com/ogrisel

Re: OpenNLP tool for NameFinder

Posted by Olivier Grisel <ol...@ensta.org>.
2011/6/20 Alexandre Patry <al...@nlpfu.com>:
> Maybe you do not need to use NLP for your task. Recipe websites often render
> all recipes using similar html structures, it can be simpler to just create
> a program for each website that will extract the recipe title from the html
> DOM.
>
> I do not know which websites you want to extract recipes from, but if they
> use the hRecipe micro-format[1], the same extraction code will do in all
> places.

+1

You should also have a look at http://scraperwiki.com/

-- 
Olivier
http://twitter.com/ogrisel - http://github.com/ogrisel

Re: OpenNLP tool for NameFinder

Posted by Alexandre Patry <al...@nlpfu.com>.
Maybe you do not need to use NLP for your task. Recipe websites often
render all recipes using similar HTML structures, so it can be simpler to
just write a program for each website that extracts the recipe
title from the HTML DOM.

I do not know which websites you want to extract recipes from, but if 
they use the hRecipe micro-format[1], the same extraction code will do 
in all places.

Hth,

Alexandre

[1] http://microformats.org/wiki/hrecipe
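
As a rough illustration of this non-NLP route, the sketch below pulls the recipe name out of a page marked up with hRecipe, where the root element carries the class hrecipe and the recipe name carries the class fn. It uses a regular expression only to stay dependency-free; a real HTML parser would be more robust. The class HRecipeTitle is hypothetical, and the pattern assumes double-quoted class attributes and no same-named element nested inside the title element.

```java
import java.util.Optional;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

/** Hypothetical helper: finds the hRecipe name ("fn") in an HTML string. */
final class HRecipeTitle {

    // An element whose class attribute contains the word "fn"; group 2 is its inner HTML.
    private static final Pattern FN = Pattern.compile(
            "<(\\w+)\\b[^>]*\\bclass=\"(?:[^\"]*\\s)?fn(?:\\s[^\"]*)?\"[^>]*>(.*?)</\\1>",
            Pattern.CASE_INSENSITIVE | Pattern.DOTALL);

    static Optional<String> extract(String html) {
        // Crude check for the hRecipe root; only look for "fn" after it.
        int root = html.indexOf("hrecipe");
        if (root < 0) {
            return Optional.empty();
        }
        Matcher m = FN.matcher(html);
        if (!m.find(root)) {
            return Optional.empty();
        }
        // Drop nested tags and collapse whitespace in the captured title.
        String text = m.group(2)
                .replaceAll("<[^>]+>", " ")
                .replaceAll("\\s+", " ")
                .trim();
        return Optional.of(text);
    }
}
```

Because the class names come from the microformat rather than from a particular site, the same function can be pointed at any page that uses hRecipe. Pages without the markup yield an empty result and would need site-specific extraction instead.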

On 11-06-20 11:31 AM, Amal Elmah wrote:
> Thanks for replying.
>
> What I need to do is build a new model that can extract the names of recipes from a specific cooking website.
> Could you please correct me if I have made any mistakes:
>
> - First, I made a training file (training.txt). In this file I put many sentences that contain a recipe name, one sentence per line, for example:
> <START> Shortbread <END> is an easy buttery biscuits as homemade Christmas presents .
> ... etc
>
> - Then I use the command-line training tool to generate the new model.
> - After that I will use this model in my application to process any new page from this cooking website.
> - The features will be extracted automatically by OpenNLP, so I do not need to specify them; I just need to provide as much training data as I can (this is what I understood).
>
> Are all my steps right?
> Do I need to do anything to make the results more accurate?
> I appreciate your help
>
> Best,
> Amal
>
>
>
>
>> From: olivier.grisel@ensta.org
>> Date: Mon, 20 Jun 2011 10:06:23 +0200
>> Subject: Re: OpenNLP tool for NameFinder
>> To: opennlp-users@incubator.apache.org
>>
>> 2011/6/20 Amal Elmah<am...@hotmail.com>:
>>> Hi OpenNLP team,
>>>
>>> I used the command-line training tool for the NameFinder, with the following command:
>>> $ bin/opennlp TokenNameFinderTrainer -encoding UTF-8 -lang en -data en-ner-person.train -model en-ner-person.bin
>>>
>>> I do not know where I can get en-ner-person.train, so I made a training file (training.txt) and added training data as follows:
>>>
>>> <START:person> Pierre Vinken <END> , 61 years old , will join the board as a nonexecutive director Nov. 29 .
>>> Mr . <START:person> Vinken <END> is chairman of Elsevier N.V. , the Dutch publishing group .
>>>
>>> My questions are:
>>> 1- How can I add features if I want to use the command-line training tool rather than the API? Can you please give me an example, if this is possible?
>> AFAIK in the current state feature extraction is only customizable
>> through the API.
>>
>>> 2- Can we add features to the training data itself, I mean with an annotation such as <START:person feature=value>?
>> No. What would be the use case? Can you give a concrete example of
>> such a manual feature annotation? What goal do you want to achieve
>> with such annotations?
>>
>>> 3- Does the OpenNLP tool have a way to generate these features automatically from the training data?
>> OpenNLP already generates its features automatically by combining
>> several feature extractors, as in:
>>
>> https://svn.apache.org/repos/asf/incubator/opennlp/trunk/opennlp-tools/src/main/java/opennlp/tools/namefind/DefaultNameContextGenerator.java
>>
>> None of those feature extractors expects any kind of manual
>> annotation. This is expected since, in general, the text you want to
>> analyze with a NameFinder instance will not have any kind of
>> annotations.
>>
>> -- 
>> Olivier
>> http://twitter.com/ogrisel - http://github.com/ogrisel


RE: OpenNLP tool for NameFinder

Posted by Amal Elmah <am...@hotmail.com>.
Thanks for replying.

What I need to do is build a new model that can extract the names of recipes from a specific cooking website.
Could you please correct me if I have made any mistakes:

- First, I made a training file (training.txt). In this file I put many sentences that contain a recipe name, one sentence per line, for example:
<START> Shortbread <END> is an easy buttery biscuits as homemade Christmas presents .
... etc

- Then I use the command-line training tool to generate the new model.
- After that I will use this model in my application to process any new page from this cooking website.
- The features will be extracted automatically by OpenNLP, so I do not need to specify them; I just need to provide as much training data as I can (this is what I understood).

Are all my steps right?
Do I need to do anything to make the results more accurate?
I appreciate your help.
 
Best,
Amal
 

 

> From: olivier.grisel@ensta.org
> Date: Mon, 20 Jun 2011 10:06:23 +0200
> Subject: Re: OpenNLP tool for NameFinder
> To: opennlp-users@incubator.apache.org
> 
> 2011/6/20 Amal Elmah <am...@hotmail.com>:
> >
> > Hi OpenNLP team,
> >
> > I used the command-line training tool for the NameFinder, with the following command:
> > $ bin/opennlp TokenNameFinderTrainer -encoding UTF-8 -lang en -data en-ner-person.train -model en-ner-person.bin
> >
> > I do not know where I can get en-ner-person.train, so I made a training file (training.txt) and added training data as follows:
> >
> > <START:person> Pierre Vinken <END> , 61 years old , will join the board as a nonexecutive director Nov. 29 .
> > Mr . <START:person> Vinken <END> is chairman of Elsevier N.V. , the Dutch publishing group .
> >
> > My questions are:
> > 1- How can I add features if I want to use the command-line training tool rather than the API? Can you please give me an example, if this is possible?
> 
> AFAIK in the current state feature extraction is only customizable
> through the API.
> 
> > 2- Can we add features to the training data itself, I mean with an annotation such as <START:person feature=value>?
> 
> No. What would be the use case? Can you give a concrete example of
> such a manual feature annotation? What goal do you want to achieve
> with such annotations?
> 
> > 3- Does the OpenNLP tool have a way to generate these features automatically from the training data?
>
> OpenNLP already generates its features automatically by combining
> several feature extractors, as in:
> 
> https://svn.apache.org/repos/asf/incubator/opennlp/trunk/opennlp-tools/src/main/java/opennlp/tools/namefind/DefaultNameContextGenerator.java
> 
> None of those feature extractors expects any kind of manual
> annotation. This is expected since, in general, the text you want to
> analyze with a NameFinder instance will not have any kind of
> annotations.
> 
> -- 
> Olivier
> http://twitter.com/ogrisel - http://github.com/ogrisel
 		 	   		  

Re: OpenNLP tool for NameFinder

Posted by Olivier Grisel <ol...@ensta.org>.
2011/6/20 Amal Elmah <am...@hotmail.com>:
>
> Hi OpenNLP team,
>
> I used the command-line training tool for the NameFinder, with the following command:
> $ bin/opennlp TokenNameFinderTrainer -encoding UTF-8 -lang en -data en-ner-person.train -model en-ner-person.bin
>
> I do not know where I can get en-ner-person.train, so I made a training file (training.txt) and added training data as follows:
>
> <START:person> Pierre Vinken <END> , 61 years old , will join the board as a nonexecutive director Nov. 29 .
> Mr . <START:person> Vinken <END> is chairman of Elsevier N.V. , the Dutch publishing group .
>
> My questions are:
> 1- How can I add features if I want to use the command-line training tool rather than the API? Can you please give me an example, if this is possible?

AFAIK in the current state feature extraction is only customizable
through the API.

> 2- Can we add features to the training data itself, I mean with an annotation such as <START:person feature=value>?

No. What would be the use case? Can you give a concrete example of
such a manual feature annotation? What goal do you want to achieve
with such annotations?

> 3- Does the OpenNLP tool have a way to generate these features automatically from the training data?

OpenNLP already generates its features automatically by combining
several feature extractors, as in:

https://svn.apache.org/repos/asf/incubator/opennlp/trunk/opennlp-tools/src/main/java/opennlp/tools/namefind/DefaultNameContextGenerator.java

None of those feature extractors expects any kind of manual
annotation. This is expected since, in general, the text you want to
analyze with a NameFinder instance will not have any kind of
annotations.
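
To give a feel for what "generated automatically" means, here is a simplified sketch of context features computed from the raw tokens alone. It is not the code in DefaultNameContextGenerator: the class ContextFeatures and the feature strings are made up for illustration, and the real generators emit different and richer features.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Locale;

/** Hypothetical illustration of token-window features; not OpenNLP code. */
final class ContextFeatures {

    /** Coarse token shape: "num", "ic" (initial capital), "ac" (all caps), "lc" or "other". */
    static String tokenClass(String token) {
        if (token.matches("\\d+")) return "num";
        if (token.matches("\\p{Lu}\\p{Ll}+")) return "ic";
        if (token.matches("\\p{Lu}+")) return "ac";
        if (token.matches("\\p{Ll}+")) return "lc";
        return "other";
    }

    /** Features for tokens[index], drawn from the previous, current and next token. */
    static List<String> featuresAt(String[] tokens, int index) {
        List<String> features = new ArrayList<>();
        for (int offset = -1; offset <= 1; offset++) {
            int i = index + offset;
            String prefix = offset < 0 ? "prev" : offset > 0 ? "next" : "cur";
            if (i < 0) {
                features.add("prev=BOS");       // before the start of the sentence
            } else if (i >= tokens.length) {
                features.add("next=EOS");       // past the end of the sentence
            } else {
                features.add(prefix + ".w=" + tokens[i].toLowerCase(Locale.ROOT));
                features.add(prefix + ".wc=" + tokenClass(tokens[i]));
            }
        }
        return features;
    }
}
```

Nothing here reads an annotation: the <START:person> and <END> tags only supply the outcome to be learned during training, while features like these can be computed for any unannotated sentence at prediction time.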

-- 
Olivier
http://twitter.com/ogrisel - http://github.com/ogrisel