Posted to users@opennlp.apache.org by James <ja...@gmail.com> on 2011/08/13 10:37:55 UTC

Using Dictionaries and Pattern Matches

Hi All,

If I have some training data such as:

The weather on Thursday will have <START:strength> Force 4 <END> winds
blowing <START:direction> North-North-West <END> .
Friday will be wild with <START:strength> force 8 <END> winds coming
from the <START:direction> North <END> .
Saturday will continue to be wild with <START:strength> gale force
<END> winds continuing from the <START:direction> North <END> .

I would like to use a dictionary to help the name finder find
directions and a pattern to help the name finder find strengths.
I am currently training the model for both strength and direction
together. And I extract the entities together as well.

How would I go about configuring the required dictionaries and
patterns for these entities?  I get the feeling that features should
help me do this but I am completely baffled with how to start
integrating those and how they would work for the different entities.

Please provide me with some guidance or pointers.

Thank you for your time,

-James.

Re: Using Dictionaries and Pattern Matches

Posted by Jörn Kottmann <ko...@gmail.com>.
On 8/13/11 10:37 AM, James wrote:
> I would like to use a dictionary to help the name finder find
> directions and a pattern to help the name finder find strengths.
> I am currently training the model for both strength and direction
> together. And I extract the entities together as well.

I also tried this once, but I believe our current setup works better
when you train one model per type. You can easily verify whether it makes
a difference for you with our built-in evaluation.
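
To train one model per type from the combined training data, the
annotations of the other type have to be removed first. The following is
only a rough sketch of that preprocessing step, not part of OpenNLP: the
class name SingleTypeFilter and the method keepOnly are made up for
illustration, and the sketch assumes one sentence per line with
whitespace-separated <START:type> ... <END> tags.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

/** Hypothetical helper (not an OpenNLP class): keeps one entity type, unwraps the rest. */
final class SingleTypeFilter {

    // One annotated span, e.g. "<START:direction> North <END>".
    // Group 1 is the entity type, group 2 the annotated tokens.
    private static final Pattern SPAN =
        Pattern.compile("<START:([^>\\s]+)>\\s+(.*?)\\s+<END>");

    /** Returns the line with only spans of the given type still annotated. */
    static String keepOnly(String line, String type) {
        Matcher m = SPAN.matcher(line);
        StringBuffer out = new StringBuffer();
        while (m.find()) {
            // Keep the whole span for the wanted type, otherwise only its tokens.
            String replacement = m.group(1).equals(type) ? m.group(0) : m.group(2);
            m.appendReplacement(out, Matcher.quoteReplacement(replacement));
        }
        m.appendTail(out);
        return out.toString();
    }

    public static void main(String[] args) {
        String line = "Friday will be wild with <START:strength> force 8 <END> winds coming "
            + "from the <START:direction> North <END> .";
        System.out.println(keepOnly(line, "strength"));   // input for a strength-only model
        System.out.println(keepOnly(line, "direction"));  // input for a direction-only model
    }
}
```

Running each training line through keepOnly once per type gives two
training files, one for a strength model and one for a direction model,
which can then be trained and evaluated separately.
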
> How would I go about configuring the required dictionaries and
> patterns for these entities?  I get the feeling that features should
> help me do this but I am completely baffled with how to start
> integrating those and how they would work for the different entities.

I suggest that you write a custom feature generator to experiment
with. We still do not have dictionary support out of the box, but it
can be added very easily.

Do you know how to implement your own feature generator?
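
A minimal sketch of what such a feature generator could look like for
the directions and strengths in the example. It is kept standalone so it
compiles without OpenNLP on the classpath. The createFeatures parameter
list is meant to mirror AdaptiveFeatureGenerator.createFeatures in
opennlp.tools.util.featuregen; to plug it into the name finder the class
would additionally have to implement that interface and its other two
methods. The class name, the dictionary entries and the feature strings
are all assumptions made up for illustration.

```java
import java.util.Arrays;
import java.util.HashSet;
import java.util.List;
import java.util.Locale;
import java.util.Set;
import java.util.regex.Pattern;

/** Hypothetical standalone sketch; not wired into OpenNLP as written. */
final class WindFeatureGenerator {

    // Illustrative dictionary of direction tokens (lower-cased); a real one
    // would be loaded from a file.
    private static final Set<String> DIRECTIONS = new HashSet<>(Arrays.asList(
        "north", "south", "east", "west", "north-north-west"));

    // Illustrative pattern: a one- or two-digit number such as "4" or "8".
    private static final Pattern NUMBER = Pattern.compile("\\d{1,2}");

    /** Adds dictionary and pattern features for the token at the given index. */
    public void createFeatures(List<String> features, String[] tokens, int index,
                               String[] previousOutcomes) {
        String token = tokens[index].toLowerCase(Locale.ROOT);

        // Dictionary feature: fires when the token is a known direction.
        if (DIRECTIONS.contains(token)) {
            features.add("dict=direction");
        }
        // Pattern features: the word "force", and a number directly after it.
        if (token.equals("force")) {
            features.add("pattern=force");
        }
        if (index > 0 && NUMBER.matcher(token).matches()
                && tokens[index - 1].equalsIgnoreCase("force")) {
            features.add("pattern=force-number");
        }
    }
}
```

The generator only adds hints; the model still learns from the training
data how much weight a feature such as dict=direction should get for
each outcome.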

In our next version (1.5.2) we will have configurable feature generation
for the name finder via an XML file, which will make your use case easier.
If you want to try this out, you need to get the current trunk
version. If you use Maven, you can just depend on our snapshot version.

Jörn