Posted to users@opennlp.apache.org by Ryan Josal <rj...@gmail.com> on 2013/08/15 00:46:10 UTC

Creating address NER model.

I want to train a model to detect addresses in English text, because I think I may get better results than a RegexNameFinder if there are many variations, though I will compare the results.  Is there somewhere I might be able to get a corpus of annotated text for this, or else just a corpus of text and something that can automatically annotate addresses?  I looked at OANC, but it looks like there's not much I can do with GrAF format.

Ryan
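One way to run the comparison mentioned above is to score both approaches against the same hand-annotated sentences using exact-match span precision, recall, and F1. The following is a minimal sketch, not part of OpenNLP: the function name span_prf and the (sentence index, start token, end token) span encoding are assumptions made for illustration, and the sample spans are invented.

```python
def span_prf(gold, predicted):
    """Exact-match precision, recall and F1 over sets of spans.

    Each span is a hashable tuple, here (sentence_index, start_token, end_token)
    with an exclusive end. A predicted span only counts if it matches a gold
    span exactly.
    """
    gold, predicted = set(gold), set(predicted)
    true_positives = len(gold & predicted)
    precision = true_positives / len(predicted) if predicted else 0.0
    recall = true_positives / len(gold) if gold else 0.0
    if precision + recall == 0:
        return precision, recall, 0.0
    f1 = 2 * precision * recall / (precision + recall)
    return precision, recall, f1


# Invented example: two gold addresses; the regex output gets one boundary wrong.
gold_spans = {(0, 3, 8), (1, 0, 4)}
regex_spans = {(0, 3, 8), (1, 1, 4)}
model_spans = {(0, 3, 8), (1, 0, 4)}

print("regex:", span_prf(gold_spans, regex_spans))
print("model:", span_prf(gold_spans, model_spans))
```

Exact-match scoring is strict about boundaries, which matters for addresses where a regex may clip a unit number or postal code.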

Re: Creating address NER model.

Posted by Ryan Josal <rj...@gmail.com>.
I meant I thought I would need 15000, but glad to hear I won't.  I got that number from some documentation somewhere.  Bootstrapping a model to get there is a good idea though, and I may try it.  I will look at brat, but it may work well enough in vim for a few hundred.

Ryan

On Aug 15, 2013, at 6:22, Jörn Kottmann <ko...@gmail.com> wrote:

> On 08/15/2013 03:19 PM, Ryan Josal wrote:
>> That's great, thanks.  I was thinking I would need 15000 annotations.
> 
> That should work nicely. With a few hundred annotations you can probably already bootstrap a model to assist with the rest of the annotation.
> 
> Are you planning on using brat to do the job?
> 
> Jörn

Re: Creating address NER model.

Posted by Jörn Kottmann <ko...@gmail.com>.
On 08/15/2013 03:19 PM, Ryan Josal wrote:
> That's great, thanks.  I was thinking I would need 15000 annotations.
>

That should work nicely. With a few hundred annotations you can probably
already bootstrap a model to assist with the rest of the annotation.

Are you planning on using brat to do the job?

Jörn
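The bootstrapping idea can be sketched as a loop: train on what is annotated so far, let the model propose spans for the next batch, have a person correct them, and repeat. This is only an outline under stated assumptions. The callables train, predict, and correct are hypothetical placeholders (for example, a wrapper around a name finder trainer and a manual review step), and the toy stand-ins below exist only so the sketch runs.

```python
def bootstrap(seed, unlabeled, train, predict, correct, batch_size=100):
    """Grow an annotated set in rounds and return the final labeled list.

    seed      -- list of (sentence, spans) pairs annotated by hand
    unlabeled -- list of sentences still to annotate
    train     -- hypothetical: labeled list -> model
    predict   -- hypothetical: (model, sentence) -> draft spans
    correct   -- hypothetical: (sentence, draft spans) -> reviewed spans
    """
    labeled = list(seed)
    pool = list(unlabeled)
    while pool:
        model = train(labeled)  # retrain on everything reviewed so far
        batch, pool = pool[:batch_size], pool[batch_size:]
        for sentence in batch:
            draft = predict(model, sentence)  # model suggests spans
            labeled.append((sentence, correct(sentence, draft)))  # person fixes them
    return labeled


# Toy stand-ins so the loop can be executed; they do no real learning.
toy_train = lambda labeled: len(labeled)
toy_predict = lambda model, sentence: []
toy_correct = lambda sentence, draft: draft

result = bootstrap([("a", [])], ["b", "c", "d"],
                   toy_train, toy_predict, toy_correct, batch_size=2)
print(len(result))
```

Correcting a draft is usually faster than annotating from scratch, which is why a model trained on a few hundred examples can already help.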

Re: Creating address NER model.

Posted by Ryan Josal <rj...@gmail.com>.
That's great, thanks.  I was thinking I would need 15000 annotations.

Ryan

On Aug 15, 2013, at 0:29, Jörn Kottmann <ko...@gmail.com> wrote:

> On 08/15/2013 12:46 AM, Ryan Josal wrote:
>> I want to train a model to detect addresses in English text, because I think I may get better results than a RegexNameFinder if there are many variations, though I will compare the results.  Is there somewhere I might be able to get a corpus of annotated text for this, or else just a corpus of text and something that can automatically annotate addresses?
> 
> The learnable Name Finder is quite good at detecting addresses. You will probably get the best result when you take your own data and annotate a few hundred of your documents.
> 
> HTH,
> Jörn

Re: Creating address NER model.

Posted by Jörn Kottmann <ko...@gmail.com>.
On 08/15/2013 12:46 AM, Ryan Josal wrote:
> I want to train a model to detect addresses in English text, because I think I may get better results than a RegexNameFinder if there are many variations, though I will compare the results.  Is there somewhere I might be able to get a corpus of annotated text for this, or else just a corpus of text and something that can automatically annotate addresses?

The learnable Name Finder is quite good at detecting addresses. You
will probably get the best result when you take your own data and
annotate a few hundred of your documents.

HTH,
Jörn
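For the annotation step, the OpenNLP name finder trainer reads one whitespace-tokenized sentence per line, with each entity wrapped in <START:type> ... <END> markers. The following sketch turns token-offset spans into that markup. The helper name to_opennlp_line, the entity type "address", and the sample sentence are assumptions made for illustration, not taken from the thread.

```python
def to_opennlp_line(tokens, spans, entity_type="address"):
    """Render one tokenized sentence in <START:type> ... <END> markup.

    spans is a list of (start, end) token offsets, end exclusive,
    and the spans must not overlap.
    """
    starts = {start for start, _ in spans}
    ends = {end for _, end in spans}
    out = []
    for i, token in enumerate(tokens):
        if i in ends:  # close a span that ended just before this token
            out.append("<END>")
        if i in starts:  # open a span that begins at this token
            out.append(f"<START:{entity_type}>")
        out.append(token)
    if len(tokens) in ends:  # span running to the end of the sentence
        out.append("<END>")
    return " ".join(out)


# Invented example sentence; the address covers tokens 3 to 7.
tokens = "Send it to 12 Main St , Springfield by Friday .".split()
line = to_opennlp_line(tokens, [(3, 8)])
print(line)
```

Writing one such line per sentence, with an empty line between documents, gives a file in the shape the trainer expects.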