Posted to users@opennlp.apache.org by Ryan Josal <rj...@gmail.com> on 2013/08/15 00:46:10 UTC
Creating address NER model.
I want to train a model to detect addresses in English text, because I think I may get better results than a RegexNameFinder if there are many variations, though I will compare the results. Is there somewhere I might be able to get a corpus of annotated text for this, or else just a corpus of text and something that can automatically annotate addresses? I looked at OANC, but there doesn't seem to be much I can do with its GrAF format.
Ryan
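For the comparison Ryan mentions, a regex baseline might look like the minimal sketch below. It uses plain java.util.regex rather than OpenNLP's RegexNameFinder (which works on tokenized sentences), so it runs on its own. The class name and the pattern are hypothetical: the pattern covers only simple US-style street addresses (house number, one to three capitalized words, a common street suffix). Real address text has far more variants, which is the argument for trying a trained model.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

public class RegexAddressBaseline {
    // Hypothetical pattern: house number, 1-3 capitalized words, then a
    // common street suffix, optionally followed by a period.
    private static final Pattern ADDRESS = Pattern.compile(
        "\\b\\d{1,5}(?:\\s+[A-Z][a-z]+){1,3}\\s+"
        + "(?:Street|St|Avenue|Ave|Road|Rd|Boulevard|Blvd|Drive|Dr|Lane|Ln)\\b\\.?");

    /** Returns every substring of the text that matches the address pattern. */
    public static List<String> findAddresses(String text) {
        List<String> found = new ArrayList<>();
        Matcher m = ADDRESS.matcher(text);
        while (m.find()) {
            found.add(m.group());
        }
        return found;
    }

    public static void main(String[] args) {
        String text = "Send it to 1600 Pennsylvania Avenue or to 221 Baker St. by Friday.";
        System.out.println(findAddresses(text));
    }
}
```

Every new address style (unit numbers, PO boxes, city and ZIP lines) means another pattern to maintain by hand, whereas a learned name finder can pick such variation up from annotated examples.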
Re: Creating address NER model.
Posted by Ryan Josal <rj...@gmail.com>.
I meant I thought I would need 15000 annotations, but I'm glad to hear I won't. I got that number from some documentation somewhere. Bootstrapping a model to get there is a good idea, though, and I may try it. I will look at brat, but vim may work well enough for a few hundred.
Ryan
On Aug 15, 2013, at 6:22, Jörn Kottmann <ko...@gmail.com> wrote:
> On 08/15/2013 03:19 PM, Ryan Josal wrote:
>> That's great, thanks. I was thinking I would need 15000 annotations.
>
> That should work nicely. You can probably already bootstrap a model to assist with the annotation once you have a few hundred annotations.
>
> Are you planning on using brat to do the job?
>
> Jörn
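Annotating in vim means typing OpenNLP's name finder training markup by hand: one whitespace-tokenized sentence per line, each name wrapped as `<START:address>` ... `<END>`, and a blank line between documents. A quick check for unbalanced or nested tags before training can catch typos early. The following is a minimal sketch under that assumption; the class and method names are hypothetical, and it checks only tag balance, not tokenization.

```java
import java.util.ArrayList;
import java.util.List;

public class TrainingLineChecker {
    /**
     * Checks one line of hand-annotated training text and returns a list of
     * problems (empty if the line looks well formed). Assumes tags are separate
     * whitespace-delimited tokens, e.g. "<START:address> 10 Main St <END>".
     */
    public static List<String> check(String line) {
        List<String> problems = new ArrayList<>();
        boolean open = false;   // inside a <START:...> ... <END> span?
        int tokensInSpan = 0;   // tokens seen since the last <START:...>
        for (String token : line.trim().split("\\s+")) {
            if (token.isEmpty()) {
                continue;       // blank line, i.e. a document separator
            }
            if (token.startsWith("<START:") && token.endsWith(">")) {
                if (open) {
                    problems.add("nested <START> before <END>");
                }
                open = true;
                tokensInSpan = 0;
            } else if (token.equals("<END>")) {
                if (!open) {
                    problems.add("<END> without <START>");
                } else if (tokensInSpan == 0) {
                    problems.add("empty span");
                }
                open = false;
            } else if (token.startsWith("<START") || token.startsWith("<END")) {
                // e.g. "<START:address>10" typed without a space
                problems.add("malformed tag: " + token);
            } else if (open) {
                tokensInSpan++;
            }
        }
        if (open) {
            problems.add("missing <END> at end of line");
        }
        return problems;
    }

    public static void main(String[] args) {
        System.out.println(check("Ship to <START:address> 10 Main St <END> today ."));
        System.out.println(check("Ship to <START:address> 10 Main St today ."));
    }
}
```

Running it over every line of the file before each training run keeps hand-editing mistakes from silently turning into bad training data.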
Re: Creating address NER model.
Posted by Jörn Kottmann <ko...@gmail.com>.
On 08/15/2013 03:19 PM, Ryan Josal wrote:
> That's great, thanks. I was thinking I would need 15000 annotations.
>
That should work nicely. You can probably already bootstrap a model to
assist with the annotation once you have a few hundred annotations.
Are you planning on using brat to do the job?
Jörn
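One way to do the bootstrapping Jörn suggests: train on the first few hundred annotations, run that model over unannotated sentences, and write its predictions back out in the training format, so the annotator only fixes mistakes instead of tagging from scratch. The minimal sketch below covers only the write-out step. It assumes spans are token offsets with an exclusive end, which is how the `Span` objects returned by OpenNLP's `NameFinderME.find(String[])` are indexed; here they are plain `int[]` pairs so the example runs without OpenNLP, and the class name is hypothetical.

```java
import java.util.List;

public class PreAnnotator {
    /**
     * Renders a tokenized sentence with predicted spans as one line of
     * name finder training text. Each span is {start, end} in token indices,
     * end exclusive; spans are assumed sorted, non-empty and non-overlapping.
     */
    public static String toTrainingLine(String[] tokens, List<int[]> spans, String type) {
        StringBuilder sb = new StringBuilder();
        int next = 0; // index of the span currently being opened or closed
        for (int i = 0; i < tokens.length; i++) {
            if (next < spans.size() && spans.get(next)[0] == i) {
                sb.append("<START:").append(type).append("> ");
            }
            sb.append(tokens[i]).append(' ');
            if (next < spans.size() && spans.get(next)[1] == i + 1) {
                sb.append("<END> ");
                next++;
            }
        }
        return sb.toString().trim();
    }

    public static void main(String[] args) {
        String[] tokens = {"Please", "ship", "to", "10", "Main", "St", "today", "."};
        System.out.println(toTrainingLine(tokens, List.of(new int[] {3, 6}), "address"));
    }
}
```

The corrected lines can then be appended to the training file and the model retrained, repeating until the model's suggestions rarely need fixing.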
Re: Creating address NER model.
Posted by Ryan Josal <rj...@gmail.com>.
That's great, thanks. I was thinking I would need 15000 annotations.
Ryan
On Aug 15, 2013, at 0:29, Jörn Kottmann <ko...@gmail.com> wrote:
> On 08/15/2013 12:46 AM, Ryan Josal wrote:
>> I want to train a model to detect addresses in English text, because I think I may get better results than a RegexNameFinder if there are many variations, though I will compare the results. Is there somewhere I might be able to get a corpus of annotated text for this, or else just a corpus of text and something that can automatically annotate addresses?
>
> The learnable Name Finder is quite good at detecting addresses. You will probably get the best result when you take your own data and annotate a few hundred of your documents.
>
> HTH,
> Jörn
Re: Creating address NER model.
Posted by Jörn Kottmann <ko...@gmail.com>.
On 08/15/2013 12:46 AM, Ryan Josal wrote:
> I want to train a model to detect addresses in English text, because I think I may get better results than a RegexNameFinder if there are many variations, though I will compare the results. Is there somewhere I might be able to get a corpus of annotated text for this, or else just a corpus of text and something that can automatically annotate addresses?
The learnable Name Finder is quite good at detecting addresses. You
will probably get the best result when you take your own data and
annotate a few hundred of your documents.
HTH,
Jörn
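Since Ryan plans to compare the learned name finder against a regex baseline, both can be scored the same way on a held-out slice of the hand-annotated documents. A common choice is exact-match span precision, recall and F1: a predicted address counts only if its start and end match a gold annotation exactly. The minimal sketch below assumes each span is encoded as a hypothetical "sentence:start:end" string of token offsets (Java 9+ for `Set.of`). For trained models, OpenNLP's command-line tools also include a TokenNameFinderEvaluator that reports precision, recall and F-measure.

```java
import java.util.HashSet;
import java.util.Locale;
import java.util.Set;

public class SpanScorer {
    /** Returns {precision, recall, f1} for exact-match span comparison. */
    public static double[] score(Set<String> gold, Set<String> predicted) {
        Set<String> truePositives = new HashSet<>(predicted);
        truePositives.retainAll(gold); // spans present in both sets
        double tp = truePositives.size();
        double precision = predicted.isEmpty() ? 0.0 : tp / predicted.size();
        double recall = gold.isEmpty() ? 0.0 : tp / gold.size();
        double f1 = (precision + recall) == 0.0 ? 0.0
                : 2 * precision * recall / (precision + recall);
        return new double[] {precision, recall, f1};
    }

    public static void main(String[] args) {
        // Hypothetical encoding: "sentence:start:end" in token offsets.
        Set<String> gold = Set.of("0:3:6", "1:0:4", "2:5:9");
        Set<String> regex = Set.of("0:3:6", "1:0:3"); // second span is off by one token
        double[] s = score(gold, regex);
        System.out.printf(Locale.ROOT, "P=%.2f R=%.2f F1=%.2f%n", s[0], s[1], s[2]);
    }
}
```

Exact matching is strict: the off-by-one span in the example counts as both a false positive and a missed address, which is usually what matters when the extracted address has to be usable as-is.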