Posted to users@opennlp.apache.org by Lance Norskog <go...@gmail.com> on 2012/05/22 07:50:29 UTC

Provenance of models?

What are the sources of the training data for the models on sourceforge?
In particular, for the English language NER models?

-- 
Lance Norskog
goksron@gmail.com

Re: Provenance of models?

Posted by Jörn Kottmann <ko...@gmail.com>.
You can buy the MUC 6 and MUC 7 data from the LDC (Linguistic Data
Consortium). They cost a few hundred dollars.

OpenNLP has built-in support for parsing them.

We cannot share the data here, because that would of course
violate the copyright.

Anyway, OntoNotes might be better suited to your needs, and it
only costs 30 or 50 USD.

Jörn

On 05/22/2012 09:37 PM, Lance Norskog wrote:
> Where are the source files?
>
> On Tue, May 22, 2012 at 12:17 AM, Jörn Kottmann <ko...@gmail.com> wrote:
>> On 05/22/2012 07:50 AM, Lance Norskog wrote:
>>> What are the sources of the training data for the models on sourceforge?
>>> In particular, for the English language NER models?
>>>
>> That is trained on hand-corrected and extended MUC 6/7 training data.
>>
>> Jörn
>
>


Re: Provenance of models?

Posted by Lance Norskog <go...@gmail.com>.
Where are the source files?

On Tue, May 22, 2012 at 12:17 AM, Jörn Kottmann <ko...@gmail.com> wrote:
> On 05/22/2012 07:50 AM, Lance Norskog wrote:
>>
>> What are the sources of the training data for the models on sourceforge?
>> In particular, for the English language NER models?
>>
>
> That is trained on hand-corrected and extended MUC 6/7 training data.
>
> Jörn



-- 
Lance Norskog
goksron@gmail.com

Re: Provenance of models?

Posted by Jörn Kottmann <ko...@gmail.com>.
On 05/22/2012 07:50 AM, Lance Norskog wrote:
> What are the sources of the training data for the models on sourceforge?
> In particular, for the English language NER models?
>

That is trained on hand-corrected and extended MUC 6/7 training data.

Jörn