
Posted to users@opennlp.apache.org by Benedict Holland <be...@gmail.com> on 2017/10/06 15:26:01 UTC

Name training data sentences

Hello all,

I am working on getting together a file with a list of tokenized sentences.
I have a quick question:

Can name training data contain sentences without any tags?

For example, if I had a sentence like

<START:person> Molly <END> enjoys pancakes in the morning .
She does not enjoy being woken up at 4:30 by her cat .

Does the second sentence provide any additional benefit to the ME model?
The answer to this question should probably be in the documentation.

Thanks,
~Ben

Re: Name training data sentences

Posted by Joern Kottmann <ko...@gmail.com>.

Please send us a PR to update the documentation. We are always happy
to welcome new contributors.

We have an annotation service which can be called from BRAT. There are
no tools to produce brat output otherwise.

Jörn

On Fri, Oct 6, 2017 at 11:42 PM, Benedict Holland
<be...@gmail.com> wrote:
> Hi All,
>
> That is exactly what I assumed, but that information isn't included in the
> documentation.
>
> Also, if there is a way to integrate the name finder with brat to create
> brat-annotated files, that would be superb to include in the documentation
> as well.
>
> BTW, this circular analysis is genuinely amazing. I am incredibly
> impressed.
>
> Thanks,
> ~Ben
>
> On Fri, Oct 6, 2017 at 4:13 PM, Gary Underwood <gu...@clinacuity.com>
> wrote:
>
>> I like to think of it as providing examples of what you do NOT want
>> tagged.
>> Gary Underwood
>> gunderwood@clinacuity.com
>>
>>
>>
>> > On Oct 6, 2017, at 3:50 PM, Joern Kottmann <ko...@gmail.com> wrote:
>> >
>> > It is as Daniel says, and it is good to have training data that is
>> > close to the data you intend to process with the model.
>> >
>> > Jörn
>> >
>> > On Fri, Oct 6, 2017 at 5:32 PM, Dan Russ <da...@gmail.com> wrote:
>> >> I believe it does. Every word is classified as “begin”, “inside”, or
>> >> “outside” (BIO encoding), so an event is generated for “she”, then
>> >> “does”, then “not”, all of which are classified as “outside”.
>> >>
>> >> Does anyone smarter have a comment on this?
>> >> Daniel
>> >>
>> >>
>> >>> On Oct 6, 2017, at 11:26 AM, Benedict Holland <benedict.m.holland@gmail.com> wrote:
>> >>>
>> >>> Hello all,
>> >>>
>> >>> I am working on getting together a file with a list of tokenized sentences.
>> >>> I have a quick question:
>> >>>
>> >>> Can name training data contain sentences without any tags?
>> >>>
>> >>> For example, if I had a sentence like
>> >>>
>> >>> <START:person> Molly <END> enjoys pancakes in the morning .
>> >>> She does not enjoy being woken up at 4:30 by her cat .
>> >>>
>> >>> Does the second sentence provide any additional benefit to the ME model?
>> >>> The answer to this question should probably be in the documentation.
>> >>>
>> >>> Thanks,
>> >>> ~Ben
>> >>
>>
>>

Re: Name training data sentences

Posted by Benedict Holland <be...@gmail.com>.

Hi All,

That is exactly what I assumed, but that information isn't included in the
documentation.

Also, if there is a way to integrate the name finder with brat to create
brat-annotated files, that would be superb to include in the documentation
as well.

BTW, this circular analysis is genuinely amazing. I am incredibly
impressed.

Thanks,
~Ben

On Fri, Oct 6, 2017 at 4:13 PM, Gary Underwood <gu...@clinacuity.com>
wrote:

> I like to think of it as providing examples of what you do NOT want
> tagged.
> Gary Underwood
> gunderwood@clinacuity.com
>
>
>
> > On Oct 6, 2017, at 3:50 PM, Joern Kottmann <ko...@gmail.com> wrote:
> >
> > It is as Daniel says, and it is good to have training data that is
> > close to the data you intend to process with the model.
> >
> > Jörn
> >
> > On Fri, Oct 6, 2017 at 5:32 PM, Dan Russ <da...@gmail.com> wrote:
> >> I believe it does. Every word is classified as “begin”, “inside”, or
> >> “outside” (BIO encoding), so an event is generated for “she”, then
> >> “does”, then “not”, all of which are classified as “outside”.
> >>
> >> Does anyone smarter have a comment on this?
> >> Daniel
> >>
> >>
> >>> On Oct 6, 2017, at 11:26 AM, Benedict Holland <benedict.m.holland@gmail.com> wrote:
> >>>
> >>> Hello all,
> >>>
> >>> I am working on getting together a file with a list of tokenized sentences.
> >>> I have a quick question:
> >>>
> >>> Can name training data contain sentences without any tags?
> >>>
> >>> For example, if I had a sentence like
> >>>
> >>> <START:person> Molly <END> enjoys pancakes in the morning .
> >>> She does not enjoy being woken up at 4:30 by her cat .
> >>>
> >>> Does the second sentence provide any additional benefit to the ME model?
> >>> The answer to this question should probably be in the documentation.
> >>>
> >>> Thanks,
> >>> ~Ben
> >>
>
>

Re: Name training data sentences

Posted by Gary Underwood <gu...@clinacuity.com>.

I like to think of it as providing examples of what you do NOT want tagged.
Gary Underwood
gunderwood@clinacuity.com



> On Oct 6, 2017, at 3:50 PM, Joern Kottmann <ko...@gmail.com> wrote:
> 
> It is as Daniel says, and it is good to have training data that is
> close to the data you intend to process with the model.
> 
> Jörn
> 
> On Fri, Oct 6, 2017 at 5:32 PM, Dan Russ <da...@gmail.com> wrote:
>> I believe it does. Every word is classified as “begin”, “inside”, or “outside” (BIO encoding), so an event is generated for “she”, then “does”, then “not”, all of which are classified as “outside”.
>> 
>> Does anyone smarter have a comment on this?
>> Daniel
>> 
>> 
>>> On Oct 6, 2017, at 11:26 AM, Benedict Holland <be...@gmail.com> wrote:
>>> 
>>> Hello all,
>>> 
>>> I am working on getting together a file with a list of tokenized sentences.
>>> I have a quick question:
>>> 
>>> Can name training data contain sentences without any tags?
>>> 
>>> For example, if I had a sentence like
>>> 
>>> <START:person> Molly <END> enjoys pancakes in the morning .
>>> She does not enjoy being woken up at 4:30 by her cat .
>>> 
>>> Does the second sentence provide any additional benefit to the ME model?
>>> The answer to this question should probably be in the documentation.
>>> 
>>> Thanks,
>>> ~Ben
>>

Re: Name training data sentences

Posted by Joern Kottmann <ko...@gmail.com>.

It is as Daniel says, and it is good to have training data that is
close to the data you intend to process with the model.

Jörn

On Fri, Oct 6, 2017 at 5:32 PM, Dan Russ <da...@gmail.com> wrote:
> I believe it does. Every word is classified as “begin”, “inside”, or “outside” (BIO encoding), so an event is generated for “she”, then “does”, then “not”, all of which are classified as “outside”.
>
> Does anyone smarter have a comment on this?
> Daniel
>
>
>> On Oct 6, 2017, at 11:26 AM, Benedict Holland <be...@gmail.com> wrote:
>>
>> Hello all,
>>
>> I am working on getting together a file with a list of tokenized sentences.
>> I have a quick question:
>>
>> Can name training data contain sentences without any tags?
>>
>> For example, if I had a sentence like
>>
>> <START:person> Molly <END> enjoys pancakes in the morning .
>> She does not enjoy being woken up at 4:30 by her cat .
>>
>> Does the second sentence provide any additional benefit to the ME model?
>> The answer to this question should probably be in the documentation.
>>
>> Thanks,
>> ~Ben
>

Re: Name training data sentences

Posted by Dan Russ <da...@gmail.com>.

I believe it does. Every word is classified as “begin”, “inside”, or “outside” (BIO encoding), so an event is generated for “she”, then “does”, then “not”, all of which are classified as “outside”.

Does anyone smarter have a comment on this?
Daniel

> On Oct 6, 2017, at 11:26 AM, Benedict Holland <be...@gmail.com> wrote:
> 
> Hello all,
> 
> I am working on getting together a file with a list of tokenized sentences.
> I have a quick question:
> 
> Can name training data contain sentences without any tags?
> 
> For example, if I had a sentence like
> 
> <START:person> Molly <END> enjoys pancakes in the morning .
> She does not enjoy being woken up at 4:30 by her cat .
> 
> Does the second sentence provide any additional benefit to the ME model?
> The answer to this question should probably be in the documentation.
> 
> Thanks,
> ~Ben
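To make Dan's BIO explanation above concrete, here is a minimal, self-contained Java sketch. It is not OpenNLP's actual implementation. It turns one line in the <START:type> ... <END> training format into one labeled event per token. The class name BioSketch and the label strings B-person, I-person and O are assumptions chosen for illustration. OpenNLP's name finder uses its own internal outcome names and attaches context features to every event. The sketch also handles only the typed <START:type> form, not an untyped <START>.

```java
import java.util.ArrayList;
import java.util.List;

/**
 * Illustrative sketch only (hypothetical class, not OpenNLP code):
 * converts one line in the <START:type> ... <END> format into
 * per-token BIO labels. Each returned entry stands for one training event.
 */
public class BioSketch {

    /** Returns "token/label" pairs, e.g. "Molly/B-person" or "cat/O". */
    public static List<String> encode(String line) {
        List<String> events = new ArrayList<>();
        String currentType = null;   // entity type while inside a span, else null
        boolean firstInSpan = false; // true for the first token after <START:type>

        // The training data is already tokenized, so whitespace splitting suffices.
        for (String tok : line.trim().split("\\s+")) {
            if (tok.isEmpty()) {
                continue; // an empty line produces no events
            }
            if (tok.startsWith("<START:") && tok.endsWith(">")) {
                // Opening tag: remember the entity type, e.g. "person".
                currentType = tok.substring("<START:".length(), tok.length() - 1);
                firstInSpan = true;
            } else if (tok.equals("<END>")) {
                // Closing tag: following tokens are outside any name again.
                currentType = null;
            } else if (currentType != null) {
                // Inside a name: first token is "begin", later ones are "inside".
                events.add(tok + "/" + (firstInSpan ? "B-" : "I-") + currentType);
                firstInSpan = false;
            } else {
                // Not part of any name: still an event, labeled "outside".
                events.add(tok + "/O");
            }
        }
        return events;
    }

    public static void main(String[] args) {
        String[] lines = {
            "<START:person> Molly <END> enjoys pancakes in the morning .",
            "She does not enjoy being woken up at 4:30 by her cat ."
        };
        for (String line : lines) {
            List<String> events = encode(line);
            System.out.println(events.size() + " events: " + events);
        }
    }
}
```

Run on Ben's two example lines, the first yields 7 events, with Molly labeled B-person. The second yields 13 events, every one labeled O. Those O events are the kind of examples Gary describes: they show the model which tokens are not names. The untagged sentence still contributes training events, even though it contains no entity.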