Posted to users@opennlp.apache.org by Josh Patterson <jo...@cloudera.com> on 2011/12/11 05:07:24 UTC

Training

I've been working with the examples and reading:

http://sourceforge.net/apps/mediawiki/opennlp/index.php?title=Sentence_Detector

I noticed the section on "Training". Given that the models already
detect things like sentences and POS tags, in what circumstances would
one want to "train" a model further?

Josh

-- 
Twitter: @jpatanooga
Solution Architect @ Cloudera
hadoop: http://www.cloudera.com

Re: Training

Posted by Johnson J <jo...@gmail.com>.

Josh, I replied to your question by mistake.

Re: Training

Posted by Johnson J <jo...@gmail.com>.

Thanks for the information, Josh. I want a model that identifies the topic
of a given website (this is for students, to identify the subject). For
this I am using the document categorizer with my own corpus of nearly 2 GB
(e.g. science <space> text describing science).
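
A minimal sketch of reading training lines in the shape described above,
assuming each line is a category label, a space, and then the document
text. The function name and the sample lines are made up for illustration,
and this is plain Python rather than OpenNLP code.

```python
def parse_training_line(line):
    """Split a 'category<space>text' line into (category, text)."""
    # Everything before the first space is taken as the category label.
    category, _, text = line.strip().partition(" ")
    text = text.strip()
    if not category or not text:
        raise ValueError(f"expected 'category text', got: {line!r}")
    return category, text


# Hypothetical sample lines, one document per line.
samples = [
    "science Notes describing a physics experiment.",
    "history Notes describing ancient Rome.",
]
parsed = [parse_training_line(s) for s in samples]
```

Checking a corpus this way before training may help catch lines that are
missing a label or have no text.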

Thanks,
Johnson.

Re: Training

Posted by Srivatsan Ramanujam <va...@utexas.edu>.

Andrew Bredenkamp of Acrolinx gave an example of this at SAS2011. In the
Penn Treebank corpus the word "object" is a verb 99% of the time, but if
you are dealing with the SAP corpus, in most cases it refers to an
instance of a class.
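
The effect can be sketched with a toy most-frequent-tag baseline. This is
only an illustration: the tagged tokens below are invented, not real counts
from the Penn Treebank or any SAP corpus, and the function name is
hypothetical.

```python
from collections import Counter, defaultdict


def train_most_frequent_tag(tagged_tokens):
    """Map each word to the tag it carries most often in the training data."""
    counts = defaultdict(Counter)
    for word, tag in tagged_tokens:
        counts[word.lower()][tag] += 1
    return {word: tags.most_common(1)[0][0] for word, tags in counts.items()}


# Invented toy data standing in for two domains.
news_like = [("object", "VERB")] * 9 + [("object", "NOUN")]
software_like = [("object", "NOUN")] * 9 + [("object", "VERB")]

news_model = train_most_frequent_tag(news_like)
software_model = train_most_frequent_tag(software_like)

print(news_model["object"], software_model["object"])
```

The same word gets a different tag depending on which data the model saw,
which is the reason for retraining on in-domain text.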

Re: Training

Posted by Jason Baldridge <ja...@gmail.com>.

Yep. Domain adaptation and dealing with new languages are at least as
important in NLP as they are for other types of problems addressed with
machine learning. Once we get better at injecting prior information about
language (in the general sense) into our models, maybe the situation will
start looking better.

-- 
Jason Baldridge
Associate Professor, Department of Linguistics
The University of Texas at Austin
http://www.jasonbaldridge.com
http://twitter.com/jasonbaldridge

Re: Training

Posted by Josh Patterson <jo...@cloudera.com>.

OK, that makes more sense. I'm not that familiar with how training
affects NLP, but I am versed in training for general ML purposes,
which seems to be the same idea here.

Thanks,

JP

Re: Training

Posted by Jason Baldridge <ja...@gmail.com>.

For new domains (e.g. Twitter) and/or new languages, or to use more data
to get a better model. -Jason
