
Posted to dev@opennlp.apache.org by Jörn Kottmann <ko...@gmail.com> on 2013/04/17 23:31:36 UTC

Proposal: Move the coref component into the sandbox

Hi all,

I am proposing that we move the coref component into the sandbox until
we manage to train and test it on a publicly available dataset. In its
current state the code is complicated to maintain because without
training it can't be tested properly, which makes bigger changes to
OpenNLP difficult, for example the maxent refactoring.

I tried to implement parsers for the MUC corpus and added training code,
but it does not yet work as well as the current models on SourceForge.
More work is needed to get everything fixed.

Additionally, the code should be refactored like the other components in
OpenNLP, e.g. one model instantiation, built-in evaluation, simple
training, etc. There is a Jira issue with all the details.

Any opinions?

Jörn

Re: Proposal: Move the coref component into the sandbox

Posted by William Colen <wi...@gmail.com>.

+1


On Wed, Apr 17, 2013 at 11:07 PM, Jason Baldridge <ja...@gmail.com> wrote:

> +1 to doing this. I already removed that from Chalk for similar reasons.
> Also, the best way to do coreference these days is to build on the
> rule-based sieve approach given in this paper:
>
> http://www.mitpressjournals.org/doi/abs/10.1162/COLI_a_00152
>
> -Jason

Re: Proposal: Move the coref component into the sandbox

Posted by Jason Baldridge <ja...@gmail.com>.

+1 to doing this. I already removed that from Chalk for similar reasons.
Also, the best way to do coreference these days is to build on the
rule-based sieve approach given in this paper:

http://www.mitpressjournals.org/doi/abs/10.1162/COLI_a_00152

-Jason





-- 
Jason Baldridge
Associate Professor, Department of Linguistics
The University of Texas at Austin
http://www.jasonbaldridge.com
http://twitter.com/jasonbaldridge