Posted to users@opennlp.apache.org by Jörn Kottmann <ko...@gmail.com> on 2012/03/27 17:13:13 UTC
Coreference Training on MUC 6/7 data
Hi all,
I would like to figure out how the coref component can be trained
on MUC 6 and 7 data. Does anybody know how to do that?
After searching for information in the forum and doing quite a bit of
reverse engineering, I think the process is something like this:
1. Load data via MUC plugin into wordfreak.
Getting wordfreak to work is a bit tricky; there seem to be a few
jar files which all have the 2.2 version in the name but are quite
different. I now use a self-compiled head version.
2. Perform Named Entity Recognition via the opennlp plugin (I use it
with opennlp 1.4.3)
3. Do Chunking or Parsing (parsing still causes a stack overflow in my
setup, so I only did chunking)
4. Save the file to disk (make sure it is named correctly; wordfreak
appends a .txt extension which must be removed)
5. Do training with the coref opennlp wordfreak plugin via its main method
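Step 4 can be scripted instead of done by hand. A minimal sketch (assuming the only fix needed is dropping the trailing ".txt" that wordfreak appends on save; the class name is mine, not part of wordfreak or OpenNLP):

```java
import java.io.File;

public class StripTxtSuffix {

    // Rename every "*.txt" file in a directory by dropping the ".txt"
    // suffix that wordfreak appends on save, so the coref trainer
    // finds the files under their expected names.
    public static void stripSuffix(File dir) {
        File[] files = dir.listFiles();
        if (files == null) {
            return; // not a directory
        }
        for (File f : files) {
            String name = f.getName();
            if (f.isFile() && name.endsWith(".txt")) {
                File target = new File(dir, name.substring(0, name.length() - 4));
                if (!f.renameTo(target)) {
                    System.err.println("Could not rename " + f);
                }
            }
        }
    }

    public static void main(String[] args) {
        stripSuffix(new File(args[0]));
    }
}
```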
But I still have a couple of issues.
Wordfreak saves the linked mentions as "mention" annotations, which
cannot then be retrieved by the coref code (it only looks for noun
phrases, and a mention is not a noun phrase). I am not sure how this
is supposed to work: do I have to write some code to merge the
mentions with the added noun phrases, or is there some kind of trick
I don't know yet?
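The merge I have in mind would look roughly like this (a plain sketch with my own span type, not using any wordfreak or OpenNLP API): treat the "mention" annotations and the chunker's noun phrases as spans, and add an NP span for every mention that no existing noun phrase already covers, so the coref code sees each linked mention as a noun phrase:

```java
import java.util.ArrayList;
import java.util.List;

public class MentionNpMerge {

    // A span over token indices [start, end); both NP chunks and
    // "mention" annotations are represented this way in this sketch.
    static final class Span {
        final int start, end;
        Span(int start, int end) { this.start = start; this.end = end; }
        boolean covers(Span other) {
            return start <= other.start && other.end <= end;
        }
    }

    // Return the NP spans plus one extra NP span for every mention
    // that no existing NP covers, so every linked mention surfaces
    // as a noun phrase for the coref component.
    static List<Span> merge(List<Span> nps, List<Span> mentions) {
        List<Span> merged = new ArrayList<>(nps);
        for (Span m : mentions) {
            boolean covered = false;
            for (Span np : nps) {
                if (np.covers(m)) { covered = true; break; }
            }
            if (!covered) {
                merged.add(m);
            }
        }
        return merged;
    }
}
```

Whether the real annotations line up this cleanly I don't know; that is exactly what I am asking about.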
Parsing in wordfreak does not work because of a stack overflow.
It also looks like there is no util to do the actual coref resolution
when only a shallow parse was used for training.
Any hints are very welcome.
Jörn