Posted to users@opennlp.apache.org by Ninaad Joshi <ni...@gmail.com> on 2011/03/04 05:32:45 UTC
Model training error
Hi,
I am trying to train a job title model from a training data file, but when
I try to train, I get the following error:
Exception in thread "main" java.lang.IllegalArgumentException: Model not compatible with name finder!
    at opennlp.tools.namefind.TokenNameFinderModel.<init>(TokenNameFinderModel.java:50)
    at opennlp.tools.namefind.NameFinderME.train(NameFinderME.java:350)
    at opennlp.tools.namefind.NameFinderME.train(NameFinderME.java:356)
    at App.Train(App.java:49)
    at App.main(App.java:33)
My training code looks like this:
public void Train() throws IOException
{
    String trainFilename = "training\\Designation-train.txt";
    String modelFile = "models\\en-ner-designation.bin";

    ObjectStream<String> lineStream =
        new PlainTextByLineStream(new FileInputStream(trainFilename), "UTF-8");
    ObjectStream<NameSample> sampleStream = new NameSampleDataStream(lineStream);

    TokenNameFinderModel model = NameFinderME.train("en", "designation",
        sampleStream, Collections.<String, Object>emptyMap(), 10, 5);

    BufferedOutputStream modelOut = null;
    try {
        modelOut = new BufferedOutputStream(new FileOutputStream(modelFile));
        model.serialize(modelOut);
    } finally {
        if (modelOut != null)
            modelOut.close();
    }
}
My training file looks like this. I have set the cutoff to 5, so I include
each name 5 times.
<START:designation>Chief Operating Officer<END>
<START:designation>Chief Operating Officer<END>
<START:designation>Chief Operating Officer<END>
<START:designation>Chief Operating Officer<END>
<START:designation>Chief Operating Officer<END>
<START:designation>COO<END>
<START:designation>COO<END>
<START:designation>COO<END>
<START:designation>COO<END>
<START:designation>COO<END>
I tried putting some text before and after the tokens, but got the same
result. I even tried adding more tokens, with the same result. I would
appreciate it if someone could help me out. Thanks in advance for your help.
regards,
Ninaad
Re: Model training error
Posted by Jörn Kottmann <ko...@gmail.com>.
Hello,
Your training data should also contain non-job-title tokens for training
to work. In any case, the exception you are getting should not be thrown.
We have already fixed a bug related to that; can you tell us which version
you are using?
The training data format is actually:
<START:designation> COO <END>
Note the two extra spaces: one after the start tag and one before the end tag.
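To make the format concrete, here is a minimal sketch in plain Java that builds one training line from a tokenized sentence, so the job title sits among ordinary tokens and both tags are set off by spaces. The class DesignationTrainingLine and its methods annotate and tagsAreSpaced are hypothetical helpers written for this illustration; they are not part of OpenNLP, and the example sentence is made up.

```java
import java.util.ArrayList;
import java.util.List;

/** Hypothetical helper for illustration only; not part of OpenNLP. */
final class DesignationTrainingLine {

    /**
     * Joins the tokens with single spaces and wraps tokens[start..end)
     * in whitespace-separated tags. Assumes 0 <= start < end <= tokens.length.
     */
    static String annotate(String[] tokens, int start, int end, String type) {
        List<String> out = new ArrayList<>();
        for (int i = 0; i < tokens.length; i++) {
            if (i == start) {
                out.add("<START:" + type + ">");
            }
            out.add(tokens[i]);
            if (i == end - 1) {
                out.add("<END>");
            }
        }
        return String.join(" ", out);
    }

    /** Returns false if any tag is glued to a neighbouring token. */
    static boolean tagsAreSpaced(String line) {
        for (String piece : line.trim().split("\\s+")) {
            boolean isTag = piece.matches("<START(:[^>\\s]+)?>") || piece.equals("<END>");
            boolean containsTag = piece.contains("<START") || piece.contains("<END>");
            if (containsTag && !isTag) {
                return false;
            }
        }
        return true;
    }

    public static void main(String[] args) {
        // Made-up sentence: the title is surrounded by non-title tokens.
        String[] tokens = {"She", "was", "named", "Chief", "Operating", "Officer", "in", "2010", "."};
        System.out.println(annotate(tokens, 3, 6, "designation"));
        // prints: She was named <START:designation> Chief Operating Officer <END> in 2010 .

        // The line format from the original training file fails the check.
        System.out.println(tagsAreSpaced("<START:designation>COO<END>"));
        // prints: false
    }
}
```

Running a check like tagsAreSpaced over the training file before training is one cheap way to catch lines such as <START:designation>COO<END>, where the tags are fused with the token.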
There is no reason to write each name 5 times into your training
file; you can just train with a cutoff of 0.
If you are not already using a recent snapshot version,
give our 1.5.1 release candidate a try and see whether the
bug is fixed there:
http://people.apache.org/~joern/releases/opennlp-1.5.1-incubating/rc2/
Hope that helps,
Jörn