Posted to users@opennlp.apache.org by Ninaad Joshi <ni...@gmail.com> on 2011/03/04 05:32:45 UTC

Model training error

Hi,

I am trying to train a job title model from a training data file, but when
I try to train, I get the following error:

Exception in thread "main" java.lang.IllegalArgumentException: Model not compatible with name finder!
    at opennlp.tools.namefind.TokenNameFinderModel.<init>(TokenNameFinderModel.java:50)
    at opennlp.tools.namefind.NameFinderME.train(NameFinderME.java:350)
    at opennlp.tools.namefind.NameFinderME.train(NameFinderME.java:356)
    at App.Train(App.java:49)
    at App.main(App.java:33)

My training code looks like this:

    public void Train() throws IOException
    {
        String trainFilename = "training\\Designation-train.txt";
        String modelFile = "models\\en-ner-designation.bin";

        ObjectStream<String> lineStream =
            new PlainTextByLineStream(new FileInputStream(trainFilename), "UTF-8");
        ObjectStream<NameSample> sampleStream = new NameSampleDataStream(lineStream);

        TokenNameFinderModel model = NameFinderME.train("en", "designation",
            sampleStream, Collections.<String, Object>emptyMap(), 10, 5);
        BufferedOutputStream modelOut = null;

        try {
          modelOut = new BufferedOutputStream(new FileOutputStream(modelFile));
          model.serialize(modelOut);
        } finally {
          if (modelOut != null)
             modelOut.close();
        }
    }


My training file looks like this. I have set the cutoff to 5, and hence I am
including each job title 5 times.

<START:designation>Chief Operating Officer<END>
<START:designation>Chief Operating Officer<END>
<START:designation>Chief Operating Officer<END>
<START:designation>Chief Operating Officer<END>
<START:designation>Chief Operating Officer<END>
<START:designation>COO<END>
<START:designation>COO<END>
<START:designation>COO<END>
<START:designation>COO<END>
<START:designation>COO<END>

I tried putting some text before and after the job titles, but got the same
result. I even tried adding more job titles, with the same result. I would
appreciate it if someone could help me out. Thanks in advance for your help.

regards,
Ninaad

Re: Model training error

Posted by Jörn Kottmann <ko...@gmail.com>.
Hello,

Your training data should also contain non job title tokens for training
to work. In any case, the exception you are getting should not be thrown.
We already fixed a bug related to that; can you tell us which version
you are using?

The training data format is actually:
<START:designation> COO <END>

Note the two extra spaces: one after the start tag and one before <END>.
There is no reason to write each entry 5 times into your training
file; you can just train with a cutoff of 0.
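
The format described above can be illustrated with a minimal sketch that uses
only the Java standard library and does not call OpenNLP itself. The class
name TrainingLineSketch, its two helper methods, and the example sentence are
all hypothetical and exist only for illustration. The sketch builds a training
line in which the job title is surrounded by ordinary tokens and every tag is
separated by a space, and it offers a rough check for lines whose tags are
glued to a token.

```java
import java.util.Arrays;
import java.util.List;

// Hypothetical helper, not part of OpenNLP: builds and checks training lines.
public class TrainingLineSketch {

    /**
     * Wraps tokens[start, end) in START/END tags and joins all tokens
     * and tags with single spaces.
     */
    public static String annotate(List<String> tokens, int start, int end, String type) {
        if (start < 0 || end > tokens.size() || start >= end) {
            throw new IllegalArgumentException("invalid span");
        }
        StringBuilder sb = new StringBuilder();
        for (int i = 0; i < tokens.size(); i++) {
            if (i == start) {
                sb.append("<START:").append(type).append("> ");
            }
            sb.append(tokens.get(i));
            if (i == end - 1) {
                sb.append(" <END>");
            }
            if (i < tokens.size() - 1) {
                sb.append(' ');
            }
        }
        return sb.toString();
    }

    /** Returns false if any START or END tag is attached to a neighbouring token. */
    public static boolean tagsAreSpaced(String line) {
        for (String piece : line.trim().split("\\s+")) {
            boolean containsTag = piece.contains("<START") || piece.contains("<END>");
            boolean isExactlyTag = piece.matches("<START(:[^>]+)?>") || piece.equals("<END>");
            if (containsTag && !isExactlyTag) {
                return false;
            }
        }
        return true;
    }

    public static void main(String[] args) {
        // Made-up example sentence: the job title sits among non job title tokens.
        List<String> tokens = Arrays.asList(
            "Jane", "Doe", "was", "named", "Chief", "Operating", "Officer", "in", "2010", ".");
        String line = annotate(tokens, 4, 7, "designation");
        System.out.println(line);
        System.out.println(tagsAreSpaced(line));
        System.out.println(tagsAreSpaced("<START:designation>COO<END>"));
    }
}
```

Lines produced this way would then go into the training file one sentence per
line. The check is only a rough lint for the spacing problem and does not
validate anything else about the file.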

If you are not using a new snapshot version already,
then give our 1.5.1 release candidate a try and see if the
bug is fixed there:
http://people.apache.org/~joern/releases/opennlp-1.5.1-incubating/rc2/

Hope that helps,
Jörn
