Posted to issues@opennlp.apache.org by "Nicolas Hernandez (Created) (JIRA)" <ji...@apache.org> on 2011/10/06 17:47:29 UTC

[jira] [Created] (OPENNLP-316) Evaluator and CrossValidator programs of the main analyzers throw exceptions

Evaluator and CrossValidator programs of the main analyzers throw exceptions
----------------------------------------------------------------------------

                 Key: OPENNLP-316
                 URL: https://issues.apache.org/jira/browse/OPENNLP-316
             Project: OpenNLP
          Issue Type: Bug
          Components: Chunker, POS Tagger, Sentence Detector, Tokenizer
    Affects Versions: tools-1.5.2-incubating
         Environment: Linux version 2.6.32-34-generic (buildd@yellow) (gcc version 4.4.3 (Ubuntu 4.4.3-4ubuntu5) ) #77-Ubuntu SMP Tue Sep 13 19:39:17 UTC 2011

java version "1.6.0_26"
Java(TM) SE Runtime Environment (build 1.6.0_26-b03)
Java HotSpot(TM) 64-Bit Server VM (build 20.1-b02, mixed mode)

            Reporter: Nicolas Hernandez


The Evaluator and CrossValidator programs of the main analyzers throw an exception when run.

(tests performed on the 1.5.3 dist via the command line)

It seems that the SentenceDetector, Tokenizer, PosTagger and the
chunker (at least) throw a java.lang.NullPointerException if the
misclassified parameter is set to false or not present for the
Evaluator programs.
The Evaluator programs work (provide a result) when the
misclassified parameter is set.
The CrossValidator programs do not work at all.

I have not tested the other opennlp programs.

See below for some example runs.
I tested on the examples from the documentation and also with my own data.
For the SentenceDetector I tested with 1 000 and 1 000 000 sentences per line.
Tell me if you want more details or anything else.

$opennlp SentenceDetectorEvaluator -encoding UTF-8 -model
data/model/fr-sent.bin -data data/test/fr-sent.test
Loading Sentence Detector model ... done (0,013s)
Evaluating ...  in thread "main" java.lang.NullPointerException
       at opennlp.tools.util.eval.Evaluator.evaluateSample(Evaluator.java:80)
       at opennlp.tools.util.eval.Evaluator.evaluate(Evaluator.java:98)
       at opennlp.tools.cmdline.sentdetect.SentenceDetectorEvaluatorTool.run(SentenceDetectorEvaluatorTool.java:80)
       at opennlp.tools.cmdline.CLI.main(CLI.java:191)

$opennlp SentenceDetectorCrossValidator -encoding UTF-8 -lang fr -data
data/train/fr-sent.train -misclassified true
Indexing events using cutoff of 5

       Computing event counts...  done. 0 events
       Indexing...  done.
Sorting and merging events... Done indexing.
Incorporating indexed data for training...
Exception in thread "main" java.lang.NullPointerException
       at opennlp.maxent.GISTrainer.trainModel(GISTrainer.java:263)
       at opennlp.maxent.GIS.trainModel(GIS.java:256)
       at opennlp.model.TrainUtil.train(TrainUtil.java:182)
       at opennlp.tools.sentdetect.SentenceDetectorME.train(SentenceDetectorME.java:283)
       at opennlp.tools.sentdetect.SDCrossValidator.evaluate(SDCrossValidator.java:104)
       at opennlp.tools.cmdline.sentdetect.SentenceDetectorCrossValidatorTool.run(SentenceDetectorCrossValidatorTool.java:98)
       at opennlp.tools.cmdline.CLI.main(CLI.java:191)

$ opennlp TokenizerMEEvaluator -encoding UTF-8 -model
data/model/fr-token.bin -data data/test/fr-token.test
Loading Tokenizer model ... done (0,428s)
Evaluating ... Exception in thread "main" java.lang.NullPointerException
       at opennlp.tools.util.eval.Evaluator.evaluateSample(Evaluator.java:76)
       at opennlp.tools.util.eval.Evaluator.evaluate(Evaluator.java:98)
       at opennlp.tools.cmdline.tokenizer.TokenizerMEEvaluatorTool.run(TokenizerMEEvaluatorTool.java:81)
       at opennlp.tools.cmdline.CLI.main(CLI.java:191)

$ opennlp TokenizerCrossValidator -encoding UTF-8 -lang fr -data
data/train/fr-token.train
Indexing events using cutoff of 5
       Computing event counts...  done. 100333 events
       Indexing...  done.
Sorting and merging events... done. Reduced 100333 events to 30168.
Done indexing.
Incorporating indexed data for training...
done.
       Number of Event Tokens: 30168
           Number of Outcomes: 2
         Number of Predicates: 8287
...done.
Computing model parameters ...
Performing 100 iterations.
 1:  ... loglikelihood=-69545.53606709359      0.9337805108987073
 2:  ... loglikelihood=-18987.123809719425     0.9497872085953774
...
 98:  ... loglikelihood=-607.4216932752298      0.9989534848952987
 99:  ... loglikelihood=-603.2346954947699      0.9989734185163406
100:  ... loglikelihood=-599.1235213848983      0.9989833853268616
Exception in thread "main" java.lang.NullPointerException
       at opennlp.tools.util.eval.Evaluator.evaluateSample(Evaluator.java:76)
       at opennlp.tools.util.eval.Evaluator.evaluate(Evaluator.java:98)
       at opennlp.tools.tokenize.TokenizerCrossValidator.evaluate(TokenizerCrossValidator.java:98)
       at opennlp.tools.cmdline.tokenizer.TokenizerCrossValidatorTool.run(TokenizerCrossValidatorTool.java:94)

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators: https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa
For more information on JIRA, see: http://www.atlassian.com/software/jira

        

[jira] [Assigned] (OPENNLP-316) Evaluator and CrossValidator programs of the main analyzers throw exceptions

Posted by "William Colen (Assigned) (JIRA)" <ji...@apache.org>.
     [ https://issues.apache.org/jira/browse/OPENNLP-316?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

William Colen reassigned OPENNLP-316:
-------------------------------------

    Assignee: William Colen
    
>            Assignee: William Colen
>             Fix For: tools-1.5.2-incubating


        

[jira] [Issue Comment Edited] (OPENNLP-316) Evaluator and CrossValidator programs of the main analyzers throw exceptions

Posted by "William Colen (Issue Comment Edited) (JIRA)" <ji...@apache.org>.
    [ https://issues.apache.org/jira/browse/OPENNLP-316?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13122885#comment-13122885 ] 

William Colen edited comment on OPENNLP-316 at 10/7/11 3:42 PM:
----------------------------------------------------------------

No. The Chunker training format is based on CONLL 2000: http://www.cnts.ua.ac.be/conll2000/chunking

The O tag is just a coincidence. You can use any tag you want, but it should follow the B-<tag>, I-<tag>, O convention if you don't want to customize the OpenNLP code.
                
      was (Author: colen):
    No. The Chunker training format is based on CONLL 2000: http://www.cnts.ua.ac.be/conll2000/chunking

The O tag is just a coincidence. You can use any tag you want, but it should follow the B-<tag> I-<tag> convention if you don't want to customize the OpenNLP code.
                  


        

[jira] [Commented] (OPENNLP-316) Evaluator and CrossValidator programs of the main analyzers throw exceptions

Posted by "William Colen (Commented) (JIRA)" <ji...@apache.org>.
    [ https://issues.apache.org/jira/browse/OPENNLP-316?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13122060#comment-13122060 ] 

William Colen commented on OPENNLP-316:
---------------------------------------

Looks like it is related to the new constructor we have in the CrossValidator/Evaluator API, which uses a variable-length argument array. If we pass null in a "TokenizerEvaluationMonitor ... listeners" argument, for example, it will create an array with one element whose value is null.

We have two options: either we change all constructors to check whether the listener array has size 1 and its only element is null, or we change the Evaluator class to check whether any listener is null before using it.

I prefer the latter option. What do you think, Jörn?
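
The varargs behaviour described above can be reproduced in a few lines of plain Java. The following is only a minimal sketch, not the actual OpenNLP code: Monitor, SafeEvaluator and arrayLength are hypothetical names made up for the illustration, standing in for a listener interface such as TokenizerEvaluationMonitor and for the Evaluator class. It assumes the null reaches the constructor as a single argument typed as the listener interface, and it shows the second option (skipping null listeners inside the evaluator).

```java
import java.util.ArrayList;
import java.util.List;

public class VarargsNullDemo {

    /** Hypothetical stand-in for a monitor such as TokenizerEvaluationMonitor. */
    public interface Monitor {
        void misclassified(String reference, String prediction);
    }

    /** Hypothetical evaluator that drops null listeners instead of calling them. */
    public static class SafeEvaluator {
        private final List<Monitor> listeners = new ArrayList<Monitor>();

        public SafeEvaluator(Monitor... monitors) {
            if (monitors != null) {
                for (Monitor m : monitors) {
                    // A typed null argument arrives here as a null array element.
                    if (m != null) {
                        listeners.add(m);
                    }
                }
            }
        }

        public int listenerCount() {
            return listeners.size();
        }

        /** Notifies the registered listeners when prediction and reference differ. */
        public void evaluateSample(String reference, String prediction) {
            if (!reference.equals(prediction)) {
                for (Monitor m : listeners) {
                    m.misclassified(reference, prediction);
                }
            }
        }
    }

    /** Returns the length of the array the compiler builds for the varargs call. */
    public static int arrayLength(Monitor... monitors) {
        return monitors == null ? -1 : monitors.length;
    }

    public static void main(String[] args) {
        Monitor none = null; // e.g. no listener was configured

        System.out.println(arrayLength(none)); // 1: an array holding a single null
        System.out.println(arrayLength());     // 0: an empty array

        SafeEvaluator evaluator = new SafeEvaluator(none);
        evaluator.evaluateSample("a b", "ab"); // no NullPointerException
        System.out.println(evaluator.listenerCount()); // 0
    }
}
```

Filtering once in the constructor keeps the check in a single place, so the per-sample loop never has to test for null.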
                


       

[jira] [Closed] (OPENNLP-316) Evaluator and CrossValidator programs of the main analyzers throw exceptions

Posted by "Nicolas Hernandez (Closed) (JIRA)" <ji...@apache.org>.
     [ https://issues.apache.org/jira/browse/OPENNLP-316?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Nicolas Hernandez closed OPENNLP-316.
-------------------------------------

    Resolution: Fixed
    


        

[jira] [Reopened] (OPENNLP-316) Evaluator and CrossValidator programs of the main analyzers throw exceptions

Posted by "Nicolas Hernandez (Reopened) (JIRA)" <ji...@apache.org>.
     [ https://issues.apache.org/jira/browse/OPENNLP-316?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Nicolas Hernandez reopened OPENNLP-316:
---------------------------------------


Evaluators (SentenceDetector, Tokenizer, PosTagger and Chunker) work. 
But the problem with the CrossValidators remains. 
                
>        Indexing...  done.
> Sorting and merging events... done. Reduced 100333 events to 30168.
> Done indexing.
> Incorporating indexed data for training...
> done.
>        Number of Event Tokens: 30168
>            Number of Outcomes: 2
>          Number of Predicates: 8287
> ...done.
> Computing model parameters ...
> Performing 100 iterations.
>  1:  ... loglikelihood=-69545.53606709359      0.9337805108987073
>  2:  ... loglikelihood=-18987.123809719425     0.9497872085953774
> ...
>  98:  ... loglikelihood=-607.4216932752298      0.9989534848952987
>  99:  ... loglikelihood=-603.2346954947699      0.9989734185163406
> 100:  ... loglikelihood=-599.1235213848983      0.9989833853268616
> Exception in thread "main" java.lang.NullPointerException
>        at opennlp.tools.util.eval.Evaluator.evaluateSample(Evaluator.java:76)
>        at opennlp.tools.util.eval.Evaluator.evaluate(Evaluator.java:98)
>        at opennlp.tools.tokenize.TokenizerCrossValidator.evaluate(TokenizerCrossValidator.java:98)
>        at opennlp.tools.cmdline.tokenizer.TokenizerCrossValidatorTool.run(TokenizerCrossValidatorTool.java:94)

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators: https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa
For more information on JIRA, see: http://www.atlassian.com/software/jira

        

[jira] [Resolved] (OPENNLP-316) Evaluator and CrossValidator programs of the main analyzers throw exceptions

Posted by "William Colen (Resolved) (JIRA)" <ji...@apache.org>.
     [ https://issues.apache.org/jira/browse/OPENNLP-316?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

William Colen resolved OPENNLP-316.
-----------------------------------

    Resolution: Fixed

Issue fixed. 

Nicolas Hernandez,
Can you try it and close the issue if it is OK?

Thanks
                

        

[jira] [Commented] (OPENNLP-316) Evaluator and CrossValidator programs of the main analyzers throw exceptions

Posted by "William Colen (Commented) (JIRA)" <ji...@apache.org>.
    [ https://issues.apache.org/jira/browse/OPENNLP-316?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13124373#comment-13124373 ] 

William Colen commented on OPENNLP-316:
---------------------------------------

Hi Nicolas,

As far as I know, the format is exactly the same for the three tools. I checked the code, and we load the corpus using the same classes for the Trainer, the Evaluator and the CrossValidator. Could you please describe the issue in a little more detail?

For the sentence detector and the name finder, an empty line represents a document boundary. It is described in the [documentation|http://incubator.apache.org/opennlp/documentation/manual/opennlp.html].

I didn't create the format, but as far as I understand, the sentence detector takes advantage of knowing the boundaries of a document to train with real cases of sentence boundaries. Without document boundaries, we would occasionally train on the last sentence of one text followed by the first sentence of another, and that may not be ideal. 

Could the older contributors please correct me if I am wrong or if there is a better reason?

The name finder takes advantage of the document boundary to create regions where the adaptive feature generators are valid. When it finds a document boundary it resets the adaptive feature generators.
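
To illustrate the document-boundary convention described above, here is a minimal sketch in plain Java. It is not OpenNLP code: the class DocumentBoundarySketch and the method splitDocuments are hypothetical names introduced only for this example. It assumes one sentence per line and treats an empty line as the end of a document, so that the last sentence of one text is never paired with the first sentence of the next.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

// Hypothetical illustration only; this class is not part of OpenNLP.
public class DocumentBoundarySketch {

    // Groups one-sentence-per-line data into documents.
    // An empty (or whitespace-only) line closes the current document.
    public static List<List<String>> splitDocuments(List<String> lines) {
        List<List<String>> documents = new ArrayList<>();
        List<String> current = new ArrayList<>();
        for (String line : lines) {
            if (line.trim().isEmpty()) {
                // Document boundary: close the current document if it has content.
                if (!current.isEmpty()) {
                    documents.add(current);
                    current = new ArrayList<>();
                }
            } else {
                current.add(line);
            }
        }
        // The last document may not be followed by an empty line.
        if (!current.isEmpty()) {
            documents.add(current);
        }
        return documents;
    }

    public static void main(String[] args) {
        List<String> lines = Arrays.asList(
                "First sentence of document one.",
                "Second sentence of document one.",
                "",
                "First sentence of document two.");
        List<List<String>> documents = splitDocuments(lines);
        for (int i = 0; i < documents.size(); i++) {
            System.out.println("Document " + (i + 1) + ": "
                    + documents.get(i).size() + " sentence(s)");
        }
    }
}
```

Any per-document state, such as the adaptive feature generators mentioned above, would be reset at the point where this sketch closes a document.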

Hope it helps. If it is OK can you please close this issue?

If you have any suggestions or questions, we can discuss them on the users list; if there is a different issue, please open a new Jira.

Thank you
                

        

[jira] [Commented] (OPENNLP-316) Evaluator and CrossValidator programs of the main analyzers throw exceptions

Posted by "Joern Kottmann (Commented) (JIRA)" <ji...@apache.org>.
    [ https://issues.apache.org/jira/browse/OPENNLP-316?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13124905#comment-13124905 ] 

Joern Kottmann commented on OPENNLP-316:
----------------------------------------

Contributions to our documentation are always very welcome.
                

        

[jira] [Commented] (OPENNLP-316) Evaluator and CrossValidator programs of the main analyzers throw exceptions

Posted by "Nicolas Hernandez (Commented) (JIRA)" <ji...@apache.org>.
    [ https://issues.apache.org/jira/browse/OPENNLP-316?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13122334#comment-13122334 ] 

Nicolas Hernandez commented on OPENNLP-316:
-------------------------------------------

Hi William,

I tried various Evaluators (SentenceDetector, Tokenizer, PosTagger and Chunker): they work.

But the problem with the CrossValidators remains.


                

        

[jira] [Updated] (OPENNLP-316) Evaluator and CrossValidator programs of the main analyzers throw exceptions

Posted by "Joern Kottmann (Updated) (JIRA)" <ji...@apache.org>.
     [ https://issues.apache.org/jira/browse/OPENNLP-316?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Joern Kottmann updated OPENNLP-316:
-----------------------------------

    Fix Version/s: tools-1.5.2-incubating
    

        

[jira] [Commented] (OPENNLP-316) Evaluator and CrossValidator programs of the main analyzers throw exceptions

Posted by "Nicolas Hernandez (Commented) (JIRA)" <ji...@apache.org>.
    [ https://issues.apache.org/jira/browse/OPENNLP-316?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13124960#comment-13124960 ] 

Nicolas Hernandez commented on OPENNLP-316:
-------------------------------------------

I'm a bit of a newbie.
As far as I understand, the process for modifying the documentation is
the same as the process for modifying the code.
It is done by submitting a patch, right?


On Tue, Oct 11, 2011 at 12:07 PM, Joern Kottmann (Commented) (JIRA)



-- 
nicolas.hernandez@univ-nantes.fr
#
http://enicolashernandez.blogspot.com
http://www.univ-nantes.fr/hernandez-n
#
Laboratoire Informatique de Nantes Atlantique CNRS UMR 6241
tel. +33 (0)2 51 12 58 55
#
Université de Nantes - Institut Universitaire de Technologie -
Département Informatique
tel. +33 (0)2 40 30 60 67

                
> Evaluator and CrossValidator programs of the main analyzers throw exceptions
> ----------------------------------------------------------------------------
>
>                 Key: OPENNLP-316
>                 URL: https://issues.apache.org/jira/browse/OPENNLP-316
>             Project: OpenNLP
>          Issue Type: Bug
>          Components: Chunker, POS Tagger, Sentence Detector, Tokenizer
>    Affects Versions: tools-1.5.2-incubating
>         Environment: Linux version 2.6.32-34-generic (buildd@yellow) (gcc version 4.4.3 (Ubuntu 4.4.3-4ubuntu5) ) #77-Ubuntu SMP Tue Sep 13 19:39:17 UTC 2011
> java version "1.6.0_26"
> Java(TM) SE Runtime Environment (build 1.6.0_26-b03)
> Java HotSpot(TM) 64-Bit Server VM (build 20.1-b02, mixed mode)
>            Reporter: Nicolas Hernandez
>            Assignee: William Colen
>             Fix For: tools-1.5.2-incubating
>
>
> Evaluator and CrossValidator programs of the main analyzers throw an exception when running
> (test performed on the 1.5.3 dist via command line)
> It seems that the SentenceDetector, Tokenizer, PosTagger and the
> chunker (at least) throw a java.lang.NullPointerException if the
> misclassified parameter is set to false or not present for the
> Evaluator programs. 
> The Evaluator programs work (provide a result) when the
> misclassified parameter is set.
> The CrossValidator programs do not work at all.
> I have not tested the other opennlp programs.
> See below some examples of the runs.
> I tested on the examples from the documentation and also with my data. 
> For the SentenceDetector I tested with 1 000 and 1 000 000 sentences per line.
> Tell me if you want more details or anything else.
> $opennlp SentenceDetectorEvaluator -encoding UTF-8 -model
> data/model/fr-sent.bin -data data/test/fr-sent.test
> Loading Sentence Detector model ... done (0,013s)
> Evaluating ... Exception in thread "main" java.lang.NullPointerException
>        at opennlp.tools.util.eval.Evaluator.evaluateSample(Evaluator.java:80)
>        at opennlp.tools.util.eval.Evaluator.evaluate(Evaluator.java:98)
>        at opennlp.tools.cmdline.sentdetect.SentenceDetectorEvaluatorTool.run(SentenceDetectorEvaluatorTool.java:80)
>        at opennlp.tools.cmdline.CLI.main(CLI.java:191)
> $opennlp SentenceDetectorCrossValidator -encoding UTF-8 -lang fr -data
> data/train/fr-sent.train -misclassified true
> Indexing events using cutoff of 5
>        Computing event counts...  done. 0 events
>        Indexing...  done.
> Sorting and merging events... Done indexing.
> Incorporating indexed data for training...
> Exception in thread "main" java.lang.NullPointerException
>        at opennlp.maxent.GISTrainer.trainModel(GISTrainer.java:263)
>        at opennlp.maxent.GIS.trainModel(GIS.java:256)
>        at opennlp.model.TrainUtil.train(TrainUtil.java:182)
>        at opennlp.tools.sentdetect.SentenceDetectorME.train(SentenceDetectorME.java:283)
>        at opennlp.tools.sentdetect.SDCrossValidator.evaluate(SDCrossValidator.java:104)
>        at opennlp.tools.cmdline.sentdetect.SentenceDetectorCrossValidatorTool.run(SentenceDetectorCrossValidatorTool.java:98)
>        at opennlp.tools.cmdline.CLI.main(CLI.java:191)
> $ opennlp TokenizerMEEvaluator -encoding UTF-8 -model
> data/model/fr-token.bin -data data/test/fr-token.test
> Loading Tokenizer model ... done (0,428s)
> Evaluating ... Exception in thread "main" java.lang.NullPointerException
>        at opennlp.tools.util.eval.Evaluator.evaluateSample(Evaluator.java:76)
>        at opennlp.tools.util.eval.Evaluator.evaluate(Evaluator.java:98)
>        at opennlp.tools.cmdline.tokenizer.TokenizerMEEvaluatorTool.run(TokenizerMEEvaluatorTool.java:81)
>        at opennlp.tools.cmdline.CLI.main(CLI.java:191)
> $ opennlp TokenizerCrossValidator -encoding UTF-8 -lang fr -data
> data/train/fr-token.train
> Indexing events using cutoff of 5
>        Computing event counts...  done. 100333 events
>        Indexing...  done.
> Sorting and merging events... done. Reduced 100333 events to 30168.
> Done indexing.
> Incorporating indexed data for training...
> done.
>        Number of Event Tokens: 30168
>            Number of Outcomes: 2
>          Number of Predicates: 8287
> ...done.
> Computing model parameters ...
> Performing 100 iterations.
>  1:  ... loglikelihood=-69545.53606709359      0.9337805108987073
>  2:  ... loglikelihood=-18987.123809719425     0.9497872085953774
> ...
>  98:  ... loglikelihood=-607.4216932752298      0.9989534848952987
>  99:  ... loglikelihood=-603.2346954947699      0.9989734185163406
> 100:  ... loglikelihood=-599.1235213848983      0.9989833853268616
> Exception in thread "main" java.lang.NullPointerException
>        at opennlp.tools.util.eval.Evaluator.evaluateSample(Evaluator.java:76)
>        at opennlp.tools.util.eval.Evaluator.evaluate(Evaluator.java:98)
>        at opennlp.tools.tokenize.TokenizerCrossValidator.evaluate(TokenizerCrossValidator.java:98)
>        at opennlp.tools.cmdline.tokenizer.TokenizerCrossValidatorTool.run(TokenizerCrossValidatorTool.java:94)

       

[jira] [Commented] (OPENNLP-316) Evaluator and CrossValidator programs of the main analyzers throw exceptions

Posted by "Nicolas Hernandez (Commented) (JIRA)" <ji...@apache.org>.
    [ https://issues.apache.org/jira/browse/OPENNLP-316?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13122803#comment-13122803 ] 

Nicolas Hernandez commented on OPENNLP-316:
-------------------------------------------

Hi William, 

Just to let you know, I am testing my data to see whether something is wrong with it. As far as I understand, though, the evaluation file should have the same format as the training data.

I tried the Trainer and the CrossValidator programs of the Sentence Detector, the Tokenizer, the PosTagger and the Chunker. Each time I used the same data for the Trainer and the CrossValidator.
It works for the Tokenizer and the PosTagger.
For the Sentence Detector and the Chunker, the Trainer works but the CrossValidator does not, even though I use the same data!

Indeed, 0 events are reported in these cases.

For the Sentence Detector I tried with 100, 1,000 and 1,000,000 sentences. Same message.
For the Chunker I tried with 500 and 500,000 words.

For the Chunker, I did manage to get the message "Skipping corrupt line:..." displayed by feeding it lines in the wrong format on purpose.
But when I give it what I believe is clean input, no event is counted.

Below is the output for the Chunker. I am still checking my data, but I will soon have a look at the code.

Indexing events using cutoff of 5
       Computing event counts...  done. 0 events
       Indexing...  done.
Sorting and merging events... Done indexing.
Incorporating indexed data for training...
Exception in thread "main" java.lang.NullPointerException
       at opennlp.maxent.GISTrainer.trainModel(GISTrainer.java:263)
       at opennlp.maxent.GIS.trainModel(GIS.java:256)
       at opennlp.model.TrainUtil.train(TrainUtil.java:182)
       at opennlp.tools.chunker.ChunkerME.train(ChunkerME.java:208)
       at opennlp.tools.chunker.ChunkerCrossValidator.evaluate(ChunkerCrossValidator.java:78)
       at opennlp.tools.cmdline.chunker.ChunkerCrossValidatorTool.run(ChunkerCrossValidatorTool.java:102)
       at opennlp.tools.cmdline.CLI.main(CLI.java:191)
                

        

[jira] [Commented] (OPENNLP-316) Evaluator and CrossValidator programs of the main analyzers throw exceptions

Posted by "William Colen (Commented) (JIRA)" <ji...@apache.org>.
    [ https://issues.apache.org/jira/browse/OPENNLP-316?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13122345#comment-13122345 ] 

William Colen commented on OPENNLP-316:
---------------------------------------

Thank you for testing.

Are you referring to this?

$opennlp SentenceDetectorCrossValidator -encoding UTF-8 -lang fr -data
data/train/fr-sent.train -misclassified true
Indexing events using cutoff of 5

      Computing event counts...  done. 0 events
      Indexing...  done.
Sorting and merging events... Done indexing.
Incorporating indexed data for training...
Exception in thread "main" java.lang.NullPointerException
      at opennlp.maxent.GISTrainer.trainModel(GISTrainer.java:263)
      at opennlp.maxent.GIS.trainModel(GIS.java:256)
      at opennlp.model.TrainUtil.train(TrainUtil.java:182)
      at opennlp.tools.sentdetect.SentenceDetectorME.train(SentenceDetectorME.java:283)
      at opennlp.tools.sentdetect.SDCrossValidator.evaluate(SDCrossValidator.java:104)
      at opennlp.tools.cmdline.sentdetect.SentenceDetectorCrossValidatorTool.run(SentenceDetectorCrossValidatorTool.java:98)
      at opennlp.tools.cmdline.CLI.main(CLI.java:191)


The key point in this output is "Computing event counts...  done. 0 events". It means that OpenNLP could not find any events in the data. Does your Sentence Detector training data match the format expected by OpenNLP?

Thank you
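
A quick way to sanity-check the data before rerunning the CrossValidator is sketched below. It is a standalone Python script, not part of OpenNLP, and it assumes the layout described in the OpenNLP manual for Sentence Detector training data: one sentence per line, with an empty line marking a document boundary. The function name summarize_sentence_data and the EOS_CHARS tuple are made up for this illustration, and the set of end-of-sentence characters is an assumption. A file in which no line ends with such a character is worth a closer look, although this check cannot by itself explain the 0 events reported above.

```python
import io

# Assumed end-of-sentence characters; adjust for the language at hand.
EOS_CHARS = (".", "!", "?")


def summarize_sentence_data(lines):
    """Count sentences, blank lines, and sentences without a final EOS character."""
    stats = {"sentences": 0, "blank": 0, "no_eos": 0}
    for raw in lines:
        line = raw.strip()
        if not line:
            # An empty line is taken to be a document boundary.
            stats["blank"] += 1
            continue
        stats["sentences"] += 1
        if not line.endswith(EOS_CHARS):
            stats["no_eos"] += 1
    return stats


if __name__ == "__main__":
    # In-memory sample standing in for a file such as data/train/fr-sent.train.
    sample = io.StringIO(
        "Bonjour le monde.\n"
        "Comment allez-vous ?\n"
        "\n"
        "Sans ponctuation finale\n"
    )
    print(summarize_sentence_data(sample))
```

To check a real file, pass an open file object instead of the in-memory sample, for example open("data/train/fr-sent.train", encoding="utf-8").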
                

        

[jira] [Commented] (OPENNLP-316) Evaluator and CrossValidator programs of the main analyzers throw exceptions

Posted by "Nicolas Hernandez (Commented) (JIRA)" <ji...@apache.org>.
    [ https://issues.apache.org/jira/browse/OPENNLP-316?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13122868#comment-13122868 ] 

Nicolas Hernandez commented on OPENNLP-316:
-------------------------------------------

About the chunker, am I right in thinking that the format is CSV-like, with a single whitespace character as the separator?
The end of a sentence is marked by a line with 'O' as its chunkTag.
I wonder, do all punctuation characters need to have an 'O' chunkTag?
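
For reference, the format question can be turned into a small check. The sketch below is standalone Python, not OpenNLP code, and it rests on an assumption: that the chunker training data follows the CoNLL-2000 layout as described in the OpenNLP manual, that is, one token per line with three space-separated columns (word, POS tag, chunk tag) and an empty line after each sentence. The function name check_chunker_data is hypothetical. It reports how many sentences it sees and which line numbers do not have exactly three columns.

```python
def check_chunker_data(lines):
    """Return (sentence_count, bad_line_numbers) for CoNLL-2000 style data.

    Assumed layout: "word POS chunkTag" separated by single spaces,
    with an empty line closing each sentence.
    """
    sentences = 0
    in_sentence = False
    bad = []
    for number, raw in enumerate(lines, start=1):
        line = raw.rstrip("\r\n")
        if not line.strip():
            # An empty line closes the current sentence, if there is one.
            if in_sentence:
                sentences += 1
            in_sentence = False
            continue
        columns = line.split(" ")
        # Exactly three non-empty columns are expected on a token line.
        if len(columns) != 3 or "" in columns:
            bad.append(number)
            continue
        in_sentence = True
    if in_sentence:
        # The last sentence may not be followed by an empty line.
        sentences += 1
    return sentences, bad


if __name__ == "__main__":
    sample = [
        "He PRP B-NP\n",
        "reckons VBZ B-VP\n",
        ". . O\n",
        "\n",
        "bad line\n",
        "Ok UH B-INTJ\n",
    ]
    print(check_chunker_data(sample))
```

A result with zero sentences or a long list of bad line numbers would suggest that the file does not match the layout assumed here.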

                

        

[jira] [Commented] (OPENNLP-316) Evaluator and CrossValidator programs of the main analyzers throw exceptions

Posted by "Joern Kottmann (Commented) (JIRA)" <ji...@apache.org>.
    [ https://issues.apache.org/jira/browse/OPENNLP-316?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13124971#comment-13124971 ] 

Joern Kottmann commented on OPENNLP-316:
----------------------------------------

Exactly. The documentation is located in opennlp-docs/src/docbkx. It is written in DocBook, so you need to modify the XML and then send us a patch.

To build the documentation, just run mvn install in the opennlp-docs project.

We are really looking for new contributors. BTW, format support for the French corpus you worked on over at UIMA would also be very interesting for us.
                
> Evaluator and CrossValidator programs of the main analyzers throw exceptions
> ----------------------------------------------------------------------------
>
>                 Key: OPENNLP-316
>                 URL: https://issues.apache.org/jira/browse/OPENNLP-316
>             Project: OpenNLP
>          Issue Type: Bug
>          Components: Chunker, POS Tagger, Sentence Detector, Tokenizer
>    Affects Versions: tools-1.5.2-incubating
>         Environment: Linux version 2.6.32-34-generic (buildd@yellow) (gcc version 4.4.3 (Ubuntu 4.4.3-4ubuntu5) ) #77-Ubuntu SMP Tue Sep 13 19:39:17 UTC 2011
> java version "1.6.0_26"
> Java(TM) SE Runtime Environment (build 1.6.0_26-b03)
> Java HotSpot(TM) 64-Bit Server VM (build 20.1-b02, mixed mode)
>            Reporter: Nicolas Hernandez
>            Assignee: William Colen
>             Fix For: tools-1.5.2-incubating
>
>
> Evaluator and CrossValidator programs of the main analyzers throw an exception when running
> (test performed on the 1.5.3 dist via command line)
> It seems that the SentenceDetector, Tokenizer, PosTagger and the
> chunker (at least) throw a java.lang.NullPointerException if the
> misclassified parameter is set to false or not present for the
> Evaluator programs. 
> The Evaluator programs work (provide a result) when the
> misclassified parameter is set.
> The CrossValidator programs do not work at all.
> I have not tested the other opennlp programs.
> See below some examples of the runs.
> I tested on the examples from the documentation and also with my data. 
> For the SentenceDetector I tested with 1 000 and 1 000 000 sentences per line.
> Tell me if you want more details or anything else.
> $opennlp SentenceDetectorEvaluator -encoding UTF-8 -model
> data/model/fr-sent.bin -data data/test/fr-sent.test
> Loading Sentence Detector model ... done (0,013s)
> Evaluating ... Exception in thread "main" java.lang.NullPointerException
>        at opennlp.tools.util.eval.Evaluator.evaluateSample(Evaluator.java:80)
>        at opennlp.tools.util.eval.Evaluator.evaluate(Evaluator.java:98)
>        at opennlp.tools.cmdline.sentdetect.SentenceDetectorEvaluatorTool.run(SentenceDetectorEvaluatorTool.java:80)
>        at opennlp.tools.cmdline.CLI.main(CLI.java:191)
> $opennlp SentenceDetectorCrossValidator -encoding UTF-8 -lang fr -data
> data/train/fr-sent.train -misclassified true
> Indexing events using cutoff of 5
>        Computing event counts...  done. 0 events
>        Indexing...  done.
> Sorting and merging events... Done indexing.
> Incorporating indexed data for training...
> Exception in thread "main" java.lang.NullPointerException
>        at opennlp.maxent.GISTrainer.trainModel(GISTrainer.java:263)
>        at opennlp.maxent.GIS.trainModel(GIS.java:256)
>        at opennlp.model.TrainUtil.train(TrainUtil.java:182)
>        at opennlp.tools.sentdetect.SentenceDetectorME.train(SentenceDetectorME.java:283)
>        at opennlp.tools.sentdetect.SDCrossValidator.evaluate(SDCrossValidator.java:104)
>        at opennlp.tools.cmdline.sentdetect.SentenceDetectorCrossValidatorTool.run(SentenceDetectorCrossValidatorTool.java:98)
>        at opennlp.tools.cmdline.CLI.main(CLI.java:191)
> $ opennlp TokenizerMEEvaluator -encoding UTF-8 -model
> data/model/fr-token.bin -data data/test/fr-token.test
> Loading Tokenizer model ... done (0,428s)
> Evaluating ... Exception in thread "main" java.lang.NullPointerException
>        at opennlp.tools.util.eval.Evaluator.evaluateSample(Evaluator.java:76)
>        at opennlp.tools.util.eval.Evaluator.evaluate(Evaluator.java:98)
>        at opennlp.tools.cmdline.tokenizer.TokenizerMEEvaluatorTool.run(TokenizerMEEvaluatorTool.java:81)
>        at opennlp.tools.cmdline.CLI.main(CLI.java:191)
> $ opennlp TokenizerCrossValidator -encoding UTF-8 -lang fr -data
> data/train/fr-token.train
> Indexing events using cutoff of 5
>        Computing event counts...  done. 100333 events
>        Indexing...  done.
> Sorting and merging events... done. Reduced 100333 events to 30168.
> Done indexing.
> Incorporating indexed data for training...
> done.
>        Number of Event Tokens: 30168
>            Number of Outcomes: 2
>          Number of Predicates: 8287
> ...done.
> Computing model parameters ...
> Performing 100 iterations.
>  1:  ... loglikelihood=-69545.53606709359      0.9337805108987073
>  2:  ... loglikelihood=-18987.123809719425     0.9497872085953774
> ...
>  98:  ... loglikelihood=-607.4216932752298      0.9989534848952987
>  99:  ... loglikelihood=-603.2346954947699      0.9989734185163406
> 100:  ... loglikelihood=-599.1235213848983      0.9989833853268616
> Exception in thread "main" java.lang.NullPointerException
>        at opennlp.tools.util.eval.Evaluator.evaluateSample(Evaluator.java:76)
>        at opennlp.tools.util.eval.Evaluator.evaluate(Evaluator.java:98)
>        at opennlp.tools.tokenize.TokenizerCrossValidator.evaluate(TokenizerCrossValidator.java:98)
>        at opennlp.tools.cmdline.tokenizer.TokenizerCrossValidatorTool.run(TokenizerCrossValidatorTool.java:94)

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators: https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa
For more information on JIRA, see: http://www.atlassian.com/software/jira

        

[jira] [Commented] (OPENNLP-316) Evaluator and CrossValidator programs of the main analyzers throw exceptions

Posted by "Joern Kottmann (Commented) (JIRA)" <ji...@apache.org>.
    [ https://issues.apache.org/jira/browse/OPENNLP-316?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13124403#comment-13124403 ] 

Joern Kottmann commented on OPENNLP-316:
----------------------------------------

Yes, the sentence detector training behaves as described. 
                

[jira] [Commented] (OPENNLP-316) Evaluator and CrossValidator programs of the main analyzers throw exceptions

Posted by "Nicolas Hernandez (Commented) (JIRA)" <ji...@apache.org>.
    [ https://issues.apache.org/jira/browse/OPENNLP-316?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13122927#comment-13122927 ] 

Nicolas Hernandez commented on OPENNLP-316:
-------------------------------------------

Thanks William

Looking carefully at both the link you sent and some of the training data available on the web site:
  * I read "The O chunk tag is used for tokens which are not part of any chunk." So the 'O' tag seems to be important.
  * I observed in the training data that each sentence should be separated by an empty line.
I was not aware of this last point. I modified my data accordingly and, like magic, the chunkerCrossValidator works.

I still do not know why it worked before with the chunkerTrainer and the chunkerEvaluator.

I dared to try the same transformation on my training data for the Sentence Detector, and it works too, simply by adding an empty line between each sentence.

I guess something must be fixed here. 
The Trainer, the Evaluator and the CrossValidator should accept the same input format. 
In my opinion, the empty line as separator is unnecessary and this format shouldn't be kept.

Best regards


                

[jira] [Commented] (OPENNLP-316) Evaluator and CrossValidator programs of the main analyzers throw exceptions

Posted by "William Colen (Commented) (JIRA)" <ji...@apache.org>.
    [ https://issues.apache.org/jira/browse/OPENNLP-316?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13122885#comment-13122885 ] 

William Colen commented on OPENNLP-316:
---------------------------------------

No. The Chunker training format is based on CoNLL-2000: http://www.cnts.ua.ac.be/conll2000/chunking

The O tag is just a coincidence. You can use any tag you want, but it should follow the B-<tag> I-<tag> convention if you don't want to customize the OpenNLP code.
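
To make the format concrete, here is a minimal sketch of a checker for CoNLL-2000 style chunker data. It assumes one `word POS chunk` triple per line and an empty line after each sentence. The function name `read_chunk_sentences` and the sample tokens are illustrative and not part of OpenNLP; the tag check accepts `O` as well as `B-<tag>` and `I-<tag>`, which is an assumption based on the CoNLL-2000 data rather than on the OpenNLP code.

```python
import re

# Accepts "O" or a B-/I- prefixed tag such as B-NP or I-VP (assumed convention).
CHUNK_TAG = re.compile(r"^(O|[BI]-\S+)$")


def read_chunk_sentences(lines):
    """Group 'word POS chunk' lines into sentences; an empty line ends a sentence."""
    sentences, current = [], []
    for number, raw in enumerate(lines, start=1):
        line = raw.strip()
        if not line:
            # An empty line closes the sentence collected so far.
            if current:
                sentences.append(current)
                current = []
            continue
        fields = line.split()
        if len(fields) != 3:
            raise ValueError(f"line {number}: expected 'word POS chunk', got {line!r}")
        word, pos, chunk = fields
        if not CHUNK_TAG.match(chunk):
            raise ValueError(f"line {number}: unexpected chunk tag {chunk!r}")
        current.append((word, pos, chunk))
    if current:
        sentences.append(current)
    return sentences


# Illustrative sample: two sentences separated by an empty line.
sample = """\
He PRP B-NP
reckons VBZ B-VP
. . O

Rockwell NNP B-NP
said VBD B-VP
. . O
"""
sentences = read_chunk_sentences(sample.splitlines())
print(len(sentences))  # prints: 2
```

Running such a check over a training file before calling the command line tools would show whether the file is being read as many sentences or as a single one.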
                

[jira] [Commented] (OPENNLP-316) Evaluator and CrossValidator programs of the main analyzers throw exceptions

Posted by "Nicolas Hernandez (Commented) (JIRA)" <ji...@apache.org>.
    [ https://issues.apache.org/jira/browse/OPENNLP-316?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13124898#comment-13124898 ] 

Nicolas Hernandez commented on OPENNLP-316:
-------------------------------------------

All right.
My misunderstanding. Now that I have read the documentation again, it is
clearer. There are indeed explicit mentions of "empty newlines for
document boundaries" in the Sentence Detector section and of
"an empty line after each sentence" in the Chunker section.

OK, it is not a real issue, but it is a bit confusing that the
CrossValidator does not work when the Trainer manages to give a
result.
Moreover, the CrossValidator works for the PosTagger and the Tokenizer
without empty lines...

I would suggest modifying the examples in the documentation to
illustrate such cases.

Thank you very much for your help.

I am closing the issue.

On Mon, Oct 10, 2011 at 9:16 PM, Joern Kottmann (Commented) (JIRA)

                

[jira] [Commented] (OPENNLP-316) Evaluator and CrossValidator programs of the main analyzers throw exceptions

Posted by "Joern Kottmann (Commented) (JIRA)" <ji...@apache.org>.
    [ https://issues.apache.org/jira/browse/OPENNLP-316?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13124406#comment-13124406 ] 

Joern Kottmann commented on OPENNLP-316:
----------------------------------------

But if there is only one big block of sentences, it will all end up in a single SentenceSample object, which does not work for the cross validator. Maybe that is the issue here, because the cross validator code can only split the training data on a per-sample basis. If there is only one sample, that might lead to an empty input for the trainer.
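
The effect can be illustrated with a small sketch. It assumes a simple round-robin split in which sample i becomes test data for fold i modulo the number of folds; that is an assumption made for illustration, not a transcription of the OpenNLP partitioner, and the names `read_samples` and `k_fold_partitions` are hypothetical.

```python
def read_samples(lines):
    """Group non-empty lines into samples; an empty line closes the current sample."""
    samples, current = [], []
    for line in lines:
        if line.strip():
            current.append(line.strip())
        elif current:
            samples.append(current)
            current = []
    if current:
        samples.append(current)
    return samples


def k_fold_partitions(samples, folds):
    """Yield (training, test) pairs; sample i is test data for fold i % folds."""
    for fold in range(folds):
        training = [s for i, s in enumerate(samples) if i % folds != fold]
        test = [s for i, s in enumerate(samples) if i % folds == fold]
        yield training, test


# One sentence per line and no empty lines: everything becomes one sample.
one_block = ["Sentence one.", "Sentence two.", "Sentence three."]
samples = read_samples(one_block)
first_training, first_test = next(k_fold_partitions(samples, folds=10))
print(len(samples), len(first_training), len(first_test))  # prints: 1 0 1
```

With a single sample, the first fold has nothing left to train on, which would be consistent with the "0 events" line in the reported SentenceDetectorCrossValidator run.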

In case you don't have any document boundaries, I suggest that you simply insert an empty line every 20 sentences.
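
A minimal sketch of that workaround, assuming the training file has one sentence per line. The helper name `add_document_breaks` is hypothetical, and the group size of 20 is only the value suggested above.

```python
def add_document_breaks(lines, every=20):
    """Return the sentences with an empty line inserted after each group of `every`."""
    out = []
    count = 0
    for line in lines:
        sentence = line.rstrip("\n")
        if not sentence.strip():
            # Keep an existing boundary (without doubling it) and restart the count.
            if out and out[-1] != "":
                out.append("")
            count = 0
            continue
        out.append(sentence)
        count += 1
        if count == every:
            out.append("")
            count = 0
    # Drop a trailing empty line so the output ends with a sentence.
    while out and out[-1] == "":
        out.pop()
    return out


print(add_document_breaks(["s1", "s2", "s3", "s4", "s5"], every=2))
# prints: ['s1', 's2', '', 's3', 's4', '', 's5']
```

To apply it to a file, read the lines, pass them through the function, and write the result joined with newlines to a new training file, keeping the same encoding that is later passed to the tools via -encoding.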
                

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators: https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa
For more information on JIRA, see: http://www.atlassian.com/software/jira

        

[jira] [Commented] (OPENNLP-316) Evaluator and CrossValidator programs of the main analyzers throw exceptions

Posted by "Nicolas Hernandez (Commented) (JIRA)" <ji...@apache.org>.
    [ https://issues.apache.org/jira/browse/OPENNLP-316?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13124897#comment-13124897 ] 

Nicolas Hernandez commented on OPENNLP-316:
-------------------------------------------

All right.
My misunderstanding. Now that I have read the documentation again, it
is clearer. There are indeed explicit mentions of "empty newlines for
document boundaries" in the Sentence Detector section and of
"an empty line after each sentence" in the Chunker section.

OK, it is not a real issue, but it is a bit confusing that the
CrossValidator does not work when the Trainer manages to give a
result.
Moreover, the CrossValidator works for the PosTagger and the Tokenizer
without empty newlines...

I would suggest modifying the examples in the documentation to
illustrate such cases.
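
As a rough illustration of the point about empty lines, here is a minimal sketch of a pre-check that counts how many blank-line-delimited documents a sentence file contains. It assumes only what the documentation passages quoted above say, namely that empty lines mark document boundaries. The class and method names (DocumentBoundaryCheck, countDocuments) are hypothetical and are not part of OpenNLP; how the tools actually group lines into samples is not shown in this thread.

```java
import java.util.Arrays;
import java.util.List;

// Hypothetical helper for illustration only; not part of the OpenNLP API.
class DocumentBoundaryCheck {

    // Counts groups of consecutive non-empty lines, treating each empty
    // (or whitespace-only) line as a document boundary.
    static int countDocuments(List<String> lines) {
        int documents = 0;
        boolean inDocument = false;
        for (String line : lines) {
            if (line.trim().isEmpty()) {
                inDocument = false;      // boundary: the next text line starts a new document
            } else if (!inDocument) {
                inDocument = true;       // first text line of a new document
                documents++;
            }
        }
        return documents;
    }

    public static void main(String[] args) {
        // Made-up sample data: one sentence per line, no empty lines.
        List<String> noBoundaries =
            Arrays.asList("First sentence.", "Second sentence.", "Third sentence.");
        // Same sentences, with an empty line after each one.
        List<String> withBoundaries =
            Arrays.asList("First sentence.", "", "Second sentence.", "", "Third sentence.");

        System.out.println(countDocuments(noBoundaries));   // prints 1
        System.out.println(countDocuments(withBoundaries)); // prints 3
    }
}
```

If a file yields a count of 1, the whole file would be a single document under this assumption, which may be why a tool that splits the data by document has too little to work with while a plain training run still succeeds.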

Thank you very much for your help

I am closing the issue.
                
> Evaluator and CrossValidator programs of the main analyzers throw exceptions
> ----------------------------------------------------------------------------
>
>                 Key: OPENNLP-316
>                 URL: https://issues.apache.org/jira/browse/OPENNLP-316
>             Project: OpenNLP
>          Issue Type: Bug
>          Components: Chunker, POS Tagger, Sentence Detector, Tokenizer
>    Affects Versions: tools-1.5.2-incubating
>         Environment: Linux version 2.6.32-34-generic (buildd@yellow) (gcc version 4.4.3 (Ubuntu 4.4.3-4ubuntu5) ) #77-Ubuntu SMP Tue Sep 13 19:39:17 UTC 2011
> java version "1.6.0_26"
> Java(TM) SE Runtime Environment (build 1.6.0_26-b03)
> Java HotSpot(TM) 64-Bit Server VM (build 20.1-b02, mixed mode)
>            Reporter: Nicolas Hernandez
>            Assignee: William Colen
>             Fix For: tools-1.5.2-incubating
>


        

[jira] [Commented] (OPENNLP-316) Evaluator and CrossValidator programs of the main analyzers throw exceptions

Posted by "Joern Kottmann (Commented) (JIRA)" <ji...@apache.org>.
    [ https://issues.apache.org/jira/browse/OPENNLP-316?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13122093#comment-13122093 ] 

Joern Kottmann commented on OPENNLP-316:
----------------------------------------

I think it is more a question of whether we want to allow null values (and keep the vararg constructors) or not. Allowing null values here has the advantage that the constructors are easier to use. If you want to pass two validators based on some configuration, it is easy to pass just one and null for the second. Otherwise you would need to create a list and then convert it into an array (as we do in the name finder evaluator).

Therefore I believe it is easier if we allow null values, +1.

It might be an issue for a user who accidentally passes null, but they will find out when they try to retrieve the results from the validator, or when the results are simply missing. Allowing null will most likely not lead to a bug that is difficult to find in user code.
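
To make the trade-off concrete, here is a minimal sketch of a vararg constructor that tolerates null entries. All names in it (MisclassifiedListener, NullTolerantEvaluator, NullTolerantDemo) are hypothetical and only mirror the idea discussed here; this is not the actual OpenNLP code or its fix.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical listener type for illustration; not the OpenNLP API.
interface MisclassifiedListener {
    void misclassified(String reference, String prediction);
}

// Hypothetical evaluator whose vararg constructor skips null entries
// instead of failing later with a NullPointerException.
class NullTolerantEvaluator {
    private final List<MisclassifiedListener> listeners = new ArrayList<>();
    private int total;
    private int correct;

    NullTolerantEvaluator(MisclassifiedListener... candidates) {
        // A bare untyped null argument arrives as a null array, so guard that too.
        if (candidates != null) {
            for (MisclassifiedListener candidate : candidates) {
                if (candidate != null) {
                    listeners.add(candidate);
                }
            }
        }
    }

    // Compares one reference/prediction pair and notifies listeners on a mismatch.
    void evaluateSample(String reference, String prediction) {
        total++;
        if (reference.equals(prediction)) {
            correct++;
        } else {
            for (MisclassifiedListener listener : listeners) {
                listener.misclassified(reference, prediction);
            }
        }
    }

    double accuracy() {
        return total == 0 ? 0.0 : (double) correct / total;
    }

    int listenerCount() {
        return listeners.size();
    }
}

class NullTolerantDemo {
    public static void main(String[] args) {
        // Stands in for a command-line switch; the caller passes null when it is off.
        boolean printErrors = args.length > 0;
        MisclassifiedListener printer = printErrors
            ? (ref, pred) -> System.out.println("expected " + ref + " but got " + pred)
            : null;

        NullTolerantEvaluator evaluator = new NullTolerantEvaluator(printer);
        evaluator.evaluateSample("NN", "NN");
        evaluator.evaluateSample("VB", "NN");
        System.out.println("accuracy = " + evaluator.accuracy());
    }
}
```

The caller can pass an optional listener straight through without building a list first. The cost is the one noted above: a null passed by accident is dropped silently, and the user only notices because the expected output never appears.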
                
