Posted to issues@opennlp.apache.org by "Nicolas Hernandez (Reopened) (JIRA)" <ji...@apache.org> on 2011/10/06 23:51:30 UTC

[jira] [Reopened] (OPENNLP-316) Evaluator and CrossValidator programs of the main analyzers throw exceptions

     [ https://issues.apache.org/jira/browse/OPENNLP-316?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Nicolas Hernandez reopened OPENNLP-316:
---------------------------------------


Evaluators (SentenceDetector, Tokenizer, PosTagger and Chunker) work. 
But the problem with the CrossValidators remains. 
                
> Evaluator and CrossValidator programs of the main analyzers throw exceptions
> ----------------------------------------------------------------------------
>
>                 Key: OPENNLP-316
>                 URL: https://issues.apache.org/jira/browse/OPENNLP-316
>             Project: OpenNLP
>          Issue Type: Bug
>          Components: Chunker, POS Tagger, Sentence Detector, Tokenizer
>    Affects Versions: tools-1.5.2-incubating
>         Environment: Linux version 2.6.32-34-generic (buildd@yellow) (gcc version 4.4.3 (Ubuntu 4.4.3-4ubuntu5) ) #77-Ubuntu SMP Tue Sep 13 19:39:17 UTC 2011
> java version "1.6.0_26"
> Java(TM) SE Runtime Environment (build 1.6.0_26-b03)
> Java HotSpot(TM) 64-Bit Server VM (build 20.1-b02, mixed mode)
>            Reporter: Nicolas Hernandez
>            Assignee: William Colen
>             Fix For: tools-1.5.2-incubating
>
>
> Evaluator and CrossValidator programs of the main analyzers throw an exception when running
> (test performed on the 1.5.3 dist via command line)
> It seems that the SentenceDetector, Tokenizer, PosTagger and the
> chunker (at least) throw a java.lang.NullPointerException if the
> misclassified parameter is set to false or not present for the
> Evaluator programs. 
> The Evaluator programs work (provide a result) when the
> misclassified parameter is set.
> The CrossValidator programs do not work at all.
> I have not tested the other opennlp programs.
> See below some examples of the runs.
> I tested on the examples from the documentation and also with my data.
> For the SentenceDetector I tested with 1 000 and 1 000 000 sentences per line.
> Tell me if you want more details or anything else.
> $opennlp SentenceDetectorEvaluator -encoding UTF-8 -model
> data/model/fr-sent.bin -data data/test/fr-sent.test
> Loading Sentence Detector model ... done (0,013s)
> Evaluating ... Exception in thread "main" java.lang.NullPointerException
>        at opennlp.tools.util.eval.Evaluator.evaluateSample(Evaluator.java:80)
>        at opennlp.tools.util.eval.Evaluator.evaluate(Evaluator.java:98)
>        at opennlp.tools.cmdline.sentdetect.SentenceDetectorEvaluatorTool.run(SentenceDetectorEvaluatorTool.java:80)
>        at opennlp.tools.cmdline.CLI.main(CLI.java:191)
> $opennlp SentenceDetectorCrossValidator -encoding UTF-8 -lang fr -data
> data/train/fr-sent.train -misclassified true
> Indexing events using cutoff of 5
>        Computing event counts...  done. 0 events
>        Indexing...  done.
> Sorting and merging events... Done indexing.
> Incorporating indexed data for training...
> Exception in thread "main" java.lang.NullPointerException
>        at opennlp.maxent.GISTrainer.trainModel(GISTrainer.java:263)
>        at opennlp.maxent.GIS.trainModel(GIS.java:256)
>        at opennlp.model.TrainUtil.train(TrainUtil.java:182)
>        at opennlp.tools.sentdetect.SentenceDetectorME.train(SentenceDetectorME.java:283)
>        at opennlp.tools.sentdetect.SDCrossValidator.evaluate(SDCrossValidator.java:104)
>        at opennlp.tools.cmdline.sentdetect.SentenceDetectorCrossValidatorTool.run(SentenceDetectorCrossValidatorTool.java:98)
>        at opennlp.tools.cmdline.CLI.main(CLI.java:191)
> $ opennlp TokenizerMEEvaluator -encoding UTF-8 -model
> data/model/fr-token.bin -data data/test/fr-token.test
> Loading Tokenizer model ... done (0,428s)
> Evaluating ... Exception in thread "main" java.lang.NullPointerException
>        at opennlp.tools.util.eval.Evaluator.evaluateSample(Evaluator.java:76)
>        at opennlp.tools.util.eval.Evaluator.evaluate(Evaluator.java:98)
>        at opennlp.tools.cmdline.tokenizer.TokenizerMEEvaluatorTool.run(TokenizerMEEvaluatorTool.java:81)
>        at opennlp.tools.cmdline.CLI.main(CLI.java:191)
> $ opennlp TokenizerCrossValidator -encoding UTF-8 -lang fr -data
> data/train/fr-token.train
> Indexing events using cutoff of 5
>        Computing event counts...  done. 100333 events
>        Indexing...  done.
> Sorting and merging events... done. Reduced 100333 events to 30168.
> Done indexing.
> Incorporating indexed data for training...
> done.
>        Number of Event Tokens: 30168
>            Number of Outcomes: 2
>          Number of Predicates: 8287
> ...done.
> Computing model parameters ...
> Performing 100 iterations.
>  1:  ... loglikelihood=-69545.53606709359      0.9337805108987073
>  2:  ... loglikelihood=-18987.123809719425     0.9497872085953774
> ...
>  98:  ... loglikelihood=-607.4216932752298      0.9989534848952987
>  99:  ... loglikelihood=-603.2346954947699      0.9989734185163406
> 100:  ... loglikelihood=-599.1235213848983      0.9989833853268616
> Exception in thread "main" java.lang.NullPointerException
>        at opennlp.tools.util.eval.Evaluator.evaluateSample(Evaluator.java:76)
>        at opennlp.tools.util.eval.Evaluator.evaluate(Evaluator.java:98)
>        at opennlp.tools.tokenize.TokenizerCrossValidator.evaluate(TokenizerCrossValidator.java:98)
>        at opennlp.tools.cmdline.tokenizer.TokenizerCrossValidatorTool.run(TokenizerCrossValidatorTool.java:94)
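
The Evaluator traces above all end in Evaluator.evaluateSample, and the report says the failure only occurs when the misclassified parameter is false or absent. One plausible explanation, which the report does not confirm, is that an optional listener for misclassified samples is dereferenced without a null check. The sketch below illustrates that pattern and the guard that avoids it. The class EvaluatorSketch and the interface MisclassifiedListener are hypothetical names chosen for illustration and are not OpenNLP's actual API.

```java
// Hypothetical stand-in for an evaluator; the names are illustrative only
// and do not reproduce the OpenNLP source.
final class EvaluatorSketch {

    /** Callback invoked for each misclassified sample; may be absent. */
    interface MisclassifiedListener {
        void onMisclassified(String reference, String prediction);
    }

    // Assumption: null when the misclassified option is false or not given.
    private final MisclassifiedListener listener;
    private int correct;
    private int total;

    EvaluatorSketch(MisclassifiedListener listener) {
        this.listener = listener;
    }

    void evaluateSample(String reference, String prediction) {
        total++;
        if (reference.equals(prediction)) {
            correct++;
        } else if (listener != null) {
            // Without the null check above, this call would throw a
            // NullPointerException whenever no listener is configured.
            listener.onMisclassified(reference, prediction);
        }
    }

    double accuracy() {
        return total == 0 ? 0.0 : (double) correct / total;
    }

    public static void main(String[] args) {
        // Evaluate without a listener, as when -misclassified is not set.
        EvaluatorSketch evaluator = new EvaluatorSketch(null);
        evaluator.evaluateSample("NN", "NN");
        evaluator.evaluateSample("VB", "NN");
        System.out.println("accuracy = " + evaluator.accuracy());
    }
}
```

With the guard in place, evaluation yields a score whether or not a listener is supplied, which matches the behavior the reporter expected from the Evaluator programs.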

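In the SentenceDetectorCrossValidator run above, the log reports "0 events" just before the NullPointerException in GISTrainer.trainModel. That suggests, without proving it, that the training partition handed to the trainer was empty. The following sketch shows one way a cross-validation loop could fail fast with a descriptive message in that situation. The class CrossValidationSketch, its methods, and the index-modulo partitioning scheme are assumptions made for illustration and are not taken from the OpenNLP code.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

// Hypothetical cross-validation helper; not the OpenNLP implementation.
final class CrossValidationSketch {

    /** Returns every sample whose index does not fall into the given test fold. */
    static <T> List<T> trainingPartition(List<T> samples, int folds, int testFold) {
        List<T> training = new ArrayList<>();
        for (int i = 0; i < samples.size(); i++) {
            if (i % folds != testFold) {
                training.add(samples.get(i));
            }
        }
        return training;
    }

    /** Fails early instead of letting a trainer run on zero samples. */
    static <T> void requireNonEmpty(List<T> training, int fold) {
        if (training.isEmpty()) {
            throw new IllegalStateException(
                "Fold " + fold + ": training partition is empty; check the input data and its format");
        }
    }

    public static void main(String[] args) {
        List<String> samples = Arrays.asList("s0", "s1", "s2", "s3", "s4");
        int folds = 5;
        for (int fold = 0; fold < folds; fold++) {
            List<String> training = trainingPartition(samples, folds, fold);
            requireNonEmpty(training, fold);
            System.out.println("fold " + fold + ": " + training.size() + " training samples");
        }
    }
}
```

A check of this kind would not fix the underlying cause, but it would replace the bare NullPointerException with a message that points at the data being fed to the trainer.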