Posted to users@opennlp.apache.org by "Miller, Timothy" <Ti...@childrens.harvard.edu> on 2016/01/23 16:12:40 UTC
problems with sentence cross validation
I'm working with 1.6.0-bin and trying to do sentence detection cross validation and getting an exception:
bin/opennlp SentenceDetectorCrossValidator -lang en -folds 5 -data ~/Data/Projects/sentdetect/wsj02to21.raw.words
Indexing events using cutoff of 5
Computing event counts... done. 0 events
Indexing... done.
Sorting and merging events... Exception in thread "main" java.lang.IndexOutOfBoundsException: Index: 0, Size: 0
    at java.util.ArrayList.rangeCheck(ArrayList.java:653)
    at java.util.ArrayList.get(ArrayList.java:429)
    at opennlp.tools.ml.model.AbstractDataIndexer.sortAndMerge(AbstractDataIndexer.java:89)
    at opennlp.tools.ml.model.TwoPassDataIndexer.<init>(TwoPassDataIndexer.java:105)
    at opennlp.tools.ml.AbstractEventTrainer.getDataIndexer(AbstractEventTrainer.java:74)
    at opennlp.tools.ml.AbstractEventTrainer.train(AbstractEventTrainer.java:91)
    at opennlp.tools.ml.model.TrainUtil.train(TrainUtil.java:53)
    at opennlp.tools.sentdetect.SentenceDetectorME.train(SentenceDetectorME.java:326)
    at opennlp.tools.sentdetect.SDCrossValidator.evaluate(SDCrossValidator.java:103)
    at opennlp.tools.cmdline.sentdetect.SentenceDetectorCrossValidatorTool.run(SentenceDetectorCrossValidatorTool.java:78)
    at opennlp.tools.cmdline.CLI.main(CLI.java:224)
If I just run the trainer and give it a model name, everything works fine with the same dataset. Is there an option I'm missing, or is there perhaps an unknown issue with cross validation?
Thanks
Tim
Re: problems with sentence cross validation
Posted by Joern Kottmann <ko...@gmail.com>.
This bug can also be caused by running the evaluator with a very small
amount of training data.
How many sentences do you have in your dataset?
Jörn
On Wed, Jan 27, 2016 at 10:44 PM, Joern Kottmann <ko...@gmail.com> wrote:
> Hello,
>
> looks like there are zero input sentences.
>
> Can you post a piece of your training data?
>
> Jörn
Re: problems with sentence cross validation
Posted by Joern Kottmann <ko...@gmail.com>.
Hello,
It looks like there are zero input sentences.
Can you post a piece of your training data?
Jörn
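A quick way to answer both questions in this thread (are there any input sentences, and how many) is to count the non-empty lines in the training file before running cross validation. The sketch below is independent of OpenNLP and assumes the file holds one sentence per line, with blank lines used only as separators. The function names are made up for illustration, and the fold sizes are a plain near-equal split for a rough sanity check, not a reproduction of OpenNLP's own partitioning.

```python
import os
import sys


def count_sentences(path):
    """Count non-empty lines, assuming one sentence per line."""
    with open(os.path.expanduser(path), encoding="utf-8") as handle:
        return sum(1 for line in handle if line.strip())


def fold_sizes(n_sentences, folds):
    """Near-equal held-out partition sizes for a k-fold split (illustrative only)."""
    base, extra = divmod(n_sentences, folds)
    return [base + (1 if i < extra else 0) for i in range(folds)]


if __name__ == "__main__":
    # Usage: python count_sentences.py <training-file> [folds]
    data_path = sys.argv[1]
    folds = int(sys.argv[2]) if len(sys.argv) > 2 else 5
    total = count_sentences(data_path)
    print("non-empty lines:", total)
    print("approximate held-out sizes per fold:", fold_sizes(total, folds))
```

If the count comes back as zero, or is so small that some folds would be empty, that would be consistent with the "0 events" line in the log above, though it does not by itself confirm the cause.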
