Posted to users@opennlp.apache.org by "Miller, Timothy" <Ti...@childrens.harvard.edu> on 2016/01/23 16:12:40 UTC

problems with sentence cross validation

I'm working with 1.6.0-bin and trying to run sentence detection cross validation, but I'm getting an exception:


bin/opennlp SentenceDetectorCrossValidator -lang en -folds 5 -data ~/Data/Projects/sentdetect/wsj02to21.raw.words

Indexing events using cutoff of 5
Computing event counts...  done. 0 events
Indexing...  done.
Sorting and merging events... Exception in thread "main" java.lang.IndexOutOfBoundsException: Index: 0, Size: 0
    at java.util.ArrayList.rangeCheck(ArrayList.java:653)
    at java.util.ArrayList.get(ArrayList.java:429)
    at opennlp.tools.ml.model.AbstractDataIndexer.sortAndMerge(AbstractDataIndexer.java:89)
    at opennlp.tools.ml.model.TwoPassDataIndexer.<init>(TwoPassDataIndexer.java:105)
    at opennlp.tools.ml.AbstractEventTrainer.getDataIndexer(AbstractEventTrainer.java:74)
    at opennlp.tools.ml.AbstractEventTrainer.train(AbstractEventTrainer.java:91)
    at opennlp.tools.ml.model.TrainUtil.train(TrainUtil.java:53)
    at opennlp.tools.sentdetect.SentenceDetectorME.train(SentenceDetectorME.java:326)
    at opennlp.tools.sentdetect.SDCrossValidator.evaluate(SDCrossValidator.java:103)
    at opennlp.tools.cmdline.sentdetect.SentenceDetectorCrossValidatorTool.run(SentenceDetectorCrossValidatorTool.java:78)
    at opennlp.tools.cmdline.CLI.main(CLI.java:224)

If I just run train and give it a model name, everything works OK with the same dataset. Is there an option I'm missing, or is there maybe an unknown issue with cross validation?

Thanks
Tim



Re: problems with sentence cross validation

Posted by Joern Kottmann <ko...@gmail.com>.
This bug can also be caused by running the evaluator with a very small
amount of training data.

How many sentences do you have in your dataset?

Jörn
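
To answer the question about dataset size, a quick count of the sentences in the training file can help. The following is only a minimal sketch, assuming the file follows the usual OpenNLP sentence detector training format of one sentence per line, with empty lines as document boundaries. The class name SentenceCount and its methods are hypothetical and not part of OpenNLP, and the code does not reproduce how OpenNLP itself reads the data.

```java
import java.io.IOException;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Paths;
import java.util.List;

// Hypothetical helper, not part of OpenNLP.
public class SentenceCount {

    // Counts non-blank lines. Assumes one sentence per line;
    // blank lines are treated as document boundaries and skipped.
    public static long countSentences(List<String> lines) {
        long count = 0;
        for (String line : lines) {
            if (!line.trim().isEmpty()) {
                count++;
            }
        }
        return count;
    }

    public static void main(String[] args) throws IOException {
        // args[0]: path to the training file, args[1]: optional number of folds
        List<String> lines = Files.readAllLines(Paths.get(args[0]), StandardCharsets.UTF_8);
        int folds = args.length > 1 ? Integer.parseInt(args[1]) : 5;
        long sentences = countSentences(lines);
        System.out.println(sentences + " sentences, about " + (sentences / folds) + " per fold");
    }
}
```

Running it with the path given to -data and the same fold count shows roughly how many sentences each fold would get. A count near zero would be consistent with the "0 events" line in the log, but it would not by itself confirm the cause.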


On Wed, Jan 27, 2016 at 10:44 PM, Joern Kottmann <ko...@gmail.com> wrote:

> Hello,
>
> looks like there are zero input sentences.
>
> Can you post a piece of your training data?
>
> Jörn

Re: problems with sentence cross validation

Posted by Joern Kottmann <ko...@gmail.com>.
Hello,

It looks like there are zero input sentences (the log reports 0 events).

Can you post a piece of your training data?

Jörn
