Posted to dev@mahout.apache.org by XiaoboGu <gu...@gmail.com> on 2011/05/29 10:23:06 UTC

NullPointerException after getBest() during training a AdaptiveLogisticRegression.

Hi,

The main loop for MAHOUT-696 is shown below. It always throws a NullPointerException after the first call to getBest(). Can we continue training an AdaptiveLogisticRegression after using getBest() to score some new lines, just as TrainLogistic does?



double logPEstimate = 0;
int k = 0;

CsvRecordFactory csv = lmp.getCsvRecordFactory();
model = lmp.createAdaptiveLogisticRegression();
State<Wrapper, CrossFoldLearner> best = null;
CrossFoldLearner learner = null;

for (int pass = 0; pass < passes; pass++) {
    BufferedReader in = open(inputFile);

    // read variable names
    csv.firstLine(in.readLine());

    String line = in.readLine();

    while (line != null) {
        // for each new line, get target and predictors
        Vector input = new RandomAccessSparseVector(lmp.getNumFeatures());
        int targetValue = csv.processLine(line, input);

        // update model
        model.train(targetValue, input);

        k++;

        if (scores && (k % (skipscorenum + 1) == 0)) {

            best = model.getBest();
            if (null != best) {
                learner = best.getPayload().getLearner();
            }
            if (learner != null) {
                // check performance while this is still news
                double logP = learner.logLikelihood(targetValue, input);
                if (!Double.isInfinite(logP)) {
                    if (k < 20) {
                        logPEstimate = (k * logPEstimate + logP) / (k + 1);
                    } else {
                        logPEstimate = 0.95 * logPEstimate + 0.05 * logP;
                    }
                }
                double p = learner.classifyScalar(input);
                output.printf(Locale.ENGLISH,
                        "%10d %2d %10.2f %2.4f %10.4f %10.4f\n",
                        k, targetValue,
                        learner.percentCorrect(), p, logP,
                        logPEstimate);
            } else {
                output.printf(Locale.ENGLISH,
                        "%10d %2d %s\n", k, targetValue,
                        "AdaptiveLogisticRegression is not ready for scoring ... ");
            }
        }

        line = in.readLine();
    }
    in.close();
}
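
For readers following the scoring part of the loop above, the logPEstimate update can be isolated into a small stand-alone sketch. The class and method names here (LogPEstimateSketch, update) are made up for illustration and are not part of Mahout; only the arithmetic is taken from the loop.

```java
// Stand-alone sketch of the logPEstimate update used in the loop above.
// LogPEstimateSketch and update() are illustrative names, not Mahout API.
final class LogPEstimateSketch {

    /** Returns the updated estimate after observing logP at example count k. */
    static double update(double logPEstimate, int k, double logP) {
        if (Double.isInfinite(logP)) {
            // the loop skips infinite log-likelihoods, so the estimate is unchanged
            return logPEstimate;
        }
        if (k < 20) {
            // early examples: weighted mean of the old estimate and the new value
            return (k * logPEstimate + logP) / (k + 1);
        }
        // later examples: exponential moving average with weight 0.05 on the new value
        return 0.95 * logPEstimate + 0.05 * logP;
    }

    public static void main(String[] args) {
        double estimate = 0;
        estimate = update(estimate, 1, -0.5);    // takes the k < 20 branch
        System.out.println(estimate);
        estimate = update(estimate, 100, -1.0);  // takes the moving-average branch
        System.out.println(estimate);
    }
}
```

Note that when scoring only happens at multiples of skipscorenum + 1 and that value is 20 or more (as with --skipscorenum 99), the first scored example already has k = 100, so only the moving-average branch would run.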




       100  1 AdaptiveLogisticRegression is not ready for scoring ... 
       200  0 AdaptiveLogisticRegression is not ready for scoring ... 
       300  1 AdaptiveLogisticRegression is not ready for scoring ... 
       400  0 AdaptiveLogisticRegression is not ready for scoring ... 
       500  1 AdaptiveLogisticRegression is not ready for scoring ... 
Exception in thread "main" java.lang.IllegalStateException: java.lang.NullPointerException
	at org.apache.mahout.classifier.sgd.AdaptiveLogisticRegression.trainWithBufferedExamples(AdaptiveLogisticRegression.java:144)
	at org.apache.mahout.classifier.sgd.AdaptiveLogisticRegression.train(AdaptiveLogisticRegression.java:117)
	at org.apache.mahout.classifier.sgd.AdaptiveLogisticRegression.train(AdaptiveLogisticRegression.java:103)
	at org.apache.mahout.classifier.sgd.TrainAdaptiveLogistic.main(TrainAdaptiveLogistic.java:72)
Caused by: java.lang.NullPointerException
	at org.apache.mahout.classifier.sgd.CrossFoldLearner.train(CrossFoldLearner.java:134)
	at org.apache.mahout.classifier.sgd.AdaptiveLogisticRegression$Wrapper.train(AdaptiveLogisticRegression.java:411)
	at org.apache.mahout.classifier.sgd.AdaptiveLogisticRegression$1.apply(AdaptiveLogisticRegression.java:128)
	at org.apache.mahout.classifier.sgd.AdaptiveLogisticRegression$1.apply(AdaptiveLogisticRegression.java:1)
	at org.apache.mahout.ep.EvolutionaryProcess$1.call(EvolutionaryProcess.java:146)
	at org.apache.mahout.ep.EvolutionaryProcess$1.call(EvolutionaryProcess.java:1)
	at java.util.concurrent.FutureTask$Sync.innerRun(FutureTask.java:303)
	at java.util.concurrent.FutureTask.run(FutureTask.java:138)
	at java.util.concurrent.ThreadPoolExecutor$Worker.runTask(ThreadPoolExecutor.java:886)
	at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:908)
	at java.lang.Thread.run(Thread.java:662)

Re: NullPointerException after getBest() during training a AdaptiveLogisticRegression.

Posted by Ted Dunning <te...@gmail.com>.

Can you say what this does?

On Mon, May 30, 2011 at 6:29 PM, Xiaobo Gu <gu...@gmail.com> wrote:

> --skipscorenum 99

Re: NullPointerException after getBest() during training a AdaptiveLogisticRegression.

Posted by Xiaobo Gu <gu...@gmail.com>.

Hi,

I have uploaded the patch to JIRA; mahout-696-r4.patch is the latest version.

https://issues.apache.org/jira/browse/MAHOUT-696

You can use this command to run trainAdaptiveLogistic:

trainAdaptiveLogistic --input donut.csv --output d:\model --target
color --categories 2 --predictors x shape --types numeric word
--features 20 --passes 100 --scores --skipscorenum 99

You can also run TrainAdaptiveLogistic under Eclipse using
the following arguments:

--input donut.csv --output d:\model --target color --categories 2
--predictors x shape --types numeric word --features 20 --passes 100
--scores --skipscorenum 99
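
As a rough illustration of what --skipscorenum does in the loop from the original post: the condition k % (skipscorenum + 1) == 0 means skipscorenum examples are skipped between scored ones. The sketch below isolates that condition; SkipScoreSketch and shouldScore are illustrative names only, not part of the patch.

```java
// Isolates the scoring condition from the training loop in the original post.
// SkipScoreSketch and shouldScore() are illustrative names, not Mahout API.
final class SkipScoreSketch {

    /** True when the k-th training example (counting from 1) should be scored. */
    static boolean shouldScore(boolean scores, int skipscorenum, int k) {
        return scores && (k % (skipscorenum + 1) == 0);
    }

    public static void main(String[] args) {
        int skipscorenum = 99;
        for (int k = 1; k <= 500; k++) {
            if (shouldScore(true, skipscorenum, k)) {
                // prints for k = 100, 200, 300, 400, 500
                System.out.println("score example " + k);
            }
        }
    }
}
```

This matches the 100, 200, ... 500 progression in the output shown in the original post.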




On Mon, May 30, 2011 at 12:26 PM, Ted Dunning <te...@gmail.com> wrote:
> Do you have a test case that demonstrates this?

Re: NullPointerException after getBest() during training a AdaptiveLogisticRegression.

Posted by Ted Dunning <te...@gmail.com>.

Do you have a test case that demonstrates this?

On Sun, May 29, 2011 at 6:53 PM, Xiaobo Gu <gu...@gmail.com> wrote:

> There is an internal buffer in AdaptiveLogisticRegression; the
> NullPointerException is thrown when the back-end CrossFoldLearners
> start training on the examples.
>
> The default size of the buffer is 500, and the exception is thrown
> when I pass the 501st example to the ALR.

Re: NullPointerException after getBest() during training a AdaptiveLogisticRegression.

Posted by Xiaobo Gu <gu...@gmail.com>.

There is an internal buffer in AdaptiveLogisticRegression; the
NullPointerException is thrown when the back-end CrossFoldLearners
start training on the examples.

The default size of the buffer is 500, and the exception is thrown
when I pass the 501st example to the ALR.
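
To make the timing concrete, here is a minimal stand-in sketch of a trainer that buffers examples and only processes them once the buffer holds more than its configured size. This is NOT Mahout's AdaptiveLogisticRegression; the class BufferedTrainerSketch and the exact flush condition are assumptions chosen to be consistent with the 501st example being the one that triggers training.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical stand-in, NOT Mahout's AdaptiveLogisticRegression.
// Assumes the buffer is flushed once it holds more than bufferSize examples.
final class BufferedTrainerSketch {
    private final int bufferSize;
    private final List<double[]> buffer = new ArrayList<>();
    private int flushes = 0;

    BufferedTrainerSketch(int bufferSize) {
        this.bufferSize = bufferSize;
    }

    /** Buffers one example and flushes once the buffer exceeds bufferSize. */
    void train(double[] example) {
        buffer.add(example);
        if (buffer.size() > bufferSize) {
            flush();
        }
    }

    private void flush() {
        // a real learner would hand the buffered batch to its underlying learners here
        buffer.clear();
        flushes++;
    }

    int flushes() {
        return flushes;
    }

    int buffered() {
        return buffer.size();
    }

    public static void main(String[] args) {
        BufferedTrainerSketch trainer = new BufferedTrainerSketch(500);
        for (int k = 1; k <= 501; k++) {
            trainer.train(new double[] {k});
            if (k >= 500) {
                // prints "k=500 flushes=0" then "k=501 flushes=1"
                System.out.println("k=" + k + " flushes=" + trainer.flushes());
            }
        }
    }
}
```

Under that assumption, nothing is actually trained during the first 500 calls, which is why a failure inside the underlying learners would only surface on call 501.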


On Mon, May 30, 2011 at 2:27 AM, Ted Dunning <te...@gmail.com> wrote:
> This usually means that you have fed the ALR enough data for it to push a
> batch of learning into the evolutionary algorithm.  That means that there
> isn't any best result yet.
> Getting that null doesn't impact the model, but you have to watch out for
> it.

Re: NullPointerException after getBest() during training a AdaptiveLogisticRegression.

Posted by Ted Dunning <te...@gmail.com>.

This usually means that you have fed the ALR enough data for it to push a
batch of learning into the evolutionary algorithm.  That means that there
isn't any best result yet.

Getting that null doesn't impact the model, but you have to watch out for
it.
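
A minimal sketch of "watching out for it", using a stub in place of the real learner: the caller checks the result of getBest() for null on every call and reports "not ready" until a best model exists. StubLearner and its batch behaviour are hypothetical and only mimic the situation described in this thread; they are not Mahout classes.

```java
// Null-guard pattern around a getBest()-style call.
// StubLearner is a hypothetical stand-in, not a Mahout class.
final class NullGuardSketch {

    /** Stub learner that has no "best" model until its first batch completes. */
    static final class StubLearner {
        private final int batchSize;
        private int pending = 0;
        private String best = null;   // stays null until the first batch is processed

        StubLearner(int batchSize) {
            this.batchSize = batchSize;
        }

        void train() {
            pending++;
            if (pending > batchSize) {
                pending = 0;
                best = "model";
            }
        }

        String getBest() {
            return best;
        }
    }

    /** Returns a status line, guarding against a null best model. */
    static String score(StubLearner learner, int k) {
        String best = learner.getBest();
        if (best == null) {
            return k + " not ready for scoring";
        }
        return k + " scored with " + best;
    }

    public static void main(String[] args) {
        StubLearner learner = new StubLearner(2);
        for (int k = 1; k <= 4; k++) {
            learner.train();
            System.out.println(score(learner, k));
        }
    }
}
```

With a batch size of 2, the first two lines report "not ready for scoring" and the last two report a scored result.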
