Posted to user@spark.apache.org by "Bui, Tri" <Tr...@VerizonWireless.com.INVALID> on 2014/12/01 22:36:10 UTC

RE: Inaccurate Estimate of weights model from StreamingLinearRegressionWithSGD

Thanks Yanbo!  That works!

The only remaining issue is that it does not print the predicted value for lp.features from the line of code below.

model.predictOnValues(testData.map(lp => (lp.label, lp.features))).print()

It prints the test input data correctly, but it keeps printing “0.0” as the predicted value for lp.features.

Thanks
Tri

From: Yanbo Liang [mailto:yanbohappy@gmail.com]
Sent: Thursday, November 27, 2014 12:22 AM
To: Bui, Tri
Cc: user@spark.apache.org
Subject: Re: Inaccurate Estimate of weights model from StreamingLinearRegressionWithSGD

Hi Tri,

My last reply to your problem may have been lost. In any case, the following code snippet runs correctly.

val model = new StreamingLinearRegressionWithSGD().setInitialWeights(Vectors.zeros(args(3).toInt))

model.algorithm.setIntercept(true)

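Every setXXX() function in StreamingLinearRegressionWithSGD returns this.type, that is, the instance itself, so those calls can be chained.
algorithm.setIntercept(true) returns the inner LinearRegressionWithSGD instead, so that configuration needs to be set on a separate line, with its return value discarded.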
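
The point about return types can be sketched without Spark. The classes below (InnerAlgorithm, StreamingWrapper, setNumFeatures) are hypothetical stand-ins that only mirror the shape described here; they are not the MLlib classes, and trainOn is a placeholder.

```scala
// Hypothetical stand-ins (no Spark dependency): an outer wrapper whose
// setters return this.type, and an inner algorithm that owns setIntercept.
class InnerAlgorithm {
  var addIntercept: Boolean = false
  def setIntercept(flag: Boolean): this.type = { addIntercept = flag; this }
}

class StreamingWrapper {
  val algorithm = new InnerAlgorithm
  var numFeatures: Int = 0
  def setNumFeatures(n: Int): this.type = { numFeatures = n; this }
  // Placeholder for a training method: just reports how many batches it saw.
  def trainOn(batches: Seq[Seq[Double]]): Int = batches.size
}

object ChainingDemo {
  def build(): StreamingWrapper = {
    // Chain only the wrapper's own setters, so the val keeps the wrapper type.
    val model = new StreamingWrapper().setNumFeatures(1)
    // Configure the inner algorithm in a separate statement; the value it
    // returns (an InnerAlgorithm) is deliberately discarded.
    model.algorithm.setIntercept(true)
    model
  }

  // By contrast, ending the chain with the inner setter yields the inner
  // type, which has no trainOn method:
  //   val wrong = new StreamingWrapper().setNumFeatures(1).algorithm.setIntercept(true)
  //   wrong.trainOn(Seq.empty)   // does not compile

  def main(args: Array[String]): Unit = {
    val model = build()
    println(s"intercept enabled: ${model.algorithm.addIntercept}")
  }
}
```

The static type of a chained expression is whatever the last call returns, so a chain that ends on the inner object silently changes the type of the val.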

2014-11-27 1:04 GMT+08:00 Bui, Tri <Tr...@verizonwireless.com.invalid>:
Thanks Yanbo!

Modified code below:

val conf = new SparkConf().setMaster("local[2]").setAppName("StreamingLinearRegression")
val ssc = new StreamingContext(conf, Seconds(args(2).toLong))
val trainingData = ssc.textFileStream(args(0)).map(LabeledPoint.parse)
val testData = ssc.textFileStream(args(1)).map(LabeledPoint.parse)
val model = new StreamingLinearRegressionWithSGD()
  .setInitialWeights(Vectors.zeros(args(3).toInt))
  .setNumIterations(args(4).toInt)
  .setStepSize(.0001)
  .algorithm.setIntercept(true)
model.trainOn(trainingData)
model.predictOnValues(testData.map(lp => (lp.label, lp.features))).print()
ssc.start()
ssc.awaitTermination()

But I am getting a compile error:
[error] /data/project/LinearRegression/src/main/scala/StreamingLinearRegression.scala:54: value trainOn is not a member of org.apache.spark.mllib.regression.LinearRegressionWithSGD
[error]     model.trainOn(trainingData)
[error]           ^
[error] /data/project/LinearRegression/src/main/scala/StreamingLinearRegression.scala:55: value predictOnValues is not a member of org.apache.spark.mllib.regression.LinearRegressionWithSGD
[error]     model.predictOnValues(testData.map(lp => (lp.label, lp.features))).print()
[error]           ^
[error] two errors found
[error] (compile:compile) Compilation failed

Thanks
Tri

From: Yanbo Liang [mailto:yanbohappy@gmail.com]
Sent: Tuesday, November 25, 2014 8:57 PM
To: Bui, Tri
Cc: user@spark.apache.org
Subject: Re: Inaccurate Estimate of weights model from StreamingLinearRegressionWithSGD

Hi Tri,

setIntercept() is not a member function of StreamingLinearRegressionWithSGD. It is a member function of LinearRegressionWithSGD (inherited from GeneralizedLinearAlgorithm), and StreamingLinearRegressionWithSGD holds an instance of that class in its member variable named algorithm.

So you need to change your code to:
val model = new StreamingLinearRegressionWithSGD().setInitialWeights(Vectors.zeros(args(3).toInt))
.algorithm.setIntercept(true)

Thanks
Yanbo


2014-11-25 23:51 GMT+08:00 Bui, Tri <Tr...@verizonwireless.com.invalid>:
Thanks Liang!

It was my mistake: I fat-fingered one of the data points. After correcting it, the result matches yours.

I am still not able to get the intercept. I am getting:
[error] /data/project/LinearRegression/src/main/scala/StreamingLinearRegression.scala:47: value setIntercept is not a member of org.apache.spark.mllib.regression.StreamingLinearRegressionWithSGD

I tried the code below:
val model = new StreamingLinearRegressionWithSGD().setInitialWeights(Vectors.zeros(args(3).toInt))
model.setIntercept(addIntercept = true).trainOn(trainingData)

and:

val model = new StreamingLinearRegressionWithSGD().setInitialWeights(Vectors.zeros(args(3).toInt))
.setIntercept(true)

But I still get a compilation error.

Thanks
Tri




From: Yanbo Liang [mailto:yanbohappy@gmail.com]
Sent: Tuesday, November 25, 2014 4:08 AM
To: Bui, Tri
Cc: user@spark.apache.org
Subject: Re: Inaccurate Estimate of weights model from StreamingLinearRegressionWithSGD

The case runs correctly in my environment.

14/11/25 17:48:20 INFO regression.StreamingLinearRegressionWithSGD: Model updated at time 1416908900000 ms
14/11/25 17:48:20 INFO regression.StreamingLinearRegressionWithSGD: Current model: weights, [0.9999999999998588]

Could you provide more detailed information, if convenient?

The intercept can be turned on as follows:
val model = new StreamingLinearRegressionWithSGD()
      .algorithm.setIntercept(true)

2014-11-25 3:31 GMT+08:00 Bui, Tri <Tr...@verizonwireless.com.invalid>:
Hi,

I am getting an incorrect weights model from StreamingLinearRegressionWithSGD.

The one-feature input data is:

(1,[1])
(2,[2])
…
.
(20,[20])

The result from “Current model: weights” is [-4.432], which is not correct.
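
As a sanity check on what the weight should be, here is a minimal sketch in plain Scala (no Spark). It uses closed-form ordinary least squares rather than SGD, and it assumes the elided rows follow the same (i,[i]) pattern as the rows shown; the names ExpectedFit and fit are made up for illustration.

```scala
object ExpectedFit {
  // Assumption: the elided rows follow the same (i,[i]) pattern as the
  // rows shown, giving 20 points whose label equals the single feature.
  val xs: Seq[Double] = (1 to 20).map(_.toDouble)
  val ys: Seq[Double] = xs

  // Closed-form simple linear regression (not SGD):
  //   slope = cov(x, y) / var(x), intercept = mean(y) - slope * mean(x)
  def fit(x: Seq[Double], y: Seq[Double]): (Double, Double) = {
    val n = x.size
    val meanX = x.sum / n
    val meanY = y.sum / n
    val cov = x.zip(y).map { case (a, b) => (a - meanX) * (b - meanY) }.sum
    val varX = x.map(a => (a - meanX) * (a - meanX)).sum
    val slope = cov / varX
    (slope, meanY - slope * meanX)
  }

  def main(args: Array[String]): Unit = {
    val (slope, intercept) = fit(xs, ys)
    println(s"slope = $slope, intercept = $intercept")
  }
}
```

Under that assumption the data lie exactly on the line y = x, so a correct fit should have a weight close to 1 and an intercept close to 0.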

Also, how do I turn on the intercept value for StreamingLinearRegressionWithSGD?

Thanks
Tri