Posted to user@spark.apache.org by herbps10 <hp...@geneseo.edu> on 2014/01/27 02:35:56 UTC

Inaccurate Estimates from LinearRegressionWithSGD

Hello,

I just finished setting up a standalone Spark cluster and have moved on to
exploring MLlib.

I'm trying to perform Linear Regression on a very simple, contrived dataset.
I have  which contains


I then ran the following code through the Spark shell (modified very
slightly from
http://spark.incubator.apache.org/docs/latest/mllib-guide.html):



The problem is that the weights and intercept are extremely off:


It gets a little better if I adjust the step size:


But it still doesn't converge to the correct estimates (I would of course
expect intercept=0, slope=1). Any idea what I'm doing wrong? I feel like I
must be missing something obvious.

Thanks!
Herb Susmann
SUNY Geneseo
hps1@geneseo.edu



--
View this message in context: http://apache-spark-user-list.1001560.n3.nabble.com/Inaccurate-Estimates-from-LinearRegressionWithSGD-tp942.html
Sent from the Apache Spark User List mailing list archive at Nabble.com.

Re: Inaccurate Estimates from LinearRegressionWithSGD

Posted by Sean Owen <so...@cloudera.com>.
This fix from 8 days ago might be related:
https://github.com/apache/incubator-spark/pull/459

If you are not building from HEAD, I would try again with that, or wait for
the 0.9 release, which will contain it. It may not be the cause, though.
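
Separately from that fix, the step-size sensitivity described above can be
reproduced outside Spark. The following is only a minimal sketch in plain
Python, not MLlib's implementation: it uses full-batch gradient descent
instead of SGD, and it assumes a contrived dataset with y = x for x = 1..10,
since the original data file is not shown in this thread. The function name
fit_line, the step sizes, and the iteration counts are all hypothetical
choices for illustration.

```python
def fit_line(xs, ys, step_size, iterations):
    """Fit y = slope * x + intercept by full-batch gradient descent on the
    mean squared error. Illustrative only; not how MLlib does it."""
    slope, intercept = 0.0, 0.0
    n = len(xs)
    for _ in range(iterations):
        # Residuals under the current parameters.
        errors = [slope * x + intercept - y for x, y in zip(xs, ys)]
        # Gradients of the mean squared error w.r.t. slope and intercept.
        grad_slope = 2.0 * sum(e * x for e, x in zip(errors, xs)) / n
        grad_intercept = 2.0 * sum(errors) / n
        slope -= step_size * grad_slope
        intercept -= step_size * grad_intercept
    return slope, intercept


# Assumed dataset: y = x exactly, so the true fit is slope=1, intercept=0.
xs = [float(i) for i in range(1, 11)]
ys = list(xs)

# Hypothetical settings: a step that is too large, a small step with too
# few iterations, and a small step with many iterations.
for step_size, iterations in ((1.0, 20), (0.01, 100), (0.01, 5000)):
    slope, intercept = fit_line(xs, ys, step_size, iterations)
    print(step_size, iterations, slope, intercept)
```

On unscaled features like these, a step size that is too large makes the
estimates blow up, while a small step size needs many iterations before the
intercept settles. If the symptoms persist after upgrading, trying a smaller
step size with more iterations, or rescaling the feature, may be worth a
look.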

