Posted to user@spark.apache.org by afarahat <ay...@yahoo.com> on 2015/07/12 17:22:46 UTC

How can the RegressionMetrics produce negative R2 and explained variance?

Hello;
I am using the ALS recommender from MLlib. To select the optimal rank, I
hold out a number of users who used multiple items as my test set. I then
get the predictions for these users and compare them to the observed
ratings. I use RegressionMetrics to estimate the R^2, but I keep getting
a negative value:
r2 =   -1.18966999676 explained var =  -1.18955347415 count =  11620309
Here is my PySpark code:

# Imports inferred from the calls used below
import math
from pyspark.ml.recommendation import ALS
from pyspark.mllib.evaluation import RegressionMetrics

train1.cache()
test1.cache()

numIterations = 10
for i in range(10):
    rank = 40 + i * 10
    als = ALS(rank=rank, maxIter=numIterations, implicitPrefs=False)
    model = als.fit(train1)
    # Pair each prediction with the observed rating and drop NaN predictions
    predobs = (model.transform(test1)
               .select("prediction", "rating")
               .map(lambda p: (p.prediction, p.rating))
               .filter(lambda p: not math.isnan(p[0])))
    metrics = RegressionMetrics(predobs)
    mycount = predobs.count()
    myr2 = metrics.r2
    myvar = metrics.explainedVariance
    print "hooo", rank, " r2 =  ", myr2, "explained var = ", myvar, "count = ", mycount




--
View this message in context: http://apache-spark-user-list.1001560.n3.nabble.com/How-can-the-RegressionMetrics-produce-negative-R2-and-explained-variance-tp23779.html
Sent from the Apache Spark User List mailing list archive at Nabble.com.

---------------------------------------------------------------------
To unsubscribe, e-mail: user-unsubscribe@spark.apache.org
For additional commands, e-mail: user-help@spark.apache.org


Re: How can the RegressionMetrics produce negative R2 and explained variance?

Posted by Feynman Liang <fl...@databricks.com>.
This might be a bug... R^2 should always be in [0,1] and variance should
never be negative.

Can you give more details on which version of Spark you are running?

On Sun, Jul 12, 2015 at 8:37 AM, Sean Owen <so...@cloudera.com> wrote:

> In general, a negative R2 means the line that was fit is a very poor
> fit -- the mean would give a smaller squared error. But it can also mean
> you are applying R2 where it doesn't apply. Here, you're not performing a
> linear regression; why are you using R2?

Re: How can the RegressionMetrics produce negative R2 and explained variance?

Posted by Sean Owen <so...@cloudera.com>.
In general, a negative R2 means the line that was fit is a very poor
fit -- the mean would give a smaller squared error. But it can also mean
you are applying R2 where it doesn't apply. Here, you're not performing a
linear regression; why are you using R2?
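
The point about the mean giving a smaller squared error can be checked
with a small sketch in plain Python. It assumes the usual definition
R2 = 1 - SS_res / SS_tot; the function name r_squared and the sample
numbers are made up for illustration, and this is not a copy of what
RegressionMetrics does internally.

```python
def r_squared(observed, predicted):
    """R2 under the usual definition: 1 - SS_res / SS_tot (illustrative helper)."""
    mean_obs = sum(observed) / float(len(observed))
    # Squared error of the predictions
    ss_res = sum((o - p) ** 2 for o, p in zip(observed, predicted))
    # Squared error of always predicting the mean of the observations
    ss_tot = sum((o - mean_obs) ** 2 for o in observed)
    return 1.0 - ss_res / ss_tot


# Hypothetical ratings, chosen only to keep the arithmetic easy to follow
observed = [1.0, 2.0, 3.0, 4.0, 5.0]

perfect = list(observed)                   # predictions equal to the data
mean_baseline = [3.0] * len(observed)      # always predict the mean
poor = [5.0, 4.0, 3.0, 2.0, 1.0]           # further from the data than the mean

r2_perfect = r_squared(observed, perfect)
r2_baseline = r_squared(observed, mean_baseline)
r2_poor = r_squared(observed, poor)

print(r2_perfect, r2_baseline, r2_poor)
```

With these numbers SS_tot is 10 and the poor predictions have SS_res of
40, so their R2 is 1 - 40/10 = -3. Nothing in the definition keeps R2 at
or above zero when the predictions are scored on data they were not
least-squares fit to, such as held-out ratings.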
