Posted to dev@spark.apache.org by svattig <sr...@gmail.com> on 2018/04/18 19:37:32 UTC

GLM Poisson Model - Deviance calculations

In Spark 2.3, when a Poisson model is fit on data whose labelCol contains
some zero counts, the deviance calculations break because they evaluate
log(0). I think the same happens in Spark 2.2.
However, the new toString method of Spark 2.3's
GeneralizedLinearRegressionTrainingSummary class throws a
NumberFormatException at line 1551. Because of this exception, we are not
able to get the summary object from the model fit.
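
A small plain-Scala sketch of one likely cause follows. It is an assumption,
not verified against the Spark source at line 1551: with y = 0 the term
y * log(y / mu) becomes 0 * (-Infinity), which is NaN, and if the summary's
toString rounds its statistics through BigDecimal, a NaN value makes it throw
NumberFormatException. The object and method names are hypothetical.

```scala
import scala.util.Try

// Hypothetical demo of how a zero count can surface as NumberFormatException.
object NaNDevianceDemo {
  // Unguarded Poisson unit deviance (same formula as quoted in this thread).
  // For y == 0: log(0 / mu) = -Infinity and 0 * -Infinity = NaN.
  def naivePoissonDeviance(y: Double, mu: Double, weight: Double): Double =
    2.0 * weight * (y * math.log(y / mu) - (y - mu))

  // Assumption: the summary rounds each value via BigDecimal before printing.
  // BigDecimal cannot represent NaN or Infinity and throws NumberFormatException.
  def formatLikeSummary(x: Double): Try[String] =
    Try(BigDecimal(x).setScale(4, BigDecimal.RoundingMode.HALF_UP).toString)
}
```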

Can the toString method be fixed, along with the deviance calculations, for
example by taking log(1) whenever the count is 0 instead of log(0)?

Thanks,
Srikar.V    



--
Sent from: http://apache-spark-developers-list.1001551.n3.nabble.com/

---------------------------------------------------------------------
To unsubscribe e-mail: dev-unsubscribe@spark.apache.org


Re: GLM Poisson Model - Deviance calculations

Posted by Sean Owen <sr...@gmail.com>.
I see. This was handled for the binomial deviance by the 'ylogy' method,
which computes y * log(y / mu) and defines it to be 0 when y = 0. It's not
necessary to add a delta or anything similar; 0 is the limit of
y * log(y / mu) as y goes to 0, so this is fine.
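
The convention described above can be sketched as a standalone helper. It
follows the idea behind Spark's 'ylogy' but is not copied from the Spark
source, and the object name is hypothetical. Substituting log(1) whenever
y = 0, as suggested earlier in the thread, gives the same value, because
0 * log(1) = 0.

```scala
// Hypothetical standalone version of the "y log y" convention.
object YLogY {
  // y * log(y / mu), taken to be 0 when y == 0,
  // which is the limit of y * log(y / mu) as y -> 0+.
  def ylogy(y: Double, mu: Double): Double =
    if (y == 0.0) 0.0 else y * math.log(y / mu)
}
```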

The same change is appropriate for the Poisson deviance. The Gamma deviance
looks like it also has this issue, but I suppose it isn't defined at y = 0
anyway. I don't know whether implementations still try to return something
other than NaN in that case.
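
Applying the same guard to the Poisson unit deviance, and leaving the Gamma
formula unguarded for comparison, might look like the sketch below. The
formulas are the standard GLM unit deviances (the Poisson one matches the
code quoted in this thread); the names are hypothetical, not Spark's API. At
y = 0 the guarded Poisson term reduces to 2 * weight * mu, while the Gamma
term evaluates to +Infinity rather than NaN.

```scala
// Hypothetical sketch of Poisson and Gamma unit deviances.
object GlmDevianceSketch {
  private def ylogy(y: Double, mu: Double): Double =
    if (y == 0.0) 0.0 else y * math.log(y / mu)

  // Poisson unit deviance with the y == 0 case handled via ylogy.
  def poisson(y: Double, mu: Double, weight: Double): Double =
    2.0 * weight * (ylogy(y, mu) - (y - mu))

  // Gamma unit deviance, unguarded: log(0) = -Infinity makes it +Infinity at y == 0.
  def gamma(y: Double, mu: Double, weight: Double): Double =
    -2.0 * weight * (math.log(y / mu) - (y - mu) / mu)
}
```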

Anyway, I think it's fine to open a JIRA and PR to make that change.

On Wed, Apr 18, 2018 at 9:30 PM svattig <sr...@gmail.com> wrote:

Re: GLM Poisson Model - Deviance calculations

Posted by svattig <sr...@gmail.com>.
Yes, I'm referring to that deviance method. It fails whenever y is 0. I
think R's deviance calculation checks whether y is 0 and assigns 1 to y in
such cases.

The GLM regression summary object also has several deviances, such as the
null deviance, the residual deviance and the deviance. You might want to
check those as well, so that the toString method doesn't fail.
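
The summary-level quantities are sums of unit deviances over all rows. For an
intercept-only Poisson model without an offset (an assumption made here), the
null model's fitted mean is the weighted mean of the labels. The sketch below
uses hypothetical names, not Spark's API, and illustrates that both sums stay
finite once the y = 0 case is guarded.

```scala
// Hypothetical sketch: total (residual) deviance and null deviance for Poisson.
object SummaryDevianceSketch {
  private def ylogy(y: Double, mu: Double): Double =
    if (y == 0.0) 0.0 else y * math.log(y / mu)

  private def unitDeviance(y: Double, mu: Double, w: Double): Double =
    2.0 * w * (ylogy(y, mu) - (y - mu))

  // Residual deviance: sum of unit deviances at the model's fitted means.
  def deviance(y: Seq[Double], mu: Seq[Double], w: Seq[Double]): Double =
    y.indices.map(i => unitDeviance(y(i), mu(i), w(i))).sum

  // Null deviance: every row uses the weighted mean of y as its fitted mean
  // (intercept-only model, no offset assumed).
  def nullDeviance(y: Seq[Double], w: Seq[Double]): Double = {
    val muNull = y.zip(w).map { case (yi, wi) => yi * wi }.sum / w.sum
    y.indices.map(i => unitDeviance(y(i), muNull, w(i))).sum
  }
}
```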

Thank you,
Srikar.V



Re: GLM Poisson Model - Deviance calculations

Posted by Joseph PENG <jo...@gmail.com>.
Are you referring to this?

    override def deviance(y: Double, mu: Double, weight: Double): Double = {
      2.0 * weight * (y * math.log(y / mu) - (y - mu))
    }

I'm not sure how R handles this, but my guess is that it adds a small
number, e.g. 0.5, to the numerator and denominator. If you can confirm that
this is the issue, I will look into it.
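
To compare the small-number idea with the "define it as 0" convention, here
is a hypothetical sketch (the delta variant is an illustration, not a
confirmed description of R's behaviour). Both give 0 at y = 0, because the
leading factor y is 0 either way, but the delta variant also changes the
terms where y > 0.

```scala
// Hypothetical comparison of two ways to avoid log(0) in y * log(y / mu).
object SmoothingComparison {
  // Guarded term: y * log(y / mu), taken as 0 when y == 0.
  def exactTerm(y: Double, mu: Double): Double =
    if (y == 0.0) 0.0 else y * math.log(y / mu)

  // Smoothed term: add delta to the numerator and denominator inside the log.
  def smoothedTerm(y: Double, mu: Double, delta: Double = 0.5): Double =
    y * math.log((y + delta) / (mu + delta))
}
```

For example, with y = 3 and mu = 2 the exact term is 3 * log(1.5), about
1.216, while the smoothed term is 3 * log(1.4), about 1.009.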

On Wed, Apr 18, 2018 at 6:46 PM, Sean Owen <sr...@gmail.com> wrote:

Re: GLM Poisson Model - Deviance calculations

Posted by Sean Owen <sr...@gmail.com>.
GeneralizedLinearRegression.ylogy seems to handle this case; can you be
more specific about where the log(0) happens? That's what should be fixed,
right? If so, then a JIRA and PR are the right way to proceed.

On Wed, Apr 18, 2018 at 2:37 PM svattig <sr...@gmail.com> wrote: