Posted to issues@spark.apache.org by "Yanbo Liang (JIRA)" <ji...@apache.org> on 2016/07/01 10:46:10 UTC

[jira] [Commented] (SPARK-16064) Fix the GLM error caused by NA produced by reweight function

    [ https://issues.apache.org/jira/browse/SPARK-16064?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15358776#comment-15358776 ] 

Yanbo Liang commented on SPARK-16064:
-------------------------------------

I will have a look in the next few days.

> Fix the GLM error caused by NA produced by reweight function
> ------------------------------------------------------------
>
>                 Key: SPARK-16064
>                 URL: https://issues.apache.org/jira/browse/SPARK-16064
>             Project: Spark
>          Issue Type: Bug
>          Components: ML
>    Affects Versions: 2.0.0
>            Reporter: Zhang Mengqi
>            Assignee: Yanbo Liang
>            Priority: Minor
>
> This happens when users run a GLM with SparkR; the same dataset fits without error in native R.
> When users fit the model using glm with the poisson family, it fails with an assertion error caused by NA values produced by the reweight function.
> 16/06/20 16:40:22 ERROR RBackendHandler: fit on org.apache.spark.ml.r.GeneralizedLinearRegressionWrapper failed
> Error in invokeJava(isStatic = TRUE, className, methodName, ...) : 
>   java.lang.AssertionError: assertion failed: Sum of weights cannot be zero.
> 	at scala.Predef$.assert(Predef.scala:170)
> 	at org.apache.spark.ml.optim.WeightedLeastSquares$Aggregator.validate(WeightedLeastSquares.scala:248)
> 	at org.apache.spark.ml.optim.WeightedLeastSquares.fit(WeightedLeastSquares.scala:82)
> 	at org.apache.spark.ml.optim.IterativelyReweightedLeastSquares.fit(IterativelyReweightedLeastSquares.scala:85)
> 	at org.apache.spark.ml.regression.GeneralizedLinearRegression.train(GeneralizedLinearRegression.scala:276)
> 	at org.apache.spark.ml.regression.GeneralizedLinearRegression.train(GeneralizedLinearRegression.scala:134)
> 	at org.apache.spark.ml.Predictor.fit(Predictor.scala:90)
> 	at org.apache.spark.ml.Predictor.fit(Predictor.scala:71)
> 	at org.apache.spark.ml.Pipeline$$anonfun$fit$2.apply(Pipeline.scala:148)
> 	at org.apache.spark.ml.Pipeline$$anonfun$fit$2.apply(Pipeline.scala:144)
> 	at scala.collection.Iterator$class.foreach(Iterator.scala:893)
> 	at scala.collection.Abstra
> P.S. The dataset describes ride flows between several planning areas in Singapore.
> ride_flow_exp <- glm(flow~Origin+Destination+distance,ride_flow,family = poisson(link = "log"))
> SparkDataFrame[Origin:string, Destination:string, flow:double, Oi:int, Dj:int, distance:double]
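
For readers unfamiliar with how a reweight step can yield NA, here is a minimal sketch in plain Python. It is not Spark's code. It assumes the textbook IRLS update for a Poisson GLM with log link, and it assumes the failing check has the form "sum of weights > 0". The names poisson_log_reweight, ieee_div, safe_exp and weights_sum_is_valid are hypothetical helpers introduced for illustration only. It shows one plausible mechanism, not a diagnosis of the dataset in this report: when the linear predictor overflows or underflows, the working weight becomes NaN, and a NaN sum fails a "> 0" test.

```python
import math


def ieee_div(a, b):
    """Divide like IEEE 754 arithmetic: x/0 gives inf or nan instead of raising."""
    if b == 0.0:
        if a == 0.0 or math.isnan(a):
            return math.nan
        return math.copysign(math.inf, a) * math.copysign(1.0, b)
    return a / b


def safe_exp(x):
    """exp that overflows to +inf instead of raising OverflowError."""
    try:
        return math.exp(x)
    except OverflowError:
        return math.inf


def poisson_log_reweight(y, eta, prior_weight=1.0):
    """One textbook IRLS reweight step for a single observation.

    y: observed count, eta: current linear predictor.
    Returns (working_response, working_weight).
    """
    mu = safe_exp(eta)            # inverse of the log link
    deriv = ieee_div(1.0, mu)     # g'(mu) = 1/mu for the log link
    variance = mu                 # Var(mu) = mu for the Poisson family
    working_response = eta + (y - mu) * deriv
    working_weight = ieee_div(prior_weight, deriv * deriv * variance)
    return working_response, working_weight


def weights_sum_is_valid(weights):
    """Assumed form of the failing check: the sum must be strictly positive."""
    return sum(weights) > 0.0


# A well-behaved observation: mu = 1, so the weight is 1 and z = y - 1.
print(poisson_log_reweight(3.0, 0.0))
# Overflowing predictor: mu = inf, so 0 * inf appears and the weight is NaN.
print(poisson_log_reweight(3.0, 1000.0))
# A single NaN weight poisons the sum, and NaN > 0 is False.
print(weights_sum_is_valid([1.0, poisson_log_reweight(3.0, 1000.0)[1]]))
```

Under these assumptions, a single observation with a non-finite fitted mean is enough to make the whole sum of weights fail the check, which would be consistent with the "Sum of weights cannot be zero" message even when no weight is literally zero.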



