Posted to user@spark.apache.org by april_ZMQ <mq...@mais.smu.edu.sg> on 2016/06/13 08:14:37 UTC

Spark 2.0.0 : GLM problem

Hi ALL,

I've tried the GLM (Generalized Linear Model) in Spark 2.0.0-preview and
encountered some unexpected problems.
•	First problem:
I tested a "poisson"-family GLM on a very small dataset using SparkR
2.0.0. The same dataset fits a "poisson" GLM successfully in standard R,
but SparkR raised the error below, and I have no idea where it comes from.

16/06/13 14:10:58 WARN WeightedLeastSquares: regParam is zero, which might cause numerical instability and overfitting.
16/06/13 14:10:58 ERROR Executor: Exception in task 0.0 in stage 28.0 (TID 28)
java.lang.IllegalArgumentException: requirement failed: The response variable of Poisson family should be positive, but got 0.0
<http://apache-spark-user-list.1001560.n3.nabble.com/file/n27145/P.png> 

•	Second problem:
A dataset that ran successfully on Spark 1.6.0 now produces the error below
on Spark 2.0.0.

ERROR RBackendHandler: fit on org.apache.spark.ml.r.GeneralizedLinearRegressionWrapper failed
Error in invokeJava(isStatic = TRUE, className, methodName, ...) :
  org.apache.spark.SparkException: Currently, GeneralizedLinearRegression only supports number of features <= 4096. Found 7664 in the input dataset.
<http://apache-spark-user-list.1001560.n3.nabble.com/file/n27145/P2.png> 

This is the R code:

    model <- glm(flow ~ Origin + Destination, data = distance_flow, family = gaussian(link = "identity"))

Is this because Spark 2.0.0 does not support datasets as large as the
previous version did?
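For context, the 4096 in the error message counts feature columns, not rows. When a formula contains categorical (string) columns such as Origin and Destination, each one is one-hot encoded into roughly one column per level, so factors with many levels can exceed the limit even when the dataset itself is small. The plain-Python sketch below (an illustration, not Spark code) estimates that width. It assumes one reference level is dropped per factor, as R's default treatment contrasts do; the helper names and the level counts in the usage lines are hypothetical.

```python
# Plain-Python illustration (not a Spark API) of how categorical factors
# widen the feature vector that GeneralizedLinearRegression receives.

MAX_NUM_FEATURES = 4096  # limit quoted in the Spark 2.0.0 error message


def one_hot_width(levels_per_factor, drop_reference=True):
    """Count feature columns produced by one-hot encoding categorical factors.

    Assumes a factor with n levels yields n - 1 columns when one reference
    level is dropped, and n columns otherwise.
    """
    drop = 1 if drop_reference else 0
    return sum(n - drop for n in levels_per_factor)


def within_glr_limit(levels_per_factor, numeric_columns=0):
    """True if the encoded width stays within the quoted 4096-feature limit."""
    width = one_hot_width(levels_per_factor) + numeric_columns
    return width <= MAX_NUM_FEATURES


# Hypothetical level counts for two factors such as Origin and Destination.
print(one_hot_width([100, 100]))        # 198
print(within_glr_limit([3000, 3000]))   # False: 5998 columns
```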

--
View this message in context: http://apache-spark-user-list.1001560.n3.nabble.com/Spark-2-0-0-GLM-problem-tp27145.html
Sent from the Apache Spark User List mailing list archive at Nabble.com.

---------------------------------------------------------------------
To unsubscribe, e-mail: user-unsubscribe@spark.apache.org
For additional commands, e-mail: user-help@spark.apache.org


Re: Spark 2.0.0 : GLM problem

Posted by april_ZMQ <mq...@mais.smu.edu.sg>.
The picture below shows the reply from this package's creator, Yanbo Liang
(https://github.com/yanboliang):

<http://apache-spark-user-list.1001560.n3.nabble.com/file/n27203/P6.png> 



--
View this message in context: http://apache-spark-user-list.1001560.n3.nabble.com/Spark-2-0-0-GLM-problem-tp27145p27203.html


Re: Spark 2.0.0 : GLM problem

Posted by april_ZMQ <mq...@mais.smu.edu.sg>.
To update the post:

•	First problem: This can be worked around by adding a small epsilon to
the zero values, because the Poisson model in SparkR does not allow the
response y to be zero. Standard R's glm does not have this requirement.
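The epsilon workaround can be sketched in plain Python (not SparkR): a check mirroring the requirement quoted in the Spark 2.0.0-preview error, plus a helper that replaces exact zeros with a small positive value. The value 1e-6 and the helper names are assumptions for illustration. Note that shifting zeros changes the data being modeled, and negative responses would still fail the check.

```python
EPSILON = 1e-6  # assumed value; choose one small relative to the response scale


def check_poisson_label(y):
    """Mirror the quoted requirement: a Poisson response must be positive."""
    if not y > 0:
        raise ValueError(
            f"The response variable of Poisson family should be positive, but got {y}"
        )
    return y


def replace_zeros(labels, eps=EPSILON):
    """Return a copy of labels with exact zeros replaced by eps."""
    return [eps if y == 0 else y for y in labels]


flows = [0, 3, 0.0, 7]            # hypothetical response values
shifted = replace_zeros(flows)    # [1e-06, 3, 1e-06, 7]
for y in shifted:
    check_poisson_label(y)        # passes for every shifted value
```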

But now I run into another problem with every GLM model I fit:
"Values to assemble cannot be null"
<http://apache-spark-user-list.1001560.n3.nabble.com/file/n27164/P3.png> 

I've found the code that raises this error in
https://github.com/apache/spark/blob/master/mllib/src/main/scala/org/apache/spark/ml/feature/VectorAssembler.scala
<http://apache-spark-user-list.1001560.n3.nabble.com/file/n27164/p4.png>

Can you guys explain what that means?
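As far as the quoted message goes, VectorAssembler refuses to build a feature vector when any of its input values is null, so a single missing value in a column used by the formula is enough to trigger it. The plain-Python sketch below (a simplified illustration, not Spark's implementation) mimics that behaviour and shows the usual remedy of dropping incomplete rows before fitting; in SparkR that step would typically be `dropna()` or `na.omit()` on the SparkDataFrame. The column names, rows, and helper names are hypothetical.

```python
def assemble(row, input_cols):
    """Concatenate numeric columns into one feature list, rejecting nulls.

    Simplified: Spark's VectorAssembler also accepts vector-valued columns.
    """
    values = []
    for col in input_cols:
        value = row[col]
        if value is None:
            raise ValueError("Values to assemble cannot be null.")
        values.append(float(value))
    return values


def drop_incomplete(rows, cols):
    """Keep only rows with a non-null value in every listed column."""
    return [row for row in rows if all(row[c] is not None for c in cols)]


# Hypothetical rows; the second one has a missing distance value.
rows = [
    {"distance": 12.5, "population": 300},
    {"distance": None, "population": 150},
]
cols = ["distance", "population"]
features = [assemble(r, cols) for r in drop_incomplete(rows, cols)]
print(features)  # [[12.5, 300.0]]
```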

--
View this message in context: http://apache-spark-user-list.1001560.n3.nabble.com/Spark-2-0-0-GLM-problem-tp27145p27164.html