Posted to user@spark.apache.org by april_ZMQ <mq...@mais.smu.edu.sg> on 2016/06/13 08:14:37 UTC
Spark 2.0.0 : GLM problem
Hi ALL,
I’ve tried the GLM (Generalized Linear Model) of Spark 2.0.0-preview, and I’ve
encountered some unexpected problems.
• First problem:
I tested the “poisson” family GLM on a very small dataset using SparkR
2.0.0. The same dataset fits a “poisson” family GLM in plain R
successfully, but SparkR showed the error below, and I have no idea where
it comes from.
16/06/13 14:10:58 WARN WeightedLeastSquares: regParam is zero, which might
cause numerical instability and overfitting.
16/06/13 14:10:58 ERROR Executor: Exception in task 0.0 in stage 28.0 (TID
28)
java.lang.IllegalArgumentException: requirement failed: The response
variable of Poisson family should be positive, but got 0.0
<http://apache-spark-user-list.1001560.n3.nabble.com/file/n27145/P.png>
• Second problem:
When I run the same dataset which I ran successfully on Spark 1.6.0, Spark
2.0.0 generated the error below.
ERROR RBackendHandler: fit on
org.apache.spark.ml.r.GeneralizedLinearRegressionWrapper failed
Error in invokeJava(isStatic = TRUE, className, methodName, ...) :
org.apache.spark.SparkException: Currently, GeneralizedLinearRegression
only supports number of features <= 4096. Found 7664 in the input dataset.
<http://apache-spark-user-list.1001560.n3.nabble.com/file/n27145/P2.png>
This is the R code:
model <- glm(flow ~ Origin + Destination, data = distance_flow, family = gaussian(link = "identity"))
Is this because Spark 2.0.0 does not support datasets as large as the previous
version did?
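One possible reading of the error is that the 7664 features are not raw columns but the dummy columns created when the categorical predictors Origin and Destination are expanded. The base-R sketch below uses a made-up toy data frame (the values and level names are assumptions, not the actual distance_flow data) to show how such a formula expands into one column per non-reference level. Spark's own encoding may differ in detail.

```r
# Hypothetical toy data; the real distance_flow dataset is not shown in the post.
toy <- data.frame(
  flow        = c(10, 20, 15, 30, 25, 5),
  Origin      = factor(c("A", "B", "C", "A", "B", "C")),
  Destination = factor(c("X", "Y", "Z", "W", "X", "Y"))
)

# Expand the formula into a design matrix, as base R does before fitting.
design <- model.matrix(flow ~ Origin + Destination, data = toy)

# Each factor contributes (number of levels - 1) dummy columns;
# subtract 1 to leave out the intercept column.
n_features <- ncol(design) - 1

# Distinct levels per predictor, to see where the feature count comes from.
n_levels <- c(Origin = nlevels(toy$Origin), Destination = nlevels(toy$Destination))

print(n_features)
print(n_levels)
```

Running the same level count on the real data would show whether the number of distinct Origin and Destination values accounts for the 7664 features reported in the error.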
--
View this message in context: http://apache-spark-user-list.1001560.n3.nabble.com/Spark-2-0-0-GLM-problem-tp27145.html
Sent from the Apache Spark User List mailing list archive at Nabble.com.
---------------------------------------------------------------------
To unsubscribe, e-mail: user-unsubscribe@spark.apache.org
For additional commands, e-mail: user-help@spark.apache.org
Re: Spark 2.0.0 : GLM problem
Posted by april_ZMQ <mq...@mais.smu.edu.sg>.
The picture below shows the reply from the author of this package, Yanbo
Liang (https://github.com/yanboliang):
<http://apache-spark-user-list.1001560.n3.nabble.com/file/n27203/P6.png>
--
View this message in context: http://apache-spark-user-list.1001560.n3.nabble.com/Spark-2-0-0-GLM-problem-tp27145p27203.html
Re: Spark 2.0.0 : GLM problem
Posted by april_ZMQ <mq...@mais.smu.edu.sg>.
To update the post:
• First problem: This problem can be solved by adding an epsilon (a very small
value) to each zero value. The Poisson family in Spark does not allow the y
value to be zero, although a Poisson model in general has no such requirement.
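As a rough illustration of the workaround above, the base-R sketch below fits a Poisson GLM on toy count data that contains zeros, then refits after replacing each zero with a small epsilon. The data values and the choice of eps = 1e-6 are assumptions made for illustration, and this is plain R rather than SparkR, so it only shows that the shift has little effect on the fitted coefficients.

```r
# Hypothetical toy count data containing zeros (not the poster's dataset).
toy <- data.frame(
  x = c(1, 2, 3, 4, 5, 6),
  y = c(0, 1, 0, 2, 3, 5)
)

# Base R's Poisson family accepts zero counts directly.
fit_zero <- glm(y ~ x, data = toy, family = poisson(link = "log"))

# Workaround: replace each zero with a small epsilon (the value is an assumption).
eps <- 1e-6
toy$y_eps <- ifelse(toy$y == 0, eps, toy$y)

# Base R warns about non-integer counts here; the fit itself still runs.
fit_eps <- suppressWarnings(
  glm(y_eps ~ x, data = toy, family = poisson(link = "log"))
)

# Largest absolute change in any coefficient caused by the epsilon shift.
max_diff <- max(abs(coef(fit_zero) - coef(fit_eps)))
print(max_diff)
```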
But now I encounter another problem, which occurs in every GLM model:
"Values to assemble cannot be null"
<http://apache-spark-user-list.1001560.n3.nabble.com/file/n27164/P3.png>
I've found the code in
https://github.com/apache/spark/blob/master/mllib/src/main/scala/org/apache/spark/ml/feature/VectorAssembler.scala
<http://apache-spark-user-list.1001560.n3.nabble.com/file/n27164/p4.png>
Can anyone explain what this means?
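The message suggests that at least one column used to build the feature vector contains a null (NA in R). As a minimal sketch in base R, with a hypothetical toy data frame standing in for the real data, one could count the missing values per column and drop incomplete rows before fitting. SparkR also provides dropna for its DataFrames, which may be the closer equivalent. Whether missing values are the actual cause here is an assumption.

```r
# Hypothetical toy data with missing values (not the poster's dataset).
toy <- data.frame(
  flow        = c(10, NA, 15, 30),
  Origin      = c("A", "B", NA, "C"),
  Destination = c("X", "Y", "Z", "W"),
  stringsAsFactors = FALSE
)

# Count missing values in each column to locate the offending inputs.
na_per_column <- colSums(is.na(toy))

# Keep only the rows where every column is present.
clean <- toy[complete.cases(toy), ]

print(na_per_column)
print(nrow(clean))
```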
--
View this message in context: http://apache-spark-user-list.1001560.n3.nabble.com/Spark-2-0-0-GLM-problem-tp27145p27164.html