Posted to issues@spark.apache.org by "Evan Zamir (Jira)" <ji...@apache.org> on 2022/01/25 22:34:00 UTC

[jira] [Commented] (SPARK-38027) Undefined link function causing error in GLM that uses Tweedie family

    [ https://issues.apache.org/jira/browse/SPARK-38027?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17482128#comment-17482128 ] 

Evan Zamir commented on SPARK-38027:
------------------------------------

Looking into this further, I think the issue arises when the model is serialized, either when logging it or when persisting it to disk. From my logs:

2022-01-25 14:21:33,664 root ERROR An error occurred while calling o1538.toString.
: java.util.NoSuchElementException: Failed to find a default value for link
	at org.apache.spark.ml.param.Params.$anonfun$getOrDefault$2(params.scala:756)
	at scala.Option.getOrElse(Option.scala:189)
	at org.apache.spark.ml.param.Params.getOrDefault(params.scala:756)
	at org.apache.spark.ml.param.Params.getOrDefault$(params.scala:753)
	at org.apache.spark.ml.PipelineStage.getOrDefault(Pipeline.scala:41)
	at org.apache.spark.ml.param.Params.$(params.scala:762)
	at org.apache.spark.ml.param.Params.$$(params.scala:762)
	at org.apache.spark.ml.PipelineStage.$(Pipeline.scala:41)
	at org.apache.spark.ml.regression.GeneralizedLinearRegressionModel.toString(GeneralizedLinearRegression.scala:1117)
	at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
	at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62)
	at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
	at java.lang.reflect.Method.invoke(Method.java:498)
	at py4j.reflection.MethodInvoker.invoke(MethodInvoker.java:244)
	at py4j.reflection.ReflectionEngine.invoke(ReflectionEngine.java:357)
	at py4j.Gateway.invoke(Gateway.java:282)
	at py4j.commands.AbstractCommand.invokeMethod(AbstractCommand.java:132)
	at py4j.commands.CallCommand.execute(CallCommand.java:79)
	at py4j.GatewayConnection.run(GatewayConnection.java:238)
	at java.lang.Thread.run(Thread.java:748)
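
A possible workaround, assuming the failure is confined to toString reading the unset link param as the trace above suggests, is to avoid stringifying the model and log individual attributes instead. The sketch below is only an illustration: describe_glm_model is a hypothetical helper name, and it assumes the fitted model exposes uid, numFeatures, intercept and coefficients, as the PySpark GeneralizedLinearRegressionModel does.

```python
import logging


def describe_glm_model(model):
    """Build a log-friendly summary without calling str(model).

    Hypothetical helper: it only reads individual attributes, so it never
    triggers the JVM-side toString that fails in the trace above.
    """
    parts = [f"uid={model.uid}"]
    # Read each attribute on its own; report any that cannot be read.
    for name in ("numFeatures", "intercept", "coefficients"):
        try:
            parts.append(f"{name}={getattr(model, name)}")
        except Exception as exc:  # e.g. a Py4JJavaError raised by the JVM
            parts.append(f"{name}=<unavailable: {type(exc).__name__}>")
    return "GLM model: " + ", ".join(parts)
```

In the reproduction from the issue, this would replace logging.info(model) with logging.info(describe_glm_model(model)). It does not fix the underlying problem in toString; it only sidesteps it for logging.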


> Undefined link function causing error in GLM that uses Tweedie family
> ---------------------------------------------------------------------
>
>                 Key: SPARK-38027
>                 URL: https://issues.apache.org/jira/browse/SPARK-38027
>             Project: Spark
>          Issue Type: Bug
>          Components: ML
>    Affects Versions: 3.1.2
>         Environment: Running on Mac OS X Monterey
>            Reporter: Evan Zamir
>            Priority: Major
>              Labels: GLM, pyspark
>
> I am trying to use the GLM regression with a Tweedie distribution so I can model insurance use cases. I have set up a very simple example adapted from the docs:
> {code:python}
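>     # Method of a helper class. It needs these module-level imports:
>     #   import logging
>     #   from py4j.protocol import Py4JJavaError
>     #   from pyspark.ml.linalg import Vectors
>     #   from pyspark.ml.regression import GeneralizedLinearRegression
>     def create_fake_losses_data(self):
>         # Columns: user id, label (loss amount), offset, weight, feature vector.
>         df = self._spark.createDataFrame([
>             ("a", 100.0, 12, 1, Vectors.dense(0.0, 0.0)),
>             ("b", 0.0, 12, 1, Vectors.dense(1.0, 2.0)),
>             ("c", 0.0, 12, 1, Vectors.dense(0.0, 0.0)),
>             ("d", 2000.0, 12, 1, Vectors.dense(1.0, 1.0)),
>         ], ["user", "label", "offset", "weight", "features"])
>         logging.info(df.collect())
>         self.fake_data = df
>         try:
>             # Tweedie family: the link is given through linkPower; link is left unset.
>             glr = GeneralizedLinearRegression(
>                 family="tweedie", variancePower=1.5, linkPower=-1, offsetCol='offset')
>             glr.setRegParam(0.3)
>             model = glr.fit(df)
>             # Logging the model calls toString on the JVM object.
>             logging.info(model)
>         except Py4JJavaError as e:
>             print(e)
>         return self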
> {code}
> This causes the following error:
> *py4j.protocol.Py4JJavaError: An error occurred while calling o99.toString.
> : java.util.NoSuchElementException: Failed to find a default value for link*
>         at org.apache.spark.ml.param.Params.$anonfun$getOrDefault$2(params.scala:756)
>         at scala.Option.getOrElse(Option.scala:189)
>         at org.apache.spark.ml.param.Params.getOrDefault(params.scala:756)
>         at org.apache.spark.ml.param.Params.getOrDefault$(params.scala:753)
>         at org.apache.spark.ml.PipelineStage.getOrDefault(Pipeline.scala:41)
>         at org.apache.spark.ml.param.Params.$(params.scala:762)
>         at org.apache.spark.ml.param.Params.$$(params.scala:762)
>         at org.apache.spark.ml.PipelineStage.$(Pipeline.scala:41)
>         at org.apache.spark.ml.regression.GeneralizedLinearRegressionModel.toString(GeneralizedLinearRegression.scala:1117)
>         at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
>         at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62)
>         at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
>         at java.lang.reflect.Method.invoke(Method.java:498)
>         at py4j.reflection.MethodInvoker.invoke(MethodInvoker.java:244)
>         at py4j.reflection.ReflectionEngine.invoke(ReflectionEngine.java:357)
>         at py4j.Gateway.invoke(Gateway.java:282)
>         at py4j.commands.AbstractCommand.invokeMethod(AbstractCommand.java:132)
>         at py4j.commands.CallCommand.execute(CallCommand.java:79)
>         at py4j.GatewayConnection.run(GatewayConnection.java:238)
>         at java.lang.Thread.run(Thread.java:748)
> I was under the assumption that the default value for link is None, if not defined otherwise.