Posted to issues@spark.apache.org by "Sean Owen (JIRA)" <ji...@apache.org> on 2016/09/13 09:35:20 UTC

[jira] [Updated] (SPARK-17508) Setting weightCol to None in ML library causes an error

     [ https://issues.apache.org/jira/browse/SPARK-17508?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Sean Owen updated SPARK-17508:
------------------------------
      Priority: Minor  (was: Major)
    Issue Type: Improvement  (was: Bug)

I agree it ends up being more of an improvement, but it also seems worth fixing if possible. Is the right behavior that None acts the same as if the parameter weren't set at all? Surely that's possible on the Python API side?
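
If the fix lives on the Python side, a minimal sketch would be to filter out None-valued keyword arguments before they reach the JVM, so a param passed as None behaves exactly like an unset param. This is an illustration only, not the actual Spark internals; {{drop_none_params}} is a hypothetical helper name:

{code}
from pyspark.ml.classification import LogisticRegression

# Hypothetical helper for illustration only; not part of pyspark.
def drop_none_params(**params):
    """Keep only the keyword arguments that carry a real value, so a
    param explicitly passed as None acts as if it were never set."""
    return {k: v for k, v in params.items() if v is not None}

# weightCol=None is filtered out, so the estimator is constructed as if
# weightCol had never been supplied and the default (all weights 1.0)
# applies on the Scala side.
lr = LogisticRegression(**drop_none_params(maxIter=5, regParam=0.0,
                                           weightCol=None))
{code}

The same guard could sit in the shared keyword-argument handling in pyspark.ml rather than per-estimator, which would cover every estimator with an optional column param, not just LogisticRegression.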

> Setting weightCol to None in ML library causes an error
> -------------------------------------------------------
>
>                 Key: SPARK-17508
>                 URL: https://issues.apache.org/jira/browse/SPARK-17508
>             Project: Spark
>          Issue Type: Improvement
>          Components: PySpark
>    Affects Versions: 2.0.0
>            Reporter: Evan Zamir
>            Priority: Minor
>
> The following code runs without error:
> {code}
> from pyspark.ml.classification import LogisticRegression
> from pyspark.ml.linalg import Vectors
> from pyspark.sql import SparkSession
>
> spark = SparkSession.builder.appName('WeightBug').getOrCreate()
> df = spark.createDataFrame(
>     [
>         (1.0, 1.0, Vectors.dense(1.0)),
>         (0.0, 1.0, Vectors.dense(-1.0))
>     ],
>     ["label", "weight", "features"])
> lr = LogisticRegression(maxIter=5, regParam=0.0, weightCol="weight")
> model = lr.fit(df)
> {code}
> My expectation from reading the documentation is that setting weightCol=None should treat all weights as 1.0, regardless of whether a weight column exists; a quick check of that expectation is sketched after the traceback below. However, the same code with weightCol set to None fails with the following error:
> {code}
> Traceback (most recent call last):
>   File "/Users/evanzamir/ams/px-seed-model/scripts/bug.py", line 32, in <module>
>     model = lr.fit(df)
>   File "/usr/local/spark-2.0.0-bin-hadoop2.7/python/pyspark/ml/base.py", line 64, in fit
>     return self._fit(dataset)
>   File "/usr/local/spark-2.0.0-bin-hadoop2.7/python/pyspark/ml/wrapper.py", line 213, in _fit
>     java_model = self._fit_java(dataset)
>   File "/usr/local/spark-2.0.0-bin-hadoop2.7/python/pyspark/ml/wrapper.py", line 210, in _fit_java
>     return self._java_obj.fit(dataset._jdf)
>   File "/usr/local/spark-2.0.0-bin-hadoop2.7/python/lib/py4j-0.10.1-src.zip/py4j/java_gateway.py", line 933, in __call__
>   File "/usr/local/spark-2.0.0-bin-hadoop2.7/python/pyspark/sql/utils.py", line 63, in deco
>     return f(*a, **kw)
>   File "/usr/local/spark-2.0.0-bin-hadoop2.7/python/lib/py4j-0.10.1-src.zip/py4j/protocol.py", line 312, in get_return_value
> py4j.protocol.Py4JJavaError: An error occurred while calling o38.fit.
> : java.lang.NullPointerException
> 	at org.apache.spark.ml.classification.LogisticRegression.train(LogisticRegression.scala:264)
> 	at org.apache.spark.ml.classification.LogisticRegression.train(LogisticRegression.scala:259)
> 	at org.apache.spark.ml.classification.LogisticRegression.train(LogisticRegression.scala:159)
> 	at org.apache.spark.ml.Predictor.fit(Predictor.scala:90)
> 	at org.apache.spark.ml.Predictor.fit(Predictor.scala:71)
> 	at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
> 	at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62)
> 	at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
> 	at java.lang.reflect.Method.invoke(Method.java:498)
> 	at py4j.reflection.MethodInvoker.invoke(MethodInvoker.java:237)
> 	at py4j.reflection.ReflectionEngine.invoke(ReflectionEngine.java:357)
> 	at py4j.Gateway.invoke(Gateway.java:280)
> 	at py4j.commands.AbstractCommand.invokeMethod(AbstractCommand.java:128)
> 	at py4j.commands.CallCommand.execute(CallCommand.java:79)
> 	at py4j.GatewayConnection.run(GatewayConnection.java:211)
> 	at java.lang.Thread.run(Thread.java:745)
> Process finished with exit code 1
> {code}
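
For reference, the documented semantics the reporter expects can be checked directly: fitting without weightCol should match fitting against an explicit all-ones weight column. A hedged sketch against the DataFrame above, assuming the documented behavior holds (this is not output from an actual run):

{code}
lr_unweighted = LogisticRegression(maxIter=5, regParam=0.0)
lr_weighted = LogisticRegression(maxIter=5, regParam=0.0, weightCol="weight")

m1 = lr_unweighted.fit(df)   # weightCol never set
m2 = lr_weighted.fit(df)     # "weight" column in df is all 1.0

# The two fits should agree up to numerical tolerance.
print(m1.coefficients, m1.intercept)
print(m2.coefficients, m2.intercept)
{code}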


