Posted to issues@spark.apache.org by "Evan Zamir (JIRA)" <ji...@apache.org> on 2016/09/12 16:58:20 UTC
[jira] [Created] (SPARK-17508) Setting weightCol to None in ML library causes an error
Evan Zamir created SPARK-17508:
----------------------------------
Summary: Setting weightCol to None in ML library causes an error
Key: SPARK-17508
URL: https://issues.apache.org/jira/browse/SPARK-17508
Project: Spark
Issue Type: Bug
Components: PySpark
Affects Versions: 2.0.0
Reporter: Evan Zamir
The following code runs without error:
{code}
from pyspark.ml.classification import LogisticRegression
from pyspark.ml.linalg import Vectors
from pyspark.sql import SparkSession

spark = SparkSession.builder.appName('WeightBug').getOrCreate()
df = spark.createDataFrame(
    [
        (1.0, 1.0, Vectors.dense(1.0)),
        (0.0, 1.0, Vectors.dense(-1.0))
    ],
    ["label", "weight", "features"])
lr = LogisticRegression(maxIter=5, regParam=0.0, weightCol="weight")
model = lr.fit(df)
{code}
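Both rows in this reproducer already have weight 1.0, so the fit with weightCol="weight" should be equivalent to an unweighted fit. The pure-Python sketch below illustrates why, using one common formulation of the weighted logistic loss (normalized by the total weight). It is an illustration only, not Spark's implementation, and it ignores regularization because the report uses regParam=0.0; the coefficient and intercept values are arbitrary.

```python
import math


def row_log_loss(label, features, coef, intercept):
    """Logistic (cross-entropy) loss of a single row."""
    margin = intercept + sum(c * x for c, x in zip(coef, features))
    prob = 1.0 / (1.0 + math.exp(-margin))  # sigmoid of the margin
    return -(label * math.log(prob) + (1.0 - label) * math.log(1.0 - prob))


def weighted_log_loss(rows, coef, intercept=0.0):
    """Weight-normalized average loss over (label, weight, features) rows."""
    total = sum(w * row_log_loss(y, x, coef, intercept) for y, w, x in rows)
    return total / sum(w for _, w, _ in rows)


def unweighted_log_loss(rows, coef, intercept=0.0):
    """Plain mean loss that ignores the weight column entirely."""
    return sum(row_log_loss(y, x, coef, intercept) for y, _, x in rows) / len(rows)


# The reproducer's two rows as (label, weight, features), with plain-list features.
rows = [(1.0, 1.0, [1.0]), (0.0, 1.0, [-1.0])]

print(math.isclose(weighted_log_loss(rows, [0.5], 0.3),
                   unweighted_log_loss(rows, [0.5], 0.3)))  # True
```

With unit weights the weighted sum reduces to the plain sum and the total weight equals the row count, so the two losses coincide; non-uniform weights would change the result.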
From the documentation, I expected weightCol=None to treat all instance weights as 1.0, regardless of whether a weight column exists. However, running the same code with weightCol=None fails with the following error:
{code}
Traceback (most recent call last):
  File "/Users/evanzamir/ams/px-seed-model/scripts/bug.py", line 32, in <module>
    model = lr.fit(df)
  File "/usr/local/spark-2.0.0-bin-hadoop2.7/python/pyspark/ml/base.py", line 64, in fit
    return self._fit(dataset)
  File "/usr/local/spark-2.0.0-bin-hadoop2.7/python/pyspark/ml/wrapper.py", line 213, in _fit
    java_model = self._fit_java(dataset)
  File "/usr/local/spark-2.0.0-bin-hadoop2.7/python/pyspark/ml/wrapper.py", line 210, in _fit_java
    return self._java_obj.fit(dataset._jdf)
  File "/usr/local/spark-2.0.0-bin-hadoop2.7/python/lib/py4j-0.10.1-src.zip/py4j/java_gateway.py", line 933, in __call__
  File "/usr/local/spark-2.0.0-bin-hadoop2.7/python/pyspark/sql/utils.py", line 63, in deco
    return f(*a, **kw)
  File "/usr/local/spark-2.0.0-bin-hadoop2.7/python/lib/py4j-0.10.1-src.zip/py4j/protocol.py", line 312, in get_return_value
py4j.protocol.Py4JJavaError: An error occurred while calling o38.fit.
: java.lang.NullPointerException
    at org.apache.spark.ml.classification.LogisticRegression.train(LogisticRegression.scala:264)
    at org.apache.spark.ml.classification.LogisticRegression.train(LogisticRegression.scala:259)
    at org.apache.spark.ml.classification.LogisticRegression.train(LogisticRegression.scala:159)
    at org.apache.spark.ml.Predictor.fit(Predictor.scala:90)
    at org.apache.spark.ml.Predictor.fit(Predictor.scala:71)
    at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
    at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62)
    at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
    at java.lang.reflect.Method.invoke(Method.java:498)
    at py4j.reflection.MethodInvoker.invoke(MethodInvoker.java:237)
    at py4j.reflection.ReflectionEngine.invoke(ReflectionEngine.java:357)
    at py4j.Gateway.invoke(Gateway.java:280)
    at py4j.commands.AbstractCommand.invokeMethod(AbstractCommand.java:128)
    at py4j.commands.CallCommand.execute(CallCommand.java:79)
    at py4j.GatewayConnection.run(GatewayConnection.java:211)
    at java.lang.Thread.run(Thread.java:745)

Process finished with exit code 1
{code}
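A possible caller-side workaround is to leave weightCol unset instead of passing None. The sketch below assumes (this report does not confirm it) that the NullPointerException is triggered only when weightCol is explicitly set to None, so it filters out None-valued keyword arguments before they reach the estimator. `drop_unset_params` is a hypothetical helper, not a PySpark API.

```python
def drop_unset_params(**params):
    """Return keyword arguments with every None value removed.

    Hypothetical helper, not part of PySpark: passing the result to an
    estimator constructor leaves parameters such as weightCol unset
    instead of explicitly setting them to None.
    """
    return {name: value for name, value in params.items() if value is not None}


weight_col = None  # e.g. read from a config; None means "no weight column"
params = drop_unset_params(maxIter=5, regParam=0.0, weightCol=weight_col)
print(params)  # {'maxIter': 5, 'regParam': 0.0}

# With PySpark available, the estimator would then be built as:
# lr = LogisticRegression(**params)
```

Note that falsy but meaningful values such as regParam=0.0 are kept, because the filter tests for None explicitly rather than for truthiness.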
--
This message was sent by Atlassian JIRA
(v6.3.4#6332)