Posted to issues@spark.apache.org by "Evan Zamir (JIRA)" <ji...@apache.org> on 2016/09/12 16:58:20 UTC
[jira] [Created] (SPARK-17508) Setting weightCol to None in ML library causes an error

Evan Zamir created SPARK-17508:
----------------------------------

             Summary: Setting weightCol to None in ML library causes an error
                 Key: SPARK-17508
                 URL: https://issues.apache.org/jira/browse/SPARK-17508
             Project: Spark
          Issue Type: Bug
          Components: PySpark
    Affects Versions: 2.0.0
            Reporter: Evan Zamir


The following code runs without error:

{code}
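from pyspark.sql import SparkSession
from pyspark.ml.linalg import Vectors
from pyspark.ml.classification import LogisticRegression

spark = SparkSession.builder.appName('WeightBug').getOrCreate()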
df = spark.createDataFrame(
    [
        (1.0, 1.0, Vectors.dense(1.0)),
        (0.0, 1.0, Vectors.dense(-1.0))
    ],
    ["label", "weight", "features"])
lr = LogisticRegression(maxIter=5, regParam=0.0, weightCol="weight")
model = lr.fit(df)
{code}

From reading the documentation, I expected weightCol=None to treat all weights as 1.0, regardless of whether a weight column exists. However, the same code with weightCol=None instead of weightCol="weight" fails with the following error:

Traceback (most recent call last):

  File "/Users/evanzamir/ams/px-seed-model/scripts/bug.py", line 32, in <module>
    model = lr.fit(df)
  File "/usr/local/spark-2.0.0-bin-hadoop2.7/python/pyspark/ml/base.py", line 64, in fit
    return self._fit(dataset)
  File "/usr/local/spark-2.0.0-bin-hadoop2.7/python/pyspark/ml/wrapper.py", line 213, in _fit
    java_model = self._fit_java(dataset)
  File "/usr/local/spark-2.0.0-bin-hadoop2.7/python/pyspark/ml/wrapper.py", line 210, in _fit_java
    return self._java_obj.fit(dataset._jdf)
  File "/usr/local/spark-2.0.0-bin-hadoop2.7/python/lib/py4j-0.10.1-src.zip/py4j/java_gateway.py", line 933, in __call__
  File "/usr/local/spark-2.0.0-bin-hadoop2.7/python/pyspark/sql/utils.py", line 63, in deco
    return f(*a, **kw)
  File "/usr/local/spark-2.0.0-bin-hadoop2.7/python/lib/py4j-0.10.1-src.zip/py4j/protocol.py", line 312, in get_return_value
py4j.protocol.Py4JJavaError: An error occurred while calling o38.fit.
: java.lang.NullPointerException
	at org.apache.spark.ml.classification.LogisticRegression.train(LogisticRegression.scala:264)
	at org.apache.spark.ml.classification.LogisticRegression.train(LogisticRegression.scala:259)
	at org.apache.spark.ml.classification.LogisticRegression.train(LogisticRegression.scala:159)
	at org.apache.spark.ml.Predictor.fit(Predictor.scala:90)
	at org.apache.spark.ml.Predictor.fit(Predictor.scala:71)
	at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
	at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62)
	at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
	at java.lang.reflect.Method.invoke(Method.java:498)
	at py4j.reflection.MethodInvoker.invoke(MethodInvoker.java:237)
	at py4j.reflection.ReflectionEngine.invoke(ReflectionEngine.java:357)
	at py4j.Gateway.invoke(Gateway.java:280)
	at py4j.commands.AbstractCommand.invokeMethod(AbstractCommand.java:128)
	at py4j.commands.CallCommand.execute(CallCommand.java:79)
	at py4j.GatewayConnection.run(GatewayConnection.java:211)
	at java.lang.Thread.run(Thread.java:745)


Process finished with exit code 1
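
A possible workaround, offered only as a sketch: drop the weightCol keyword entirely when no weight column is wanted, instead of passing None. This assumes the NullPointerException comes from None being handed to the JVM as an explicit null, which I have not confirmed against the Scala source. The helper name drop_none and the variables weight_col and lr_params are hypothetical and not part of PySpark.

```python
def drop_none(params):
    """Return a copy of params without the entries whose value is None."""
    return {name: value for name, value in params.items() if value is not None}


# Hypothetical setting: None here means "do not use a weight column".
weight_col = None

# Assumption: leaving weightCol out falls back to the estimator's default
# (unweighted) behaviour, whereas passing None explicitly triggers the error.
lr_params = drop_none({"maxIter": 5, "regParam": 0.0, "weightCol": weight_col})
print(lr_params)

# With PySpark available, the estimator would then be built as:
#   lr = LogisticRegression(**lr_params)
#   model = lr.fit(df)
```

With weight_col = "weight" the dictionary keeps the weightCol entry, so the same call covers both the weighted and the unweighted case.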





