Posted to issues@spark.apache.org by "Apache Spark (JIRA)" <ji...@apache.org> on 2015/08/27 00:27:45 UTC
[jira] [Assigned] (SPARK-10305) PySpark createDataFrame on list of LabeledPoints fails (regression)
[ https://issues.apache.org/jira/browse/SPARK-10305?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]
Apache Spark reassigned SPARK-10305:
------------------------------------
Assignee: Apache Spark
> PySpark createDataFrame on list of LabeledPoints fails (regression)
> -------------------------------------------------------------------
>
> Key: SPARK-10305
> URL: https://issues.apache.org/jira/browse/SPARK-10305
> Project: Spark
> Issue Type: Bug
> Components: ML, PySpark, SQL
> Affects Versions: 1.5.0
> Reporter: Joseph K. Bradley
> Assignee: Apache Spark
> Priority: Critical
>
> The following code works in 1.4 but fails in 1.5:
> {code}
> import numpy as np
> from pyspark.mllib.regression import LabeledPoint
> from pyspark.mllib.linalg import Vectors
> # `sqlContext` is the SQLContext created automatically by the PySpark shell
> lp1 = LabeledPoint(1.0, Vectors.sparse(5, np.array([0, 1]), np.array([2.0, 21.0])))
> lp2 = LabeledPoint(0.0, Vectors.sparse(5, np.array([2, 3]), np.array([2.0, 21.0])))
> tmp = [lp1, lp2]
> sqlContext.createDataFrame(tmp).show()
> {code}
> The failure is:
> {code}
> ValueError: Unexpected tuple LabeledPoint(1.0, (5,[0,1],[2.0,21.0])) with StructType
> ---------------------------------------------------------------------------
> ValueError Traceback (most recent call last)
> <ipython-input-1-0e7cb8772e10> in <module>()
> 6 lp2 = LabeledPoint(0.0, Vectors.sparse(5, np.array([2, 3]), np.array([2.0, 21.0])))
> 7 tmp = [lp1, lp2]
> ----> 8 sqlContext.createDataFrame(tmp).show()
> /home/ubuntu/databricks/spark/python/pyspark/sql/context.pyc in createDataFrame(self, data, schema, samplingRatio)
> 404 rdd, schema = self._createFromRDD(data, schema, samplingRatio)
> 405 else:
> --> 406 rdd, schema = self._createFromLocal(data, schema)
> 407 jrdd = self._jvm.SerDeUtil.toJavaArray(rdd._to_java_object_rdd())
> 408 jdf = self._ssql_ctx.applySchemaToPythonRDD(jrdd.rdd(), schema.json())
> /home/ubuntu/databricks/spark/python/pyspark/sql/context.pyc in _createFromLocal(self, data, schema)
> 335
> 336 # convert python objects to sql data
> --> 337 data = [schema.toInternal(row) for row in data]
> 338 return self._sc.parallelize(data), schema
> 339
> /home/ubuntu/databricks/spark/python/pyspark/sql/types.pyc in toInternal(self, obj)
> 539 return tuple(f.toInternal(v) for f, v in zip(self.fields, obj))
> 540 else:
> --> 541 raise ValueError("Unexpected tuple %r with StructType" % obj)
> 542 else:
> 543 if isinstance(obj, dict):
> ValueError: Unexpected tuple LabeledPoint(1.0, (5,[0,1],[2.0,21.0])) with StructType
> {code}
--
This message was sent by Atlassian JIRA
(v6.3.4#6332)
---------------------------------------------------------------------
To unsubscribe, e-mail: issues-unsubscribe@spark.apache.org
For additional commands, e-mail: issues-help@spark.apache.org