Posted to issues@spark.apache.org by "Xiang Gao (JIRA)" <ji...@apache.org> on 2016/07/14 09:01:20 UTC

[jira] [Created] (SPARK-16542) bug: certain types result in an array of nulls when creating a DataFrame using Python

Xiang Gao created SPARK-16542:
---------------------------------

             Summary: bug: certain types result in an array of nulls when creating a DataFrame using Python
                 Key: SPARK-16542
                 URL: https://issues.apache.org/jira/browse/SPARK-16542
             Project: Spark
          Issue Type: Bug
          Components: PySpark, SQL
            Reporter: Xiang Gao


This is a bug where certain types result in an array of nulls when creating a DataFrame using Python.

Python's array.array has richer types than Python itself, e.g. we can have array('f', [1, 2, 3]) and array('d', [1, 2, 3]). The code in spark-sql does not take this into consideration, which can cause you to get an array of null values when a row contains an array('f').
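To illustrate the difference (plain Python, no Spark needed): the 'f' typecode stores 32-bit C floats while 'd' stores 64-bit C doubles, even though both read back as Python floats:

```python
from array import array

f_arr = array('f', [1, 2, 3])  # stored as 32-bit C floats
d_arr = array('d', [1, 2, 3])  # stored as 64-bit C doubles

print(f_arr.typecode, f_arr.itemsize)  # f 4
print(d_arr.typecode, d_arr.itemsize)  # d 8

# Reading back yields ordinary Python floats in both cases
print(f_arr[0], d_arr[0])  # 1.0 1.0
```

So type inference that only distinguishes Python-level types cannot tell these two apart, even though the underlying element width differs.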

A minimal script to reproduce this:

from array import array

from pyspark import SparkContext
from pyspark.sql import SQLContext, Row

sc = SparkContext()
sqlContext = SQLContext(sc)

row1 = Row(floatarray=array('f', [1, 2, 3]), doublearray=array('d', [1, 2, 3]))
rows = sc.parallelize([row1])
df = sqlContext.createDataFrame(rows)
df.show()

which produces the output:

+---------------+------------------+
|    doublearray|        floatarray|
+---------------+------------------+
|[1.0, 2.0, 3.0]|[null, null, null]|
+---------------+------------------+
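Until this is fixed, a possible workaround (my own suggestion, not an official fix) is to convert the array('f') values before building the Row, either to a plain Python list or to an array('d'), since the output above shows that the double array is handled correctly:

```python
from array import array

# Original data using the 32-bit float typecode 'f'
float_arr = array('f', [1, 2, 3])

# Workaround 1: convert to a plain Python list of floats
as_list = list(float_arr)

# Workaround 2: re-pack into a 64-bit double array ('d'),
# which the repro above shows Spark infers correctly
as_double = array('d', float_arr)

print(as_list)    # [1.0, 2.0, 3.0]
print(as_double)  # array('d', [1.0, 2.0, 3.0])
```

Either value can then be passed to Row(...) in place of the original array('f').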



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

---------------------------------------------------------------------
To unsubscribe, e-mail: issues-unsubscribe@spark.apache.org
For additional commands, e-mail: issues-help@spark.apache.org