Posted to issues@spark.apache.org by "Takuya Ueshin (JIRA)" <ji...@apache.org> on 2017/07/20 03:46:00 UTC

[jira] [Assigned] (SPARK-16542) bugs about types that result in an array of nulls when creating a DataFrame using Python

     [ https://issues.apache.org/jira/browse/SPARK-16542?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Takuya Ueshin reassigned SPARK-16542:
-------------------------------------

    Assignee: Xiang Gao

> bugs about types that result in an array of nulls when creating a DataFrame using Python
> ----------------------------------------------------------------------------------------
>
>                 Key: SPARK-16542
>                 URL: https://issues.apache.org/jira/browse/SPARK-16542
>             Project: Spark
>          Issue Type: Bug
>          Components: PySpark, SQL
>            Reporter: Xiang Gao
>            Assignee: Xiang Gao
>
> This is a bug about types that results in an array of null values when creating a DataFrame using Python.
> Python's {{array.array}} has richer element types than Python's built-in numeric types, e.g. we can have {{array('f', [1, 2, 3])}} (single-precision floats) and {{array('d', [1, 2, 3])}} (double-precision doubles). The type-inference code in spark-sql does not take this into consideration, which means you can get an array of null values when a row contains an {{array('f')}}.
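> For context, here is a minimal illustration (plain Python, no Spark required, not part of the original report) of how the two typecodes differ only in storage width while iterating as the same Python floats:
> {code}
> from array import array
>
> # Both typecodes iterate as Python floats; only the storage width differs:
> # 'f' stores single-precision C floats (typically 4 bytes on CPython),
> # 'd' stores double-precision C doubles (8 bytes).
> fa = array('f', [1, 2, 3])
> da = array('d', [1, 2, 3])
> assert fa.itemsize == 4 and da.itemsize == 8  # sizes on typical platforms
> assert list(fa) == list(da) == [1.0, 2.0, 3.0]
> {code}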
> A simple snippet to reproduce this is:
> {code}
> from array import array
>
> from pyspark import SparkContext
> from pyspark.sql import SQLContext, Row
>
> sc = SparkContext()
> sqlContext = SQLContext(sc)
>
> # 'd' (double) round-trips correctly; 'f' (float) comes back as nulls.
> row1 = Row(floatarray=array('f', [1, 2, 3]), doublearray=array('d', [1, 2, 3]))
> rows = sc.parallelize([row1])
> df = sqlContext.createDataFrame(rows)
> df.show()
> {code}
> which produces the output:
> {code}
> +---------------+------------------+
> |    doublearray|        floatarray|
> +---------------+------------------+
> |[1.0, 2.0, 3.0]|[null, null, null]|
> +---------------+------------------+
> {code}
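> A possible workaround until this is fixed (a sketch, not part of the original report; it assumes widening to double precision is acceptable for your data) is to convert the {{array('f')}} to a plain Python list before building the Row, so the values are inferred as doubles:
> {code}
> from array import array
>
> from pyspark import SparkContext
> from pyspark.sql import SQLContext, Row
>
> sc = SparkContext()
> sqlContext = SQLContext(sc)
>
> # Converting to a list yields plain Python floats, which Spark infers as
> # DoubleType, sidestepping the unhandled 'f' typecode entirely.
> row1 = Row(floatarray=list(array('f', [1, 2, 3])))
> df = sqlContext.createDataFrame(sc.parallelize([row1]))
> df.show()  # shows [1.0, 2.0, 3.0] instead of [null, null, null]
> {code}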


