Posted to issues@spark.apache.org by "Bryan Cutler (JIRA)" <ji...@apache.org> on 2017/05/04 18:45:04 UTC

[jira] [Commented] (SPARK-20563) going to DataFrame to RDD and back changes the schema, if the schema is not explicitly provided

    [ https://issues.apache.org/jira/browse/SPARK-20563?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15997205#comment-15997205 ] 

Bryan Cutler commented on SPARK-20563:
--------------------------------------

I think this is to be expected. An RDD does not carry a schema, so converting a DataFrame to an RDD discards it. When going back to a DataFrame, the schema has to be inferred from the data. Python ints are not limited to 32 bits, so PySpark infers them as LongType.

> going to DataFrame to RDD and back changes the schema, if the schema is not explicitly provided
> -----------------------------------------------------------------------------------------------
>
>                 Key: SPARK-20563
>                 URL: https://issues.apache.org/jira/browse/SPARK-20563
>             Project: Spark
>          Issue Type: Bug
>          Components: PySpark
>    Affects Versions: 2.1.0
>            Reporter: Danil Kirsanov
>            Priority: Minor
>
> df.rdd.toDF() converts the IntegerType columns of a DataFrame to LongType if the schema is not explicitly passed to toDF().
> Full reproduction code:
> -------------------------------------
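> from pyspark.sql import SparkSession
> from pyspark.sql.types import IntegerType, StructType, StructField
> # In the pyspark shell `spark` already exists; getOrCreate() returns it.
> spark = SparkSession.builder.getOrCreate()
> schema = StructType([StructField("a", IntegerType(), True), StructField("b", IntegerType(), True)])
> df_test = spark.createDataFrame([(1, 2)], schema)
> df_test.printSchema()             # a, b: integer
> df_test.rdd.toDF().printSchema()  # a, b: long (inferred from the data)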



--
This message was sent by Atlassian JIRA
(v6.3.15#6346)
