Posted to issues@spark.apache.org by "Lantao Jin (Jira)" <ji...@apache.org> on 2020/07/01 12:35:00 UTC

[jira] [Commented] (SPARK-32147) Spark: PartitionBy changing the columns value

    [ https://issues.apache.org/jira/browse/SPARK-32147?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17149403#comment-17149403 ] 

Lantao Jin commented on SPARK-32147:
------------------------------------

Setting spark.sql.sources.partitionColumnTypeInference.enabled to false makes the read return the right values.
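
As a rough sketch of that workaround, assuming a spark-shell session on 3.0.0 (where `spark` is the predefined SparkSession) and that "tmp_parquet" was already written with partitionBy("value") as in the report, the setting can be changed at runtime before reading. With inference disabled, partition columns are read as strings. One plausible reason the values change at all is that Java's Double.parseDouble accepts a trailing `d` or `f` type suffix, so "7d" parses as 7.0; whether that is the exact code path involved here is an assumption and is not stated in this thread.

```scala
// Sketch for spark-shell; `spark` is the SparkSession the shell provides.
// Assumes "tmp_parquet" already exists, written with partitionBy("value").

// Disable type inference for partition columns so that directory names such
// as value=7d are kept as strings instead of being parsed as numbers.
spark.conf.set("spark.sql.sources.partitionColumnTypeInference.enabled", "false")

// Re-read the partitioned output and inspect the result.
val fixed = spark.read.parquet("tmp_parquet")
fixed.printSchema() // `value` should now be reported as a string column
fixed.show(false)
```

The same key can also be passed at launch with --conf if it should apply to the whole session.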

> Spark: PartitionBy changing the columns value 
> ----------------------------------------------
>
>                 Key: SPARK-32147
>                 URL: https://issues.apache.org/jira/browse/SPARK-32147
>             Project: Spark
>          Issue Type: Bug
>          Components: Spark Core, Spark Shell
>    Affects Versions: 3.0.0
>            Reporter: Shankar Koirala
>            Priority: Major
>              Labels: spark
>
> When a DataFrame is saved as Parquet or CSV with partitionBy on a column whose values are digits followed by 'f' or 'd', those values are changed when the data is read back.
> Below is an example:
> {code:java}
> scala> val df = Seq(
>  | ("9q", 1),
>  | ("3k", 2),
>  | ("6f", 3),
>  | ("7f", 4),
>  | ("7d", 5)
>  | ).toDF("value", "id")
> df: org.apache.spark.sql.DataFrame = [value: string, id: int]
> scala> df.show(false)
> +-----+---+
> |value|id |
> +-----+---+
> |  9q | 1 |
> |  3k | 2 |
> |  6f | 3 |
> |  7f | 4 |
> |  7d | 5 |
> +-----+---+
> scala> import org.apache.spark.sql.SaveMode
> scala> df.write.partitionBy("value").mode(SaveMode.Overwrite).parquet("tmp_parquet")
> scala> spark.read.parquet("tmp_parquet").show(false)
> +---+-----+
> |id |value|
> +---+-----+
> |5  | 7.0 |
> |3  | 6.0 |
> |2  | 3k  |
> |4  | 7.0 |
> |1  | 9q  |
> +---+-----+
> {code}
> The same happens with the other formats too. Is this a bug, or is it expected behavior?
> Taken from [SO|https://stackoverflow.com/questions/62671684/spark-incorrectly-intepret-partition-name-ending-with-d-or-f-as-number-when]
>  



--
This message was sent by Atlassian Jira
(v8.3.4#803005)
