Posted to issues@spark.apache.org by "Lantao Jin (Jira)" <ji...@apache.org> on 2020/07/01 12:35:00 UTC
[jira] [Commented] (SPARK-32147) Spark: PartitionBy changing the columns value
[ https://issues.apache.org/jira/browse/SPARK-32147?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17149403#comment-17149403 ]
Lantao Jin commented on SPARK-32147:
------------------------------------
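Setting spark.sql.sources.partitionColumnTypeInference.enabled to false makes the read return the right values, because the partition column is then read back as a string instead of having its type inferred.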
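For illustration, here is a minimal sketch of that workaround as a standalone Scala program, reusing the data from the report. The object name, the helper readPartitionValues, the local[*] master, and the output path are assumptions made for this example and are not part of the original report.

```scala
import org.apache.spark.sql.{SaveMode, SparkSession}

// Hypothetical standalone reproduction; names and paths are illustrative only.
object PartitionInferenceWorkaround {

  // Writes the sample data partitioned by "value", reads it back,
  // and returns the partition values as strings.
  def readPartitionValues(spark: SparkSession, path: String): Set[String] = {
    import spark.implicits._

    val df = Seq(("9q", 1), ("3k", 2), ("6f", 3), ("7f", 4), ("7d", 5))
      .toDF("value", "id")
    df.write.partitionBy("value").mode(SaveMode.Overwrite).parquet(path)

    val readBack = spark.read.parquet(path)
    readBack.printSchema() // with inference disabled, "value" is a string column
    readBack.select("value").as[String].collect().toSet
  }

  def main(args: Array[String]): Unit = {
    val spark = SparkSession.builder()
      .appName("SPARK-32147-workaround")
      .master("local[*]") // assumption: local run for demonstration
      // Disable type inference for partition columns discovered from directory names.
      .config("spark.sql.sources.partitionColumnTypeInference.enabled", "false")
      .getOrCreate()

    try {
      readPartitionValues(spark, "tmp_parquet").toSeq.sorted.foreach(println)
    } finally {
      spark.stop()
    }
  }
}
```

In spark-shell, where a session already exists, the same setting can be applied with spark.conf.set("spark.sql.sources.partitionColumnTypeInference.enabled", "false") before calling spark.read.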
> Spark: PartitionBy changing the columns value
> ----------------------------------------------
>
> Key: SPARK-32147
> URL: https://issues.apache.org/jira/browse/SPARK-32147
> Project: Spark
> Issue Type: Bug
> Components: Spark Core, Spark Shell
> Affects Versions: 3.0.0
> Reporter: Shankar Koirala
> Priority: Major
> Labels: spark
>
> When a DataFrame is saved as Parquet or CSV with partitionBy, partition column values made of digits followed by 'f' or 'd' are changed when the data is read back.
> Below is the example
> {code:java}
> scala> val df = Seq(
> | ("9q", 1),
> | ("3k", 2),
> | ("6f", 3),
> | ("7f", 4),
> | ("7d", 5)
> | ).toDF("value", "id")
> df: org.apache.spark.sql.DataFrame = [value: string, id: int]
> scala> df.show(false)
> +-----+---+
> |value|id |
> +-----+---+
> | 9q | 1 |
> | 3k | 2 |
> | 6f | 3 |
> | 7f | 4 |
> | 7d | 5 |
> +-----+---+
> scala> df.write.partitionBy("value").mode(SaveMode.Overwrite).parquet("tmp_parquet")
> scala> spark.read.parquet("tmp_parquet").show(false)
> +---+-----+
> |id |value|
> +---+-----+
> |5 | 7.0 |
> |3 | 6.0 |
> |2 | 3k |
> |4 | 7.0 |
> |1 | 9q |
> +---+-----+
> {code}
> The same happens with the other formats too. Is this a bug, or is it expected behavior?
> Taken from Stack Overflow: https://stackoverflow.com/questions/62671684/spark-incorrectly-intepret-partition-name-ending-with-d-or-f-as-number-when
>
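The pattern in the reported output (only "6f", "7f", and "7d" change, while "9q" and "3k" survive) matches how the JVM parses floating-point strings: java.lang.Double.parseDouble accepts a trailing 'f' or 'd' type suffix. That Spark's partition type inference ends up relying on this parser is an assumption here and is not stated in the report. The object and helper names below are hypothetical. A small sketch of the parsing behavior on its own, without Spark:

```scala
import scala.util.Try

// Hypothetical demo object; it only exercises the JDK parser, not Spark itself.
object ParseDoubleSuffixDemo {

  // Some(number) when the JVM accepts the string as a double, None otherwise.
  def asDouble(raw: String): Option[Double] =
    Try(java.lang.Double.parseDouble(raw)).toOption

  def main(args: Array[String]): Unit = {
    // The partition values from the report.
    Seq("9q", "3k", "6f", "7f", "7d").foreach { v =>
      println(s"$v -> ${asDouble(v)}")
    }
  }
}
```

Under this reading, "6f" and "7f" parse to 6.0 and 7.0, and "7d" also parses to 7.0, which would explain why two distinct partition values both come back as 7.0 in the output above.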
--
This message was sent by Atlassian Jira
(v8.3.4#803005)