Posted to issues@spark.apache.org by "Maxim Gekk (Jira)" <ji...@apache.org> on 2021/02/02 06:22:00 UTC
[jira] [Updated] (SPARK-34314) Wrong discovered partition value
[ https://issues.apache.org/jira/browse/SPARK-34314?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]
Maxim Gekk updated SPARK-34314:
-------------------------------
Affects Version/s: 3.1.0, 3.0.2, 2.4.8
> Wrong discovered partition value
> --------------------------------
>
> Key: SPARK-34314
> URL: https://issues.apache.org/jira/browse/SPARK-34314
> Project: Spark
> Issue Type: Bug
> Components: SQL
> Affects Versions: 2.4.8, 3.0.2, 3.1.0, 3.2.0
> Reporter: Maxim Gekk
> Priority: Major
>
> The example below illustrates the issue:
> {code:scala}
> val df = Seq((0, "AA"), (1, "-0")).toDF("id", "part")
> df.write
>   .partitionBy("part")
>   .format("parquet")
>   .save(path)
> val readback = spark.read.parquet(path)
> readback.printSchema()
> readback.show(false)
> {code}
> Spark writes the partition values as strings:
> {code}
> /private/var/folders/p3/dfs6mf655d7fnjrsjvldh0tc0000gn/T/spark-e09eae99-7ecf-4ab2-b99b-f63f8dea658d
> ├── _SUCCESS
> ├── part=-0
> │   └── part-00001-02144398-2896-4d21-9628-a8743d098cb4.c000.snappy.parquet
> └── part=AA
>     └── part-00000-02144398-2896-4d21-9628-a8743d098cb4.c000.snappy.parquet
> {code}
> The partition directories carry the values *"-0"* and "AA",
> but when Spark reads the data back, it transforms "-0" to "0":
> {code}
> root
>  |-- id: integer (nullable = true)
>  |-- part: string (nullable = true)
>
> +---+----+
> |id |part|
> +---+----+
> |0  |AA  |
> |1  |0   |
> +---+----+
> {code}
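
The report does not say why "-0" comes back as "0". One plausible explanation, offered here as an assumption and not as a description of Spark's actual code, is that each directory value is first parsed on its own ("-0" as the integer 0), and only afterwards is the column widened to string because "AA" is not numeric. The self-contained Scala sketch below models that two-step behaviour and contrasts it with a variant that keeps the raw directory text. All names in it (`PartitionInferenceSketch`, `infer`, `lossyResolve`, `rawPreservingResolve`) are hypothetical and are not Spark APIs.

```scala
import scala.util.{Failure, Success, Try}

// Toy model only: none of these names exist in Spark.
object PartitionInferenceSketch {
  sealed trait Inferred
  final case class IntValue(value: Int) extends Inferred
  final case class StringValue(value: String) extends Inferred

  // Step 1: guess a type for each raw directory value in isolation.
  def infer(raw: String): Inferred =
    Try(raw.toInt) match {
      case Success(v) => IntValue(v)
      case Failure(_) => StringValue(raw)
    }

  // Step 2 (lossy): fall back to strings by rendering the already-parsed
  // values, so the original text of an integer-like value is lost.
  def lossyResolve(raws: Seq[String]): Seq[String] =
    raws.map(infer).map {
      case IntValue(v)    => v.toString
      case StringValue(s) => s
    }

  // Alternative: decide the column type first, and keep the raw text
  // whenever the column ends up as a string column.
  def rawPreservingResolve(raws: Seq[String]): Seq[String] = {
    val allInts = raws.map(infer).forall(_.isInstanceOf[IntValue])
    if (allInts) raws.map(_.toInt.toString) else raws
  }

  def main(args: Array[String]): Unit = {
    val raws = Seq("AA", "-0")
    println(lossyResolve(raws))         // List(AA, 0)
    println(rawPreservingResolve(raws)) // List(AA, -0)
  }
}
```

If per-value inference is indeed the cause, setting the documented option `spark.sql.sources.partitionColumnTypeInference.enabled` to `false`, which makes Spark use string type for partition columns, might avoid the rewrite. That has not been verified against the versions listed in this issue.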