Posted to issues@spark.apache.org by "Maxim Gekk (Jira)" <ji...@apache.org> on 2021/02/02 06:22:00 UTC

[jira] [Updated] (SPARK-34314) Wrong discovered partition value

     [ https://issues.apache.org/jira/browse/SPARK-34314?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Maxim Gekk updated SPARK-34314:
-------------------------------
    Affects Version/s: 3.1.0
                       3.0.2
                       2.4.8

> Wrong discovered partition value
> --------------------------------
>
>                 Key: SPARK-34314
>                 URL: https://issues.apache.org/jira/browse/SPARK-34314
>             Project: Spark
>          Issue Type: Bug
>          Components: SQL
>    Affects Versions: 2.4.8, 3.0.2, 3.1.0, 3.2.0
>            Reporter: Maxim Gekk
>            Priority: Major
>
> The example below illustrates the issue:
> {code:scala}
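>       // Run in spark-shell, where `spark` is predefined; the import enables .toDF
>       import spark.implicits._
>       // Any writable directory that does not exist yet (hypothetical example path)
>       val path = "/tmp/spark-34314"
>       val df = Seq((0, "AA"), (1, "-0")).toDF("id", "part")
>       df.write
>         .partitionBy("part")
>         .format("parquet")
>         .save(path)
>       val readback = spark.read.parquet(path)
>       readback.printSchema()
>       readback.show(false)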
> {code}
> It writes the partition values as strings:
> {code}
> /private/var/folders/p3/dfs6mf655d7fnjrsjvldh0tc0000gn/T/spark-e09eae99-7ecf-4ab2-b99b-f63f8dea658d
> ├── _SUCCESS
> ├── part=-0
> │   └── part-00001-02144398-2896-4d21-9628-a8743d098cb4.c000.snappy.parquet
> └── part=AA
>     └── part-00000-02144398-2896-4d21-9628-a8743d098cb4.c000.snappy.parquet
> {code}
> The partition directories are named *"-0"* and "AA", but when Spark reads the data back, it transforms "-0" to "0":
> {code}
> root
>  |-- id: integer (nullable = true)
>  |-- part: string (nullable = true)
> +---+----+
> |id |part|
> +---+----+
> |0  |AA  |
> |1  |0   |
> +---+----+
> {code}
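
A simplified, self-contained model of what appears to happen during partition discovery: each raw directory value is first parsed as the narrowest type it fits ("-0" parses as the integer 0), and only afterwards is the column widened to string because "AA" is not an integer. Re-rendering the already-parsed 0 then yields "0" instead of "-0". The object `PartitionInferenceModel` and its methods are hypothetical names for illustration only; this is not Spark's actual inference code, which handles more types and edge cases.

```scala
import scala.util.Try

// Hypothetical, simplified model of partition-value type inference.
// NOT Spark's implementation; it only tracks int vs. string.
object PartitionInferenceModel {
  sealed trait Inferred
  final case class IntValue(v: Int) extends Inferred
  final case class StringValue(v: String) extends Inferred

  // Infer the narrowest type for one raw directory value, e.g. "-0" or "AA".
  // Integer.parseInt("-0") succeeds and returns 0, so the sign is lost here.
  def inferValue(raw: String): Inferred =
    Try(raw.toInt).toOption match {
      case Some(i) => IntValue(i)
      case None    => StringValue(raw)
    }

  // The merged column type is string as soon as any value is not an int.
  def columnIsString(raws: Seq[String]): Boolean =
    raws.map(inferValue).exists(_.isInstanceOf[StringValue])

  // Lossy widening: convert the already-parsed value back to a string.
  def widenFromParsed(raws: Seq[String]): Seq[String] =
    raws.map(inferValue).map {
      case IntValue(i)    => i.toString // "-0" -> 0 -> "0"
      case StringValue(s) => s
    }

  // Lossless widening: when the column resolves to string, keep the raw value.
  def widenFromRaw(raws: Seq[String]): Seq[String] = raws

  def main(args: Array[String]): Unit = {
    val raws = Seq("AA", "-0")
    println(columnIsString(raws))  // true
    println(widenFromParsed(raws)) // List(AA, 0)
    println(widenFromRaw(raws))    // List(AA, -0)
  }
}
```

The design point the model illustrates: once a column resolves to string, the original directory text is the only faithful value, so widening should start from the raw string rather than from a value that was parsed under a narrower type.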



--
This message was sent by Atlassian Jira
(v8.3.4#803005)
