Posted to issues@spark.apache.org by "Maxim Gekk (Jira)" <ji...@apache.org> on 2021/02/01 19:30:00 UTC

[jira] [Created] (SPARK-34314) Wrong discovered partition value

Maxim Gekk created SPARK-34314:
----------------------------------

             Summary: Wrong discovered partition value
                 Key: SPARK-34314
                 URL: https://issues.apache.org/jira/browse/SPARK-34314
             Project: Spark
          Issue Type: Bug
          Components: SQL
    Affects Versions: 3.2.0
            Reporter: Maxim Gekk


The example below illustrates the issue:
{code:scala}
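import spark.implicits._  // needed for toDF; `path` is assumed to be an empty temporary directory

// Write two rows partitioned by a string column whose values are "AA" and "-0".
val df = Seq((0, "AA"), (1, "-0")).toDF("id", "part")
df.write
  .partitionBy("part")
  .format("parquet")
  .save(path)

// Read the data back and let Spark discover the partition column from the directory names.
val readback = spark.read.parquet(path)
readback.printSchema()
readback.show(false)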
{code}

Spark writes the partition values as strings:
{code}
/private/var/folders/p3/dfs6mf655d7fnjrsjvldh0tc0000gn/T/spark-e09eae99-7ecf-4ab2-b99b-f63f8dea658d
├── _SUCCESS
├── part=-0
│   └── part-00001-02144398-2896-4d21-9628-a8743d098cb4.c000.snappy.parquet
└── part=AA
    └── part-00000-02144398-2896-4d21-9628-a8743d098cb4.c000.snappy.parquet
{code}
The partition directories carry the original values, *"-0"* and "AA".

But when Spark reads the data back, the discovered partition column is still a string, yet the value "-0" has become "0":
{code}
root
 |-- id: integer (nullable = true)
 |-- part: string (nullable = true)

+---+----+
|id |part|
+---+----+
|0  |AA  |
|1  |0   |
+---+----+
{code}
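
The report does not state the root cause. One way such output could arise, sketched below under that assumption, is that each partition value is first parsed with its own inferred type ("-0" as an integer, "AA" as a string) and only afterwards widened to a common string type, so the integer 0 is rendered as "0". The object and method names here are hypothetical and are not Spark's implementation; the sketch uses plain Scala only.

```scala
import scala.util.Try

// Hypothetical illustration of lossy vs. lossless partition value discovery.
// None of these names come from Spark.
object PartitionValueSketch {

  // True if the raw directory value parses as an Int ("-0" does, "AA" does not).
  private def looksLikeInt(raw: String): Boolean = Try(raw.toInt).isSuccess

  // Lossy variant: parse each value with its own inferred type first,
  // then convert everything to string as the common type.
  def lossy(raws: Seq[String]): Seq[String] =
    raws.map { raw =>
      if (looksLikeInt(raw)) raw.toInt.toString // "-0" -> 0 -> "0"
      else raw
    }

  // Lossless variant: decide the column type first; if the column ends up
  // as string (some value is not an Int), keep the raw text untouched.
  def preservingRaw(raws: Seq[String]): Seq[String] =
    if (raws.forall(looksLikeInt)) raws.map(_.toInt.toString)
    else raws
}
```

With the two partition values from the example, `PartitionValueSketch.lossy(Seq("AA", "-0"))` drops the sign while `PartitionValueSketch.preservingRaw(Seq("AA", "-0"))` returns the values as written on disk.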



