Posted to issues@spark.apache.org by "Maxim Gekk (Jira)" <ji...@apache.org> on 2021/02/01 19:30:00 UTC
[jira] [Created] (SPARK-34314) Wrong discovered partition value
Maxim Gekk created SPARK-34314:
----------------------------------
Summary: Wrong discovered partition value
Key: SPARK-34314
URL: https://issues.apache.org/jira/browse/SPARK-34314
Project: Spark
Issue Type: Bug
Components: SQL
Affects Versions: 3.2.0
Reporter: Maxim Gekk
The example below illustrates the issue:
{code:scala}
val df = Seq((0, "AA"), (1, "-0")).toDF("id", "part")
df.write
  .partitionBy("part")
  .format("parquet")
  .save(path)
val readback = spark.read.parquet(path)
readback.printSchema()
readback.show(false)
{code}
Spark writes the partition values as strings:
{code}
/private/var/folders/p3/dfs6mf655d7fnjrsjvldh0tc0000gn/T/spark-e09eae99-7ecf-4ab2-b99b-f63f8dea658d
├── _SUCCESS
├── part=-0
│   └── part-00001-02144398-2896-4d21-9628-a8743d098cb4.c000.snappy.parquet
└── part=AA
    └── part-00000-02144398-2896-4d21-9628-a8743d098cb4.c000.snappy.parquet
{code}
The partition directories are named after the original values, *"-0"* and "AA", but when Spark reads the data back, it transforms "-0" to "0":
{code}
root
 |-- id: integer (nullable = true)
 |-- part: string (nullable = true)

+---+----+
|id |part|
+---+----+
|0  |AA  |
|1  |0   |
+---+----+
{code}
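The report does not state why "-0" becomes "0". One plausible reading, offered only as an assumption, is that each directory value is first parsed into a typed value, and once the column falls back to string the text is rendered from that parsed value rather than from the directory name. The sketch below is a simplified model in plain Scala, not Spark's code; every name in it (PartitionValueModel, inferValue, discoverLossy, discoverFromRaw) is hypothetical.

```scala
import scala.util.Try

// Simplified model of partition value discovery. This is NOT Spark's
// implementation; all names here are hypothetical.
object PartitionValueModel {
  sealed trait Inferred
  final case class IntValue(value: Int) extends Inferred
  final case class StringValue(value: String) extends Inferred

  // Assumption: a raw directory value is parsed as an integer when possible,
  // and kept as a string otherwise.
  def inferValue(raw: String): Inferred =
    Try(IntValue(raw.toInt)).getOrElse(StringValue(raw))

  // Lossy variant: the string column is rendered from the parsed values,
  // so "-0" is parsed to 0 and printed back as "0".
  def discoverLossy(rawValues: Seq[String]): Seq[String] =
    rawValues.map(inferValue).map {
      case IntValue(v)    => v.toString
      case StringValue(v) => v
    }

  // Lossless variant: when the values are not all integers, the column is a
  // string column and the raw directory names are returned unchanged.
  def discoverFromRaw(rawValues: Seq[String]): Seq[String] = {
    val allInts = rawValues.map(inferValue).forall(_.isInstanceOf[IntValue])
    if (allInts) rawValues.map(_.toInt.toString) else rawValues
  }
}
```

With the values from the example, PartitionValueModel.discoverLossy(Seq("AA", "-0")) yields Seq("AA", "0"), which matches the output above, while PartitionValueModel.discoverFromRaw(Seq("AA", "-0")) keeps "-0", which is the result the report expects.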