Posted to issues@spark.apache.org by "Nathan Grand (Jira)" <ji...@apache.org> on 2020/02/29 21:26:00 UTC

[jira] [Created] (SPARK-30996) Able to write parquet file subsequently unable to read

Nathan Grand created SPARK-30996:
------------------------------------

             Summary: Able to write parquet file subsequently unable to read
                 Key: SPARK-30996
                 URL: https://issues.apache.org/jira/browse/SPARK-30996
             Project: Spark
          Issue Type: Bug
          Components: SQL
    Affects Versions: 2.4.4
         Environment: spark-shell 2.4.4
            Reporter: Nathan Grand


{code:scala}
case class StructKey(i: Int)
case class StructValue(l: Long)
case class Outer(m: Map[StructKey, StructValue])

val data = Seq(Seq(Outer(Map(StructKey(0) -> StructValue(1L)))))
val ds = data.toDS

ds.write.mode("overwrite").parquet("ds.parquet")
val in = spark.read.parquet("ds.parquet")

ds.printSchema

root
 |-- value: array (nullable = true)
 |    |-- element: struct (containsNull = true)
 |    |    |-- m: map (nullable = true)
 |    |    |    |-- key: struct
 |    |    |    |    |-- i: integer (nullable = false)
 |    |    |    |-- value: struct (valueContainsNull = true)
 |    |    |    |    |-- l: long (nullable = false)


ds.show(false)

+----------------+
|value           |
+----------------+
|[[[[0] -> [1]]]]|
+----------------+

in.printSchema

root
 |-- value: array (nullable = true)
 |    |-- element: struct (containsNull = true)
 |    |    |-- m: map (nullable = true)
 |    |    |    |-- key: struct
 |    |    |    |    |-- i: integer (nullable = true)
 |    |    |    |-- value: struct (valueContainsNull = true)
 |    |    |    |    |-- l: long (nullable = true)

in.show(false)

Caused by: org.apache.spark.sql.AnalysisException: Map key type is expected to be a primitive type, but found: required group key {
  required int32 i;
};
  at org.apache.spark.sql.execution.datasources.parquet.ParquetSchemaConverter$.checkConversionRequirement(ParquetSchemaConverter.scala:583)
  at org.apache.spark.sql.execution.datasources.parquet.ParquetToSparkSchemaConverter$$anonfun$convertGroupField$2.apply(ParquetSchemaConverter.scala:228)
  at org.apache.spark.sql.execution.datasources.parquet.ParquetToSparkSchemaConverter$$anonfun$convertGroupField$2.apply(ParquetSchemaConverter.scala:183)
  at scala.Option.fold(Option.scala:158)
  at org.apache.spark.sql.execution.datasources.parquet.ParquetToSparkSchemaConverter.convertGroupField(ParquetSchemaConverter.scala:183)
  at org.apache.spark.sql.execution.datasources.parquet.ParquetToSparkSchemaConverter.convertField(ParquetSchemaConverter.scala:90)
{code}
Spark should not be able to write something it subsequently can't read: if the schema is invalid for Parquet (map keys must be a primitive type), the write should fail with an error; otherwise the read should succeed.
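Until a write-side check exists, one possible workaround (not from this ticket) is to flatten the single-field struct key down to its primitive field before writing, since Parquet only permits primitive map keys. A minimal pure-Scala sketch of that transformation, where the {{OuterFlat}} type and {{flatten}} helper are hypothetical names introduced here for illustration:
{code:scala}
// Case classes from the reproduction above.
case class StructKey(i: Int)
case class StructValue(l: Long)
case class Outer(m: Map[StructKey, StructValue])

// Hypothetical Parquet-safe shape: the struct key is replaced by its Int field.
case class OuterFlat(m: Map[Int, StructValue])

object FlattenKeys {
  // Rewrite each map so its keys are primitive Ints rather than structs.
  def flatten(o: Outer): OuterFlat =
    OuterFlat(o.m.map { case (k, v) => k.i -> v })

  def main(args: Array[String]): Unit = {
    val outer = Outer(Map(StructKey(0) -> StructValue(1L)))
    println(flatten(outer)) // OuterFlat(Map(0 -> StructValue(1)))
  }
}
{code}
With the dataset above, something roughly like {{ds.map(_.map(FlattenKeys.flatten)).write.parquet(...)}} should then produce a file Spark can read back, at the cost of losing the struct wrapper around the key.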



--
This message was sent by Atlassian Jira
(v8.3.4#803005)
