Posted to commits@spark.apache.org by li...@apache.org on 2016/09/27 16:39:54 UTC
spark git commit: [SPARK-16777][SQL] Do not use deprecated listType API in ParquetSchemaConverter
Repository: spark
Updated Branches:
refs/heads/master 6a68c5d7b -> 5de1737b0
[SPARK-16777][SQL] Do not use deprecated listType API in ParquetSchemaConverter
## What changes were proposed in this pull request?
This PR removes the build warnings shown below.
```
[WARNING] .../spark/sql/core/src/main/scala/org/apache/spark/sql/execution/datasources/parquet/ParquetSchemaConverter.scala:448: method listType in object ConversionPatterns is deprecated: see corresponding Javadoc for more information.
[WARNING] ConversionPatterns.listType(
[WARNING] ^
[WARNING] .../spark/sql/core/src/main/scala/org/apache/spark/sql/execution/datasources/parquet/ParquetSchemaConverter.scala:464: method listType in object ConversionPatterns is deprecated: see corresponding Javadoc for more information.
[WARNING] ConversionPatterns.listType(
[WARNING] ^
```
`listOfElements`, the recommended replacement for the deprecated `listType`, should not be used here either, because the new method checks that the element in Parquet's `LIST` is named `element` in the Parquet schema and throws an exception if it is not. However, it seems that Spark prior to 1.4.x writes `ArrayType` as Parquet's `LIST` but with `array` as its element name.
Therefore, this PR avoids both `listOfElements` and `listType` and instead uses the existing schema builder to construct the same `GroupType`.
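As a rough illustration of the reasoning above, here is a minimal sketch that builds the legacy 3-level `LIST` group with the `Types` builder and, for contrast, passes the same element to `listOfElements`. It assumes parquet-column (1.8.x or similar) is on the classpath; the object `LegacyListSketch`, its members, and the `int32` element type are hypothetical names chosen for the example and are not part of this patch.

```scala
import scala.util.Try

import org.apache.parquet.schema.{ConversionPatterns, GroupType, OriginalType, Type, Types}
import org.apache.parquet.schema.PrimitiveType.PrimitiveTypeName.INT32
import org.apache.parquet.schema.Type.Repetition.{OPTIONAL, REPEATED}

// Hypothetical helper object, for illustration only (not part of Spark).
object LegacyListSketch {
  // Stand-in for the converted element field; the legacy layout names it "array".
  val legacyElement: Type = Types.primitive(INT32, OPTIONAL).named("array")

  // Builds the legacy 3-level layout directly with the schema builder:
  //   optional group <name> (LIST) {
  //     repeated group bag {
  //       optional int32 array;
  //     }
  //   }
  def legacyList(name: String): GroupType =
    Types
      .buildGroup(OPTIONAL).as(OriginalType.LIST)
      .addField(Types.buildGroup(REPEATED).addField(legacyElement).named("bag"))
      .named(name)

  // Per the description above, this call is expected to fail because the
  // element is named "array" rather than "element"; Try captures the exception.
  def viaListOfElements(name: String): Try[GroupType] =
    Try(ConversionPatterns.listOfElements(OPTIONAL, name, legacyElement))
}
```

Building the group by hand keeps the `LIST` annotation and the legacy field names without going through either the deprecated method or the name check.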
## How was this patch tested?
Existing tests should cover this.
Author: hyukjinkwon <gu...@gmail.com>
Closes #14399 from HyukjinKwon/SPARK-16777.
Project: http://git-wip-us.apache.org/repos/asf/spark/repo
Commit: http://git-wip-us.apache.org/repos/asf/spark/commit/5de1737b
Tree: http://git-wip-us.apache.org/repos/asf/spark/tree/5de1737b
Diff: http://git-wip-us.apache.org/repos/asf/spark/diff/5de1737b
Branch: refs/heads/master
Commit: 5de1737b02710e36f6804d2ae243d1aeb30a0b32
Parents: 6a68c5d
Author: hyukjinkwon <gu...@gmail.com>
Authored: Wed Sep 28 00:39:47 2016 +0800
Committer: Cheng Lian <li...@databricks.com>
Committed: Wed Sep 28 00:39:47 2016 +0800
----------------------------------------------------------------------
.../parquet/ParquetSchemaConverter.scala | 26 +++++++++++++-------
1 file changed, 17 insertions(+), 9 deletions(-)
----------------------------------------------------------------------
http://git-wip-us.apache.org/repos/asf/spark/blob/5de1737b/sql/core/src/main/scala/org/apache/spark/sql/execution/datasources/parquet/ParquetSchemaConverter.scala
----------------------------------------------------------------------
diff --git a/sql/core/src/main/scala/org/apache/spark/sql/execution/datasources/parquet/ParquetSchemaConverter.scala b/sql/core/src/main/scala/org/apache/spark/sql/execution/datasources/parquet/ParquetSchemaConverter.scala
index c81a65f..b4f36ce 100644
--- a/sql/core/src/main/scala/org/apache/spark/sql/execution/datasources/parquet/ParquetSchemaConverter.scala
+++ b/sql/core/src/main/scala/org/apache/spark/sql/execution/datasources/parquet/ParquetSchemaConverter.scala
@@ -445,14 +445,20 @@ private[parquet] class ParquetSchemaConverter(
// repeated <element-type> array;
// }
// }
- ConversionPatterns.listType(
- repetition,
- field.name,
- Types
+
+ // This should not use `listOfElements` here because this new method checks if the
+ // element name is `element` in the `GroupType` and throws an exception if not.
+ // As mentioned above, Spark prior to 1.4.x writes `ArrayType` as `LIST` but with
+ // `array` as its element name as below. Therefore, we build manually
+ // the correct group type here via the builder. (See SPARK-16777)
+ Types
+ .buildGroup(repetition).as(LIST)
+ .addField(Types
.buildGroup(REPEATED)
- // "array_element" is the name chosen by parquet-hive (1.7.0 and prior version)
+ // "array" is the name chosen by parquet-hive (1.7.0 and prior version)
.addField(convertField(StructField("array", elementType, nullable)))
.named("bag"))
+ .named(field.name)
// Spark 1.4.x and prior versions convert ArrayType with non-nullable elements into a 2-level
// LIST structure. This behavior mimics parquet-avro (1.6.0rc3). Note that this case is
@@ -461,11 +467,13 @@ private[parquet] class ParquetSchemaConverter(
// <list-repetition> group <name> (LIST) {
// repeated <element-type> element;
// }
- ConversionPatterns.listType(
- repetition,
- field.name,
+
+ // Here too, we should not use `listOfElements`. (See SPARK-16777)
+ Types
+ .buildGroup(repetition).as(LIST)
// "array" is the name chosen by parquet-avro (1.7.0 and prior version)
- convertField(StructField("array", elementType, nullable), REPEATED))
+ .addField(convertField(StructField("array", elementType, nullable), REPEATED))
+ .named(field.name)
// Spark 1.4.x and prior versions convert MapType into a 3-level group annotated by
// MAP_KEY_VALUE. This is covered by `convertGroupField(field: GroupType): DataType`.