Posted to reviews@spark.apache.org by cloud-fan <gi...@git.apache.org> on 2018/09/10 05:51:36 UTC
[GitHub] spark pull request #22343: [SPARK-25391][SQL] Make behaviors consistent when...
Github user cloud-fan commented on a diff in the pull request:
https://github.com/apache/spark/pull/22343#discussion_r216204114
--- Diff: sql/core/src/main/scala/org/apache/spark/sql/execution/datasources/parquet/ParquetOptions.scala ---
@@ -69,12 +69,25 @@ class ParquetOptions(
.get(MERGE_SCHEMA)
.map(_.toBoolean)
.getOrElse(sqlConf.isParquetSchemaMergingEnabled)
+
+ /**
+ * How to resolve duplicated field names. By default, parquet data source fails when hitting
+ * duplicated field names in case-insensitive mode. When converting hive parquet table to parquet
+ * data source, we need to ask parquet data source to pick the first matched field - the same
+ * behavior as hive parquet table - to keep behaviors consistent.
+ */
+ val duplicatedFieldsResolutionMode: String = {
+ parameters.getOrElse(DUPLICATED_FIELDS_RESOLUTION_MODE,
--- End diff --
Whether or not we have a SQL config for it, we must define an option here. The conversion happens per query, so we need a per-query option to switch the behavior rather than a per-session SQL config.
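
To illustrate the distinction, here is a minimal, self-contained sketch in plain Scala. It is not the code from this PR: `SessionConf`, `ReadOptions`, the option key, and the mode values `FAIL` and `FIRST_MATCH` are all hypothetical names chosen for the example. It only shows how an option carried in the parameters of a single read can override a session-wide default, in the same way the `MERGE_SCHEMA` lookup in the diff above falls back to the SQL config.

```scala
// Hypothetical sketch: names and mode values are assumptions, not Spark APIs.
object PerQueryOptionSketch {
  // Stand-in for a per-session SQL config: one value shared by every query in the session.
  final case class SessionConf(duplicatedFieldsMode: String)

  // Stand-in for an options class built from the parameters of a single read.
  final class ReadOptions(parameters: Map[String, String], conf: SessionConf) {
    // Normalize keys so the lookup is case-insensitive.
    private val normalized = parameters.map { case (k, v) => k.toLowerCase -> v }

    // The per-query option wins; the session value is only a fallback.
    val duplicatedFieldsMode: String =
      normalized.getOrElse("duplicatedfieldsresolutionmode", conf.duplicatedFieldsMode)
  }

  def main(args: Array[String]): Unit = {
    val conf = SessionConf("FAIL")

    // A plain data source read: no option set, so the session default applies.
    val plainRead = new ReadOptions(Map.empty, conf)

    // A read produced by converting a Hive parquet table: the conversion sets the option itself.
    val convertedRead =
      new ReadOptions(Map("duplicatedFieldsResolutionMode" -> "FIRST_MATCH"), conf)

    println(plainRead.duplicatedFieldsMode)
    println(convertedRead.duplicatedFieldsMode)
  }
}
```

With this shape, two reads in the same session can resolve duplicated fields differently, which a session-level config alone cannot express.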