Posted to reviews@spark.apache.org by seancxmao <gi...@git.apache.org> on 2018/08/20 03:24:50 UTC

[GitHub] spark pull request #22148: [SPARK-25132][SQL] Case-insensitive field resolut...

GitHub user seancxmao opened a pull request:

    https://github.com/apache/spark/pull/22148

    [SPARK-25132][SQL] Case-insensitive field resolution when reading from Parquet

    ## What changes were proposed in this pull request?
    Spark SQL returns NULL for a column whose name differs only in letter case between the Hive metastore schema and the Parquet schema, regardless of whether spark.sql.caseSensitive is set to true or false. This PR aims to add case-insensitive field resolution for ParquetFileFormat.
    * Do case-insensitive resolution only if Spark is in case-insensitive mode.
    * Field resolution should fail if there is ambiguity, i.e. more than one field is matched.
    
    ## How was this patch tested?
    Unit tests added.
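The resolution rule in the description (exact matching when spark.sql.caseSensitive is true, case-insensitive matching otherwise, and failure when more than one field matches) can be sketched in plain Scala without Spark or Parquet on the classpath. This is a simplified illustration, not the patch itself: field names stand in for Parquet `Type` objects, `FieldResolutionSketch` and `resolve` are hypothetical names, and the error text only mirrors the message prefix visible in the review diff (the rest is illustrative). The sample names `A`, `b`, `B` are the columns the PR's test writes.

```scala
import java.util.Locale

// Hypothetical, simplified model of the field lookup described in this PR:
// Parquet field names are plain Strings here, not Parquet `Type` objects.
object FieldResolutionSketch {

  /** Right(Some(name)) = unique match, Right(None) = no match,
    * Left(message) = more than one field matches (case-insensitive mode only).
    */
  def resolve(
      parquetFields: Seq[String],
      requested: String,
      caseSensitive: Boolean): Either[String, Option[String]] = {
    if (caseSensitive) {
      // Exact-name lookup, the same behavior as before the change.
      Right(parquetFields.find(_ == requested))
    } else {
      // Group Parquet field names by their locale-neutral lower-case form.
      val byLowerCase = parquetFields.groupBy(_.toLowerCase(Locale.ROOT))
      byLowerCase.get(requested.toLowerCase(Locale.ROOT)) match {
        case None => Right(None)
        case Some(Seq(single)) => Right(Some(single))
        case Some(many) =>
          // Ambiguity: fail instead of silently picking one of the fields.
          Left(s"""Found duplicate field(s) "$requested": ${many.mkString("[", ", ", "]")}""")
      }
    }
  }

  def main(args: Array[String]): Unit = {
    val fields = Seq("A", "b", "B")
    println(resolve(fields, "a", caseSensitive = false)) // Right(Some(A))
    println(resolve(fields, "a", caseSensitive = true))  // Right(None)
    println(resolve(fields, "b", caseSensitive = false)) // Left(Found duplicate field(s) "b": [b, B])
  }
}
```

Returning an `Either` keeps the sketch easy to test; the actual reader code discussed in the thread throws an exception instead.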

You can merge this pull request into a Git repository by running:

    $ git pull https://github.com/seancxmao/spark SPARK-25132-Parquet

Alternatively you can review and apply these changes as the patch at:

    https://github.com/apache/spark/pull/22148.patch

To close this pull request, make a commit to your master/trunk branch
with (at least) the following in the commit message:

    This closes #22148
    
----
commit e0a555354436b0099a7b48b3269b8c5ccb73571b
Author: seancxmao <se...@...>
Date:   2018-08-17T10:06:28Z

    [SPARK-25132][SQL] Case-insensitive field resolution when reading from Parquet

----


---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark issue #22148: [SPARK-25132][SQL] Case-insensitive field resolution whe...

Posted by AmplabJenkins <gi...@git.apache.org>.
Github user AmplabJenkins commented on the issue:

    https://github.com/apache/spark/pull/22148
  
    Merged build finished. Test FAILed.


---



[GitHub] spark pull request #22148: [SPARK-25132][SQL] Case-insensitive field resolut...

Posted by cloud-fan <gi...@git.apache.org>.
Github user cloud-fan commented on a diff in the pull request:

    https://github.com/apache/spark/pull/22148#discussion_r211148657
  
    --- Diff: sql/core/src/main/scala/org/apache/spark/sql/execution/datasources/parquet/ParquetReadSupport.scala ---
    @@ -277,14 +291,35 @@ private[parquet] object ParquetReadSupport {
        * @return A list of clipped [[GroupType]] fields, which can be empty.
        */
       private def clipParquetGroupFields(
    -      parquetRecord: GroupType, structType: StructType): Seq[Type] = {
    -    val parquetFieldMap = parquetRecord.getFields.asScala.map(f => f.getName -> f).toMap
    +      parquetRecord: GroupType, structType: StructType, caseSensitive: Boolean): Seq[Type] = {
         val toParquet = new SparkToParquetSchemaConverter(writeLegacyParquetFormat = false)
    -    structType.map { f =>
    -      parquetFieldMap
    -        .get(f.name)
    -        .map(clipParquetType(_, f.dataType))
    -        .getOrElse(toParquet.convertField(f))
    +    if (caseSensitive) {
    +      val caseSensitiveParquetFieldMap =
    +        parquetRecord.getFields.asScala.map(f => f.getName -> f).toMap
    +      structType.map { f =>
    +        caseSensitiveParquetFieldMap
    +          .get(f.name)
    +          .map(clipParquetType(_, f.dataType, caseSensitive))
    +          .getOrElse(toParquet.convertField(f))
    +      }
    +    } else {
    +      // Do case-insensitive resolution only if in case-insensitive mode
    +      val caseInsensitiveParquetFieldMap =
    +        parquetRecord.getFields.asScala.groupBy(_.getName.toLowerCase)
    --- End diff --
    
    nit: `toLowerCase(Locale.ROOT)`
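The reason behind this nit: `String.toLowerCase()` with no argument uses the JVM's default locale, and under a Turkish locale the capital `I` lowercases to the dotless `ı` (U+0131), so a field named `ID` would no longer match `id`. A small standalone check (the object name is hypothetical):

```scala
import java.util.Locale

object LowerCaseLocaleDemo {
  val turkish: Locale = Locale.forLanguageTag("tr-TR")

  def main(args: Array[String]): Unit = {
    // Locale-sensitive lower-casing: "I" becomes dotless "ı" in Turkish.
    val trLower = "ID".toLowerCase(turkish)
    // Locale.ROOT gives the locale-neutral mapping expected for identifiers.
    val rootLower = "ID".toLowerCase(Locale.ROOT)
    println(trLower == "id")   // false
    println(rootLower == "id") // true
  }
}
```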


---



[GitHub] spark issue #22148: [SPARK-25132][SQL] Case-insensitive field resolution whe...

Posted by yucai <gi...@git.apache.org>.
Github user yucai commented on the issue:

    https://github.com/apache/spark/pull/22148
  
    LGTM.
    @cloud-fan @gatorsmile Could you kindly help trigger Jenkins and review?


---



[GitHub] spark issue #22148: [SPARK-25132][SQL] Case-insensitive field resolution whe...

Posted by AmplabJenkins <gi...@git.apache.org>.
Github user AmplabJenkins commented on the issue:

    https://github.com/apache/spark/pull/22148
  
    Merged build finished. Test FAILed.


---



[GitHub] spark issue #22148: [SPARK-25132][SQL] Case-insensitive field resolution whe...

Posted by cloud-fan <gi...@git.apache.org>.
Github user cloud-fan commented on the issue:

    https://github.com/apache/spark/pull/22148
  
    ok to test


---



[GitHub] spark issue #22148: [SPARK-25132][SQL] Case-insensitive field resolution whe...

Posted by SparkQA <gi...@git.apache.org>.
Github user SparkQA commented on the issue:

    https://github.com/apache/spark/pull/22148
  
    **[Test build #94943 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/94943/testReport)** for PR 22148 at commit [`9261beb`](https://github.com/apache/spark/commit/9261bebf12ab5a563f21d03ad2e5ac4cc03dbd63).


---



[GitHub] spark issue #22148: [SPARK-25132][SQL] Case-insensitive field resolution whe...

Posted by yucai <gi...@git.apache.org>.
Github user yucai commented on the issue:

    https://github.com/apache/spark/pull/22148
  
    retest this please


---



[GitHub] spark issue #22148: [SPARK-25132][SQL] Case-insensitive field resolution whe...

Posted by AmplabJenkins <gi...@git.apache.org>.
Github user AmplabJenkins commented on the issue:

    https://github.com/apache/spark/pull/22148
  
    Can one of the admins verify this patch?


---



[GitHub] spark issue #22148: [SPARK-25132][SQL] Case-insensitive field resolution whe...

Posted by AmplabJenkins <gi...@git.apache.org>.
Github user AmplabJenkins commented on the issue:

    https://github.com/apache/spark/pull/22148
  
    Test PASSed.
    Refer to this link for build results (access rights to CI server needed): 
    https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/94954/


---



[GitHub] spark issue #22148: [SPARK-25132][SQL] Case-insensitive field resolution whe...

Posted by AmplabJenkins <gi...@git.apache.org>.
Github user AmplabJenkins commented on the issue:

    https://github.com/apache/spark/pull/22148
  
    Can one of the admins verify this patch?


---



[GitHub] spark issue #22148: [SPARK-25132][SQL] Case-insensitive field resolution whe...

Posted by SparkQA <gi...@git.apache.org>.
Github user SparkQA commented on the issue:

    https://github.com/apache/spark/pull/22148
  
    **[Test build #94954 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/94954/testReport)** for PR 22148 at commit [`0176d29`](https://github.com/apache/spark/commit/0176d296bfa861ce71cc09e61be76e8bca761801).


---



[GitHub] spark issue #22148: [SPARK-25132][SQL] Case-insensitive field resolution whe...

Posted by AmplabJenkins <gi...@git.apache.org>.
Github user AmplabJenkins commented on the issue:

    https://github.com/apache/spark/pull/22148
  
    Test FAILed.
    Refer to this link for build results (access rights to CI server needed): 
    https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/94957/


---



[GitHub] spark issue #22148: [SPARK-25132][SQL] Case-insensitive field resolution whe...

Posted by wangyum <gi...@git.apache.org>.
Github user wangyum commented on the issue:

    https://github.com/apache/spark/pull/22148
  
    retest this please


---



[GitHub] spark pull request #22148: [SPARK-25132][SQL] Case-insensitive field resolut...

Posted by asfgit <gi...@git.apache.org>.
Github user asfgit closed the pull request at:

    https://github.com/apache/spark/pull/22148


---



[GitHub] spark issue #22148: [SPARK-25132][SQL] Case-insensitive field resolution whe...

Posted by AmplabJenkins <gi...@git.apache.org>.
Github user AmplabJenkins commented on the issue:

    https://github.com/apache/spark/pull/22148
  
    Can one of the admins verify this patch?


---



[GitHub] spark issue #22148: [SPARK-25132][SQL] Case-insensitive field resolution whe...

Posted by AmplabJenkins <gi...@git.apache.org>.
Github user AmplabJenkins commented on the issue:

    https://github.com/apache/spark/pull/22148
  
    Merged build finished. Test PASSed.


---



[GitHub] spark issue #22148: [SPARK-25132][SQL] Case-insensitive field resolution whe...

Posted by SparkQA <gi...@git.apache.org>.
Github user SparkQA commented on the issue:

    https://github.com/apache/spark/pull/22148
  
    **[Test build #94957 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/94957/testReport)** for PR 22148 at commit [`0176d29`](https://github.com/apache/spark/commit/0176d296bfa861ce71cc09e61be76e8bca761801).
     * This patch **fails PySpark unit tests**.
     * This patch merges cleanly.
     * This patch adds no public classes.


---



[GitHub] spark issue #22148: [SPARK-25132][SQL] Case-insensitive field resolution whe...

Posted by seancxmao <gi...@git.apache.org>.
Github user seancxmao commented on the issue:

    https://github.com/apache/spark/pull/22148
  
    Thanks!


---



[GitHub] spark issue #22148: [SPARK-25132][SQL] Case-insensitive field resolution whe...

Posted by SparkQA <gi...@git.apache.org>.
Github user SparkQA commented on the issue:

    https://github.com/apache/spark/pull/22148
  
    **[Test build #94957 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/94957/testReport)** for PR 22148 at commit [`0176d29`](https://github.com/apache/spark/commit/0176d296bfa861ce71cc09e61be76e8bca761801).


---



[GitHub] spark issue #22148: [SPARK-25132][SQL] Case-insensitive field resolution whe...

Posted by HyukjinKwon <gi...@git.apache.org>.
Github user HyukjinKwon commented on the issue:

    https://github.com/apache/spark/pull/22148
  
    Merged to master.


---



[GitHub] spark issue #22148: [SPARK-25132][SQL] Case-insensitive field resolution whe...

Posted by SparkQA <gi...@git.apache.org>.
Github user SparkQA commented on the issue:

    https://github.com/apache/spark/pull/22148
  
    **[Test build #94945 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/94945/testReport)** for PR 22148 at commit [`c8279d2`](https://github.com/apache/spark/commit/c8279d2ae8a72ec520a0627c3a3f9dbb34da49ee).
     * This patch **fails Spark unit tests**.
     * This patch merges cleanly.
     * This patch adds no public classes.


---



[GitHub] spark issue #22148: [SPARK-25132][SQL] Case-insensitive field resolution whe...

Posted by SparkQA <gi...@git.apache.org>.
Github user SparkQA commented on the issue:

    https://github.com/apache/spark/pull/22148
  
    **[Test build #94945 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/94945/testReport)** for PR 22148 at commit [`c8279d2`](https://github.com/apache/spark/commit/c8279d2ae8a72ec520a0627c3a3f9dbb34da49ee).


---



[GitHub] spark issue #22148: [SPARK-25132][SQL] Case-insensitive field resolution whe...

Posted by HyukjinKwon <gi...@git.apache.org>.
Github user HyukjinKwon commented on the issue:

    https://github.com/apache/spark/pull/22148
  
    retest this please


---



[GitHub] spark issue #22148: [SPARK-25132][SQL] Case-insensitive field resolution whe...

Posted by AmplabJenkins <gi...@git.apache.org>.
Github user AmplabJenkins commented on the issue:

    https://github.com/apache/spark/pull/22148
  
    Merged build finished. Test FAILed.


---



[GitHub] spark issue #22148: [SPARK-25132][SQL] Case-insensitive field resolution whe...

Posted by HyukjinKwon <gi...@git.apache.org>.
Github user HyukjinKwon commented on the issue:

    https://github.com/apache/spark/pull/22148
  
    retest this please


---



[GitHub] spark pull request #22148: [SPARK-25132][SQL] Case-insensitive field resolut...

Posted by cloud-fan <gi...@git.apache.org>.
Github user cloud-fan commented on a diff in the pull request:

    https://github.com/apache/spark/pull/22148#discussion_r211149529
  
    --- Diff: sql/core/src/test/scala/org/apache/spark/sql/FileBasedDataSourceSuite.scala ---
    @@ -430,6 +430,48 @@ class FileBasedDataSourceSuite extends QueryTest with SharedSQLContext with Befo
           }
         }
       }
    +
    +  test(s"SPARK-25132: case-insensitive field resolution when reading from Parquet") {
    +    withTempDir { dir =>
    +      val format = "parquet"
    +      val tableDir = dir.getCanonicalPath + s"/$format"
    +      val tableName = s"spark_25132_${format}"
    +      withTable(tableName) {
    +        val end = 5
    +        val data = spark.range(end).selectExpr("id as A", "id * 2 as b", "id * 3 as B")
    +        withSQLConf(SQLConf.CASE_SENSITIVE.key -> "true") {
    +          data.write.format(format).mode("overwrite").save(tableDir)
    +        }
    +        sql(s"CREATE TABLE $tableName (a LONG, b LONG) USING $format LOCATION '$tableDir'")
    --- End diff --
    
    Not related to this PR, but it makes me think that case sensitivity should be a global or at least a table-level config; otherwise the behavior is a little confusing. cc @gatorsmile 


---



[GitHub] spark issue #22148: [SPARK-25132][SQL] Case-insensitive field resolution whe...

Posted by yucai <gi...@git.apache.org>.
Github user yucai commented on the issue:

    https://github.com/apache/spark/pull/22148
  
    I triggered it 3 hours ago, but I see many Jenkins submissions in the queue.
    And it says "Jenkins is about to shut down"?
    
    ![image](https://user-images.githubusercontent.com/2989575/44340714-44cca480-a4b8-11e8-8c2d-aa432a9516ca.png)



---



[GitHub] spark pull request #22148: [SPARK-25132][SQL] Case-insensitive field resolut...

Posted by gatorsmile <gi...@git.apache.org>.
Github user gatorsmile commented on a diff in the pull request:

    https://github.com/apache/spark/pull/22148#discussion_r211783541
  
    --- Diff: sql/core/src/test/scala/org/apache/spark/sql/FileBasedDataSourceSuite.scala ---
    @@ -430,6 +430,48 @@ class FileBasedDataSourceSuite extends QueryTest with SharedSQLContext with Befo
           }
         }
       }
    +
    +  test(s"SPARK-25132: case-insensitive field resolution when reading from Parquet") {
    +    withTempDir { dir =>
    +      val format = "parquet"
    +      val tableDir = dir.getCanonicalPath + s"/$format"
    +      val tableName = s"spark_25132_${format}"
    +      withTable(tableName) {
    +        val end = 5
    +        val data = spark.range(end).selectExpr("id as A", "id * 2 as b", "id * 3 as B")
    +        withSQLConf(SQLConf.CASE_SENSITIVE.key -> "true") {
    +          data.write.format(format).mode("overwrite").save(tableDir)
    +        }
    +        sql(s"CREATE TABLE $tableName (a LONG, b LONG) USING $format LOCATION '$tableDir'")
    --- End diff --
    
    A table-level conf is reasonable. Shall we do it in 3.0?


---



[GitHub] spark issue #22148: [SPARK-25132][SQL] Case-insensitive field resolution whe...

Posted by AmplabJenkins <gi...@git.apache.org>.
Github user AmplabJenkins commented on the issue:

    https://github.com/apache/spark/pull/22148
  
    Test FAILed.
    Refer to this link for build results (access rights to CI server needed): 
    https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/94943/


---



[GitHub] spark pull request #22148: [SPARK-25132][SQL] Case-insensitive field resolut...

Posted by HyukjinKwon <gi...@git.apache.org>.
Github user HyukjinKwon commented on a diff in the pull request:

    https://github.com/apache/spark/pull/22148#discussion_r211136238
  
    --- Diff: sql/core/src/main/scala/org/apache/spark/sql/execution/datasources/parquet/ParquetReadSupport.scala ---
    @@ -277,14 +291,38 @@ private[parquet] object ParquetReadSupport {
        * @return A list of clipped [[GroupType]] fields, which can be empty.
        */
       private def clipParquetGroupFields(
    -      parquetRecord: GroupType, structType: StructType): Seq[Type] = {
    -    val parquetFieldMap = parquetRecord.getFields.asScala.map(f => f.getName -> f).toMap
    +      parquetRecord: GroupType, structType: StructType, caseSensitive: Boolean): Seq[Type] = {
         val toParquet = new SparkToParquetSchemaConverter(writeLegacyParquetFormat = false)
    -    structType.map { f =>
    -      parquetFieldMap
    -        .get(f.name)
    -        .map(clipParquetType(_, f.dataType))
    -        .getOrElse(toParquet.convertField(f))
    +    if (caseSensitive) {
    +      val caseSensitiveParquetFieldMap =
    +        parquetRecord.getFields.asScala.map(f => f.getName -> f).toMap
    +      structType.map { f => {
    +        caseSensitiveParquetFieldMap
    +          .get(f.name)
    +          .map(clipParquetType(_, f.dataType, caseSensitive))
    +          .getOrElse(toParquet.convertField(f))
    +      }
    --- End diff --
    
    nit: I would remove this brace per https://github.com/databricks/scala-style-guide#anonymous-methods
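To illustrate the nit: in `structType.map { f => { ... } }` the inner brace pair is redundant, because the body of a function literal already extends to the closing brace of the outer block. A toy comparison with hypothetical names (plain Scala, unrelated to the Parquet code):

```scala
object AnonymousFunctionStyle {
  val names: Seq[String] = Seq("a", "bb")

  // Discouraged: an extra brace pair wrapped around the lambda body.
  val verbose: Seq[Int] = names.map { n => {
    n.length
  }}

  // Preferred: the lambda body follows `=>` directly.
  val concise: Seq[Int] = names.map { n =>
    n.length
  }
}
```

Both forms compile to the same thing; the change is purely about readability.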


---



[GitHub] spark pull request #22148: [SPARK-25132][SQL] Case-insensitive field resolut...

Posted by cloud-fan <gi...@git.apache.org>.
Github user cloud-fan commented on a diff in the pull request:

    https://github.com/apache/spark/pull/22148#discussion_r211149160
  
    --- Diff: sql/core/src/main/scala/org/apache/spark/sql/execution/datasources/parquet/ParquetReadSupport.scala ---
    @@ -277,14 +291,35 @@ private[parquet] object ParquetReadSupport {
        * @return A list of clipped [[GroupType]] fields, which can be empty.
        */
       private def clipParquetGroupFields(
    -      parquetRecord: GroupType, structType: StructType): Seq[Type] = {
    -    val parquetFieldMap = parquetRecord.getFields.asScala.map(f => f.getName -> f).toMap
    +      parquetRecord: GroupType, structType: StructType, caseSensitive: Boolean): Seq[Type] = {
         val toParquet = new SparkToParquetSchemaConverter(writeLegacyParquetFormat = false)
    -    structType.map { f =>
    -      parquetFieldMap
    -        .get(f.name)
    -        .map(clipParquetType(_, f.dataType))
    -        .getOrElse(toParquet.convertField(f))
    +    if (caseSensitive) {
    +      val caseSensitiveParquetFieldMap =
    +        parquetRecord.getFields.asScala.map(f => f.getName -> f).toMap
    +      structType.map { f =>
    +        caseSensitiveParquetFieldMap
    +          .get(f.name)
    +          .map(clipParquetType(_, f.dataType, caseSensitive))
    +          .getOrElse(toParquet.convertField(f))
    +      }
    +    } else {
    +      // Do case-insensitive resolution only if in case-insensitive mode
    +      val caseInsensitiveParquetFieldMap =
    +        parquetRecord.getFields.asScala.groupBy(_.getName.toLowerCase)
    +      structType.map { f =>
    +        caseInsensitiveParquetFieldMap
    +          .get(f.name.toLowerCase)
    +          .map { parquetTypes =>
    +            if (parquetTypes.size > 1) {
    +              // Need to fail if there is ambiguity, i.e. more than one field is matched
    +              val parquetTypesString = parquetTypes.map(_.getName).mkString("[", ", ", "]")
    +              throw new AnalysisException(s"""Found duplicate field(s) "${f.name}": """ +
    --- End diff --
    
    This is triggered at runtime on the executor side, so we should probably use `RuntimeException` here.
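The reasoning: the ambiguity is only detected inside the Parquet reader on executors, after query analysis has finished, so a plain JVM runtime exception fits better than `AnalysisException`. A minimal sketch of such a check, assuming a hypothetical helper `checkUnambiguous` that works on field-name strings; the diff above is truncated, so only the message prefix is taken from it and the rest of the wording is illustrative:

```scala
object AmbiguityCheck {
  /** Returns the single matching field name, or throws if several match.
    * Hypothetical helper; names are plain Strings, not Parquet Types.
    */
  def checkUnambiguous(requested: String, matches: Seq[String]): String = {
    require(matches.nonEmpty, s"no candidate fields for $requested")
    if (matches.size > 1) {
      val candidates = matches.mkString("[", ", ", "]")
      // RuntimeException: raised on the executor at read time, not during analysis.
      throw new RuntimeException(
        s"""Found duplicate field(s) "$requested": $candidates""")
    }
    matches.head
  }
}
```

For example, `AmbiguityCheck.checkUnambiguous("a", Seq("A"))` returns `"A"`, while `checkUnambiguous("b", Seq("b", "B"))` throws.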


---



[GitHub] spark issue #22148: [SPARK-25132][SQL] Case-insensitive field resolution whe...

Posted by AmplabJenkins <gi...@git.apache.org>.
Github user AmplabJenkins commented on the issue:

    https://github.com/apache/spark/pull/22148
  
    Test FAILed.
    Refer to this link for build results (access rights to CI server needed): 
    https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/94945/


---



[GitHub] spark issue #22148: [SPARK-25132][SQL] Case-insensitive field resolution whe...

Posted by SparkQA <gi...@git.apache.org>.
Github user SparkQA commented on the issue:

    https://github.com/apache/spark/pull/22148
  
    **[Test build #94954 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/94954/testReport)** for PR 22148 at commit [`0176d29`](https://github.com/apache/spark/commit/0176d296bfa861ce71cc09e61be76e8bca761801).
     * This patch passes all tests.
     * This patch merges cleanly.
     * This patch adds no public classes.


---



[GitHub] spark issue #22148: [SPARK-25132][SQL] Case-insensitive field resolution whe...

Posted by SparkQA <gi...@git.apache.org>.
Github user SparkQA commented on the issue:

    https://github.com/apache/spark/pull/22148
  
    **[Test build #94943 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/94943/testReport)** for PR 22148 at commit [`9261beb`](https://github.com/apache/spark/commit/9261bebf12ab5a563f21d03ad2e5ac4cc03dbd63).
     * This patch **fails due to an unknown error code, -9**.
     * This patch merges cleanly.
     * This patch adds no public classes.


---



[GitHub] spark issue #22148: [SPARK-25132][SQL] Case-insensitive field resolution whe...

Posted by cloud-fan <gi...@git.apache.org>.
Github user cloud-fan commented on the issue:

    https://github.com/apache/spark/pull/22148
  
    LGTM except a few minor comments


---



[GitHub] spark issue #22148: [SPARK-25132][SQL] Case-insensitive field resolution whe...

Posted by gatorsmile <gi...@git.apache.org>.
Github user gatorsmile commented on the issue:

    https://github.com/apache/spark/pull/22148
  
    @seancxmao Could you submit a follow-up PR to document the behavior change in the Spark SQL migration guide?


---
