Posted to reviews@spark.apache.org by seancxmao <gi...@git.apache.org> on 2018/08/19 16:30:57 UTC

[GitHub] spark pull request #22142: [SPARK-25132][SQL] case-insensitive field resolut...

GitHub user seancxmao opened a pull request:

    https://github.com/apache/spark/pull/22142

    [SPARK-25132][SQL] case-insensitive field resolution when reading from Parquet/ORC

    ## What changes were proposed in this pull request?
    Spark SQL returns NULL for a column whose name differs in letter case between the Hive metastore schema and the Parquet schema, regardless of whether spark.sql.caseSensitive is set to true or false. This applies not only to Parquet but also to ORC. The following is a brief summary:
    * ParquetFileFormat doesn't support case-insensitive field resolution.
    * native OrcFileFormat supports case-insensitive field resolution; however, it cannot handle duplicate fields.
    * hive OrcFileFormat doesn't support case-insensitive field resolution.
    
    https://github.com/apache/spark/pull/15799 reverted case-insensitive resolution for ParquetFileFormat and hive OrcFileFormat. This PR brings it back and improves it so that case-insensitive resolution is performed only when Spark is in case-insensitive mode. Field resolution fails if there is ambiguity, i.e. if more than one field is matched. ParquetFileFormat, native OrcFileFormat and hive OrcFileFormat are all supported.
    
    ## How was this patch tested?
    Unit tests added.
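
The resolution rule described in this pull request can be illustrated with a minimal sketch. It is not Spark's implementation: the function name `resolve_field`, its parameters, and the error message wording are all hypothetical and chosen only for illustration. The sketch assumes a requested column name is matched exactly in case-sensitive mode, matched ignoring letter case otherwise, and rejected when more than one file field matches.

```python
from typing import List, Optional


def resolve_field(requested: str, file_fields: List[str],
                  case_sensitive: bool) -> Optional[str]:
    """Return the file field that matches `requested`, or None if none does.

    Hypothetical helper for illustration only; not part of Spark's API.
    """
    if case_sensitive:
        # Exact match only: "a" does not match "A".
        return requested if requested in file_fields else None

    # Case-insensitive mode: compare lower-cased names.
    matches = [f for f in file_fields if f.lower() == requested.lower()]
    if len(matches) > 1:
        # Ambiguous: fail rather than silently pick one of the fields.
        raise ValueError(
            f'Found duplicate field(s) "{requested}": {matches} '
            "in case-insensitive mode"
        )
    return matches[0] if matches else None
```

With this sketch, `resolve_field("a", ["A", "b"], case_sensitive=False)` finds the field `A`, the same call with `case_sensitive=True` finds nothing (the situation in which a reader would fill the column with NULL), and a file containing both `a` and `A` raises an error in case-insensitive mode.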

You can merge this pull request into a Git repository by running:

    $ git pull https://github.com/seancxmao/spark SPARK-25132

Alternatively you can review and apply these changes as the patch at:

    https://github.com/apache/spark/pull/22142.patch

To close this pull request, make a commit to your master/trunk branch
with (at least) the following in the commit message:

    This closes #22142
    
----
commit 5c3d20b654609c86de9c24c9751ec34916f3aabd
Author: seancxmao <se...@...>
Date:   2018-08-17T10:06:28Z

    SPARK-25132: case-insensitive field resolution when reading from Parquet/ORC
    
    * Fix ParquetFileFormat
    * More than one Parquet column is matched
    * Fix OrcFileFormat (both native and hive implementations)
    * Fix issues according to review results: refactor test cases, code style, ...
    * Test cases: change parquet/orc file schema from a to A
    * Test cases: let different columns have different value series
    * Refine error message
    * Split multi-format test suite
    * Simplify test cases for ambiguous resolution
    * Simplify test cases to reduce code lines
    * Refine tests and comments

----


---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark issue #22142: [SPARK-25132][SQL] case-insensitive field resolution whe...

Posted by AmplabJenkins <gi...@git.apache.org>.
Github user AmplabJenkins commented on the issue:

    https://github.com/apache/spark/pull/22142
  
    Can one of the admins verify this patch?


---



[GitHub] spark issue #22142: [SPARK-25132][SQL] Case-insensitive field resolution whe...

Posted by seancxmao <gi...@git.apache.org>.
Github user seancxmao commented on the issue:

    https://github.com/apache/spark/pull/22142
  
    Split this into 2 PRs, one for Parquet and one for ORC.


---



[GitHub] spark pull request #22142: [SPARK-25132][SQL] Case-insensitive field resolut...

Posted by seancxmao <gi...@git.apache.org>.
Github user seancxmao closed the pull request at:

    https://github.com/apache/spark/pull/22142


---


