Posted to reviews@spark.apache.org by GitBox <gi...@apache.org> on 2019/02/27 18:49:05 UTC

[GitHub] bersprockets opened a new pull request #23909: [SPARK-26990][SQL][BACKPORT-2.4] FileIndex: use user specified field names if possible

URL: https://github.com/apache/spark/pull/23909
 
 
   ## What changes were proposed in this pull request?
   
   Back-port of #23894 to branch-2.4.
   
   With the following file structure:
   ```
   /tmp/data
   └── a=5
   ```
   
   In the previous release:
   ```
   scala> spark.read.schema("A int, ID long").parquet("/tmp/data/").printSchema
   root
    |-- ID: long (nullable = true)
    |-- A: integer (nullable = true)
   ```
   
   While in the current code:
   ```
   scala> spark.read.schema("A int, ID long").parquet("/tmp/data/").printSchema
   root
    |-- ID: long (nullable = true)
    |-- a: integer (nullable = true)
   ```
   
   The partition column name `a`, taken from the directory name, differs from the `A` that the user specified. This PR fixes that case so that the file index uses the user-specified field name when possible.
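   
   The intended behavior can be sketched in plain Scala, without Spark on the classpath. This is only an illustration under stated assumptions, not the code changed by this PR: `PartitionFieldNames` and `resolve` are hypothetical names, and matching is assumed to be case-insensitive, as it is when `spark.sql.caseSensitive` is left at its default of `false`.
   
   ```scala
   import java.util.Locale
   
   // Hypothetical helper for illustration only; not Spark's implementation.
   object PartitionFieldNames {
     // For each partition column name inferred from the directory layout,
     // prefer the user-specified field name that matches it ignoring case.
     // Names with no match in the user schema are returned unchanged.
     def resolve(userFields: Seq[String], inferred: Seq[String]): Seq[String] = {
       val byLowerCase = userFields.map(f => f.toLowerCase(Locale.ROOT) -> f).toMap
       inferred.map(name => byLowerCase.getOrElse(name.toLowerCase(Locale.ROOT), name))
     }
   
     def main(args: Array[String]): Unit = {
       // The directory `a=5` yields the inferred name "a"; the user schema says "A".
       println(resolve(Seq("A", "ID"), Seq("a")))
     }
   }
   ```
   
   With the user schema `A int, ID long` and the directory `a=5`, the sketch returns `A` for the partition column, which matches the `printSchema` output from the previous release shown above.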
   
   
   
   Closes #23894 from gengliangwang/fileIndexSchema.
   
   Authored-by: Gengliang Wang <ge...@databricks.com>
   Signed-off-by: Wenchen Fan <we...@databricks.com>
   
   ## How was this patch tested?
   
   Unit test
   
   
