Posted to reviews@spark.apache.org by GitBox <gi...@apache.org> on 2018/12/26 19:07:08 UTC

[GitHub] cloud-fan commented on a change in pull request #23387: [SPARK-26447][SQL]Allow OrcColumnarBatchReader to return less partition columns

URL: https://github.com/apache/spark/pull/23387#discussion_r244036202

##########
File path: sql/core/src/main/java/org/apache/spark/sql/execution/datasources/orc/OrcColumnarBatchReader.java
##########
@@ -58,10 +58,16 @@

/**
* The column IDs of the physical ORC file schema which are required by this reader.
- * -1 means this required column doesn't exist in the ORC file.
+ * -1 means this required column is partition column, or it doesn't exist in the ORC file.

Review comment:
I think we need more comments here.

Ideally, a partition column should never appear in the physical file; it should appear only in the directory name. However, Spark tolerates partition columns inside the physical file: it discards the values stored in the file and uses the partition value taken from the directory name.
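
To make the rule concrete, here is a minimal sketch of how a required column could be mapped to a physical ORC column ID. It is not the actual `OrcColumnarBatchReader` code: the class `PartitionColumnSketch`, the method `resolveColumnIds`, and its parameters are hypothetical names used only for illustration, and exact-match (case-sensitive) name lookup is an assumption.

```java
import java.util.Arrays;
import java.util.Collections;
import java.util.List;
import java.util.Set;

// Hypothetical helper for illustration; not part of Spark.
class PartitionColumnSketch {

  /**
   * For each required column, returns its index in the ORC file schema,
   * or -1 if the column is a partition column or is absent from the file.
   */
  static int[] resolveColumnIds(
      List<String> requiredColumns,
      List<String> orcFileColumns,
      Set<String> partitionColumns) {
    int[] ids = new int[requiredColumns.size()];
    for (int i = 0; i < ids.length; i++) {
      String name = requiredColumns.get(i);
      if (partitionColumns.contains(name)) {
        // Even if the file physically stores this column, ignore it:
        // the value comes from the directory name instead.
        ids[i] = -1;
      } else {
        // indexOf returns -1 when the column is missing from the file.
        ids[i] = orcFileColumns.indexOf(name);
      }
    }
    return ids;
  }

  public static void main(String[] args) {
    // "dt" is a partition column that also happens to be stored in the file.
    int[] ids = resolveColumnIds(
        Arrays.asList("id", "dt", "missing"),
        Arrays.asList("id", "name", "dt"),
        Collections.singleton("dt"));
    System.out.println(Arrays.toString(ids)); // prints [0, -1, -1]
  }
}
```

In this sketch, `dt` maps to -1 even though the file contains it at index 2, which is the case the comment in the diff should spell out.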
