Posted to reviews@spark.apache.org by GitBox <gi...@apache.org> on 2021/10/13 16:41:30 UTC

[GitHub] [spark] timarmstrong edited a comment on pull request #33639: [SPARK-36645][SQL] Aggregate (Min/Max/Count) push down for Parquet

timarmstrong edited a comment on pull request #33639:
URL: https://github.com/apache/spark/pull/33639#issuecomment-942489610


   > If the aggregate column is on partition column, only Count will be pushed, Min or Max will not be pushed down because Parquet doesn't return max/min for partition column.
   
   In the traditional Hive table layout, partition columns are not stored in the data files at all; the reader has to materialise the partition column values via a different mechanism (e.g. the partition value is recorded in the plan metadata). I don't know the Spark readers well, but the Spark Parquet reader must have access to the partition values.
   
   I.e. the min/max could be materialised from the partition column too; it just needs a different mechanism than the Parquet footer statistics.
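
   To illustrate the mechanism being suggested (a sketch only, not Spark's actual code path, and the helper name is hypothetical): in a Hive-style layout the partition value is encoded in the directory name, so Min/Max over a partition column could in principle be answered from the file paths alone, without reading any Parquet footers.

   ```python
   import re

   def partition_min_max(paths, column):
       """Extract `column=value` from each Hive-style path and return (min, max).

       Hypothetical helper: parses partition values out of directory names
       such as /warehouse/events/date=2021-10-01/part-00000.parquet.
       Returns None if the column appears in no path.
       """
       pattern = re.compile(re.escape(column) + r"=([^/]+)")
       values = [m.group(1) for p in paths for m in [pattern.search(p)] if m]
       if not values:
           return None
       # String comparison is enough for ISO dates; real code would cast
       # the raw string to the column's declared type first.
       return min(values), max(values)

   paths = [
       "/warehouse/events/date=2021-10-01/part-00000.parquet",
       "/warehouse/events/date=2021-10-03/part-00000.parquet",
       "/warehouse/events/date=2021-10-02/part-00001.parquet",
   ]
   print(partition_min_max(paths, "date"))  # ('2021-10-01', '2021-10-03')
   ```

   The point is just that the values are recoverable from metadata the reader already has, so Min/Max pushdown for partition columns doesn't actually depend on Parquet statistics.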


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



---------------------------------------------------------------------
For additional commands, e-mail: reviews-help@spark.apache.org