
Posted to github@arrow.apache.org by GitBox <gi...@apache.org> on 2021/07/15 15:13:29 UTC

[GitHub] [arrow-datafusion] b41sh commented on pull request #719: Optimize min/max queries with table statistics

b41sh commented on pull request #719:
URL: https://github.com/apache/arrow-datafusion/pull/719#issuecomment-880777702


   > This looks really cool @b41sh -- thank you very much for the contribution. It is not all that often one gets a 600x speedup :)
   > 
   > The one thing I worry about / wonder about is "how do we ensure no one breaks this by accident as we refactor or change the code in the future"
   > 
   > Perhaps we could follow the model of https://github.com/apache/arrow-datafusion/blob/master/datafusion/tests/parquet_pruning.rs#L44 (or maybe just extend that test) by:
   > 
   > 1. Adding some statistics to the parquet scan about total row groups read or rows read
   > 2. Run a query with min/max and validate that no actual row groups are read.
   > 
   > What do you think?
   
   Hi @alamb,
   Thanks for your review.
   I will add some tests for this case. I'm still working on them and will submit them later.
   


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
