Posted to github@arrow.apache.org by GitBox <gi...@apache.org> on 2022/10/28 19:37:22 UTC

[GitHub] [arrow-datafusion] alamb commented on issue #4003: Representing statistics in logical or physical plans (or in both of them)

alamb commented on issue #4003:
URL: https://github.com/apache/arrow-datafusion/issues/4003#issuecomment-1295381695

   Thank you for this discussion @isidentical - the summary is quite nice
   
   There are some cases where the physical plan will have better information available to it than the logical plan (e.g. it may have read the parquet metadata header and so have much more accurate statistics than can be derived from just the file names).
   
   Thus I think having statistics available for both logical and physical planning makes sense (aka option 3) -- that way DataFusion can make the best use of whatever information it has at each stage.
   
   My preferred solution is to keep statistics in both places and to keep the code that operates on them (the expression range analysis code, e.g. #3912) in the physical exprs, as it is very tightly tied to how each expression is evaluated (nulls, etc).
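
   To make the shape of that proposal concrete, here is a rough sketch. Every name in it (`Statistics`, `Interval`, `LogicalScan`, `PhysicalScan`, `PhysicalExpr::selectivity`, `GreaterThan`) is made up for illustration and is not DataFusion's actual API. It assumes a single integer column, a uniform value distribution, and ignores nulls entirely, which a real implementation could not do.

```rust
// Hypothetical sketch: these types are illustrative, not DataFusion's API.

/// Closed interval of possible values for a single integer column.
#[derive(Debug, Clone, Copy, PartialEq)]
struct Interval {
    min: i64,
    max: i64,
}

/// Statistics attached to a plan node; `None` means "unknown".
#[derive(Debug, Clone, PartialEq)]
struct Statistics {
    num_rows: Option<usize>,
    column_range: Option<Interval>,
}

/// Logical node: only knows what is available before any file is opened.
struct LogicalScan {
    file_names: Vec<String>,
    statistics: Statistics,
}

/// Physical node: may carry statistics refined from file metadata.
struct PhysicalScan {
    statistics: Statistics,
}

impl LogicalScan {
    /// With only file names to go on, everything starts out unknown.
    fn new(file_names: Vec<String>) -> Self {
        let statistics = Statistics { num_rows: None, column_range: None };
        LogicalScan { file_names, statistics }
    }

    /// Lower to a physical node. Statistics read from file metadata, when
    /// present, replace the logical estimate; otherwise it is carried over.
    fn to_physical(&self, from_metadata: Option<Statistics>) -> PhysicalScan {
        PhysicalScan {
            statistics: from_metadata.unwrap_or_else(|| self.statistics.clone()),
        }
    }
}

/// Range analysis lives next to expression evaluation, on the physical expr.
trait PhysicalExpr {
    /// Estimated fraction of input rows for which the expression is true,
    /// or `None` if the statistics are insufficient.
    fn selectivity(&self, input: &Statistics) -> Option<f64>;
}

/// The predicate `column > literal`.
struct GreaterThan {
    literal: i64,
}

impl PhysicalExpr for GreaterThan {
    fn selectivity(&self, input: &Statistics) -> Option<f64> {
        let range = input.column_range?;
        if self.literal >= range.max {
            return Some(0.0); // no value in the range can pass
        }
        if self.literal < range.min {
            return Some(1.0); // every value in the range passes
        }
        // Assumption: integer values uniformly distributed over [min, max].
        let total = range.max as f64 - range.min as f64 + 1.0;
        let passing = range.max as f64 - self.literal as f64;
        Some(passing / total)
    }
}
```

   The same `selectivity` call works against either node's statistics: with the logical statistics it can only answer "unknown", while the physical statistics (once filled in from metadata) let it produce an estimate.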

