Posted to github@arrow.apache.org by GitBox <gi...@apache.org> on 2021/07/02 07:20:41 UTC

[GitHub] [arrow-rs] crepererum commented on pull request #512: Pre-compute parquet stats in arrow writer

crepererum commented on pull request #512:
URL: https://github.com/apache/arrow-rs/pull/512#issuecomment-872779411


   For the distinct count, but also in general for the stats: what's kinda unfortunate is that in IOx, we have most of the information available for the record batches prior to writing them to parquet. For the min/max values and null counts I think it's OK to recompute them, but for the distinct count it seems a bit of a waste.
   
   So in a future PR (which I can contribute), I would like to add the ability to pass through pre-calculated stats.
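
A rough sketch of what such a pass-through could look like, to make the idea concrete. Everything here is hypothetical: `PrecomputedStats` and `resolve_stats` are names made up for illustration and are not part of the parquet or arrow crates, and the column is simplified to nullable `i64` values. The cheap stats (min/max, null count) are recomputed only when the caller did not supply them, while the distinct count is only ever passed through.

```rust
/// Hypothetical per-column statistics that a caller (e.g. IOx) may already hold.
/// Any field left as `None` means "not known up front".
#[derive(Debug, Clone, Default, PartialEq)]
pub struct PrecomputedStats {
    pub min: Option<i64>,
    pub max: Option<i64>,
    pub null_count: Option<u64>,
    pub distinct_count: Option<u64>,
}

/// Hypothetical helper: keep every stat the caller supplied and recompute
/// only the cheap ones that are missing.
pub fn resolve_stats(values: &[Option<i64>], provided: &PrecomputedStats) -> PrecomputedStats {
    // Iterator over the non-null values of the column.
    let non_null = || values.iter().flatten().copied();
    PrecomputedStats {
        min: provided.min.or_else(|| non_null().min()),
        max: provided.max.or_else(|| non_null().max()),
        null_count: provided
            .null_count
            .or_else(|| Some(values.iter().filter(|v| v.is_none()).count() as u64)),
        // The distinct count is expensive, so this sketch never computes it:
        // it is passed through if present and left unset otherwise.
        distinct_count: provided.distinct_count,
    }
}
```

With this shape, a caller that only knows the distinct count would set that one field and leave the rest at `Default::default()`.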
   
   Furthermore, the "pass through pre-computed stats" work might also be a good point to find some arrow-type-level representation of the stats, because currently, if you want to consume the stats from parquet, you have to do the scalar physical=>logical type conversion yourself.
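
To illustrate the kind of conversion meant here, a minimal sketch with made-up types. `PhysicalStat`, `LogicalType`, `LogicalStat` and `to_logical` are all hypothetical and do not exist in the parquet or arrow crates; they only mirror the fact that parquet stores, for example, a date as an INT32 number of days since the Unix epoch and a string as a byte array, so the consumer has to reinterpret the raw value using the column's logical type.

```rust
/// Hypothetical stand-in for a raw statistic value as stored in parquet.
#[derive(Debug, Clone, PartialEq)]
pub enum PhysicalStat {
    Int32(i32),
    Int64(i64),
    ByteArray(Vec<u8>),
}

/// Hypothetical, deliberately small subset of arrow-level types.
#[derive(Debug, Clone, Copy, PartialEq)]
pub enum LogicalType {
    Int32,
    Date32,
    TimestampMillis,
    Utf8,
}

/// Hypothetical arrow-type-level representation of a single statistic.
#[derive(Debug, Clone, PartialEq)]
pub enum LogicalStat {
    Int32(i32),
    Date32 { days_since_epoch: i32 },
    TimestampMillis(i64),
    Utf8(String),
}

/// Reinterpret a physical statistic using the column's logical type.
/// Returns `None` for combinations this sketch does not cover and for
/// byte arrays that are not valid UTF-8.
pub fn to_logical(stat: PhysicalStat, ty: LogicalType) -> Option<LogicalStat> {
    match (stat, ty) {
        (PhysicalStat::Int32(v), LogicalType::Int32) => Some(LogicalStat::Int32(v)),
        (PhysicalStat::Int32(v), LogicalType::Date32) => {
            Some(LogicalStat::Date32 { days_since_epoch: v })
        }
        (PhysicalStat::Int64(v), LogicalType::TimestampMillis) => {
            Some(LogicalStat::TimestampMillis(v))
        }
        (PhysicalStat::ByteArray(bytes), LogicalType::Utf8) => {
            String::from_utf8(bytes).ok().map(LogicalStat::Utf8)
        }
        _ => None,
    }
}
```

Today every consumer of the stats ends up writing some version of this match by hand, which is why a shared arrow-level representation seems worth having.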


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: github-unsubscribe@arrow.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org