Posted to github@arrow.apache.org by GitBox <gi...@apache.org> on 2021/06/30 20:59:23 UTC

[GitHub] [arrow-rs] Dandandan commented on pull request #512: Pre-compute parquet stats in arrow writer

Dandandan commented on pull request #512:
URL: https://github.com/apache/arrow-rs/pull/512#issuecomment-871723163


   AFAIK the distinct count is often omitted from Parquet stats because calculating it is expensive.
   
   The distinct count calculation in DataFusion is not really optimized yet (and uses quite a lot of memory), so I'm not sure it would be very useful for Arrow to reuse.
   
   Also, DataFusion would need it to work across multiple arrays, whereas in Arrow it could perhaps be computed for a single array? I think it would be great to have a kernel that DataFusion can use too.
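
To make the single-array versus multi-array distinction concrete, here is a minimal sketch using only the Rust standard library. It is not the arrow-rs or DataFusion implementation: `DistinctCounter` and `distinct_count` are hypothetical names, and a slice of `Option<T>` stands in for a nullable Arrow array. Nulls are skipped, which is an assumption about the desired semantics.

```rust
use std::collections::HashSet;
use std::hash::Hash;

/// Hypothetical accumulator: tracks distinct non-null values across
/// any number of input batches (the multi-array case).
pub struct DistinctCounter<T> {
    // Every distinct value is retained, so memory grows with cardinality.
    seen: HashSet<T>,
}

impl<T: Eq + Hash + Clone> DistinctCounter<T> {
    pub fn new() -> Self {
        Self { seen: HashSet::new() }
    }

    /// Feed one batch; `None` models a null slot and is ignored (assumption).
    pub fn update(&mut self, values: &[Option<T>]) {
        for v in values.iter().flatten() {
            self.seen.insert(v.clone());
        }
    }

    /// Number of distinct non-null values seen so far.
    pub fn count(&self) -> usize {
        self.seen.len()
    }
}

/// Hypothetical single-array helper (the "one array" case),
/// built on the same accumulator.
pub fn distinct_count<T: Eq + Hash + Clone>(values: &[Option<T>]) -> usize {
    let mut counter = DistinctCounter::new();
    counter.update(values);
    counter.count()
}
```

The single-array helper is just the accumulator applied once, so a shared kernel could in principle serve both callers. The set holds every distinct value it has seen, which illustrates why an exact count gets expensive in memory on high-cardinality columns.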

