Posted to github@arrow.apache.org by GitBox <gi...@apache.org> on 2022/06/06 10:42:23 UTC

[GitHub] [arrow-rs] tustvold commented on pull request #1774: Access metadata of flushed row groups on write (#1691)

tustvold commented on PR #1774:
URL: https://github.com/apache/arrow-rs/pull/1774#issuecomment-1147310698

   > Maybe this interface would help me simplify things
   
   In theory it would allow for more complex heuristics, but ultimately it is just a different way to access the data returned by row_group_writer.close().
   
   > Or is there a way to be more "precise" in the resulting file size
   
   Not easily; it is very hard to predict the encoded size accurately, especially with block compression. Internally, pages are handled by writing in small batches and tracking the size as they go, which is much the same as your approach for row groups.
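
   A minimal sketch of that "write small batches, track the size as you go" idea, in plain Rust with no parquet types. Everything here is hypothetical and for illustration only: `encode_batch` is a toy variable-length encoder standing in for the real encoding and compression, and `split_by_encoded_size`, `batch_rows` and `target_bytes` are made-up names, not part of the arrow-rs API.

```rust
/// Hypothetical stand-in for an encoder whose output size is only known
/// after encoding. Uses a simple LEB128-style varint so that the encoded
/// size depends on the values, not just on the row count.
fn encode_batch(rows: &[u64]) -> Vec<u8> {
    let mut out = Vec::new();
    for &value in rows {
        let mut v = value;
        loop {
            let byte = (v & 0x7f) as u8;
            v >>= 7;
            if v == 0 {
                out.push(byte);
                break;
            }
            // More bytes follow for this value: set the continuation bit.
            out.push(byte | 0x80);
        }
    }
    out
}

/// Encodes `rows` in batches of `batch_rows`, tracks the running encoded
/// size, and closes the current group once it reaches `target_bytes`.
/// Returns the encoded size of each group that was closed.
fn split_by_encoded_size(rows: &[u64], batch_rows: usize, target_bytes: usize) -> Vec<usize> {
    assert!(batch_rows > 0, "batch_rows must be non-zero");
    let mut group_sizes = Vec::new();
    let mut current = 0usize;
    for batch in rows.chunks(batch_rows) {
        // The size is measured after encoding rather than predicted up front.
        current += encode_batch(batch).len();
        if current >= target_bytes {
            group_sizes.push(current);
            current = 0;
        }
    }
    // Close the final, possibly undersized, group.
    if current > 0 {
        group_sizes.push(current);
    }
    group_sizes
}
```

   Because the size is only checked after each batch, a group can overshoot the target by up to one batch's worth of encoded bytes. A smaller `batch_rows` tightens that bound at the cost of more frequent checks.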

