Posted to github@arrow.apache.org by GitBox <gi...@apache.org> on 2022/03/23 16:54:28 UTC

[GitHub] [arrow-rs] nevi-me commented on issue #1474: Replace Custom Buffer Implementation with Bytes in Parquet

nevi-me commented on issue #1474:
URL: https://github.com/apache/arrow-rs/issues/1474#issuecomment-1076549007


   If possible, we could use Arrow's buffer when the `arrow` feature is enabled, and use some other abstraction (I'd be fine with `bytes`) for the other cases. The perf cliff is whenever we create multiple small `ByteBuffer` instances (e.g. representing `vec!["hello", "there"]` as 2 instances instead of a single `ByteBuffer` with offsets into the 2 values). I think having a single buffer per page/row group would be helpful.
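
   A rough sketch of the single-buffer-with-offsets layout, using only the standard library. `PackedValues` and its methods are hypothetical names for illustration, not the parquet crate's `ByteBuffer` API, and `Arc<[u8]>` is only a stand-in for whichever shared buffer type (Arrow's or `bytes`) ends up being used:

   ```rust
   use std::ops::Range;
   use std::sync::Arc;

   /// Hypothetical type: many byte-string values backed by one shared allocation.
   struct PackedValues {
       /// All value bytes, concatenated back to back.
       data: Arc<[u8]>,
       /// `offsets[i]..offsets[i + 1]` is the byte range of value `i`.
       offsets: Vec<usize>,
   }

   impl PackedValues {
       /// Copies every value once into a single allocation and records offsets.
       fn from_strs(values: &[&str]) -> Self {
           let mut data = Vec::with_capacity(values.iter().map(|v| v.len()).sum());
           let mut offsets = Vec::with_capacity(values.len() + 1);
           offsets.push(0);
           for v in values {
               data.extend_from_slice(v.as_bytes());
               offsets.push(data.len());
           }
           Self { data: data.into(), offsets }
       }

       /// Number of values stored.
       fn len(&self) -> usize {
           self.offsets.len() - 1
       }

       /// Byte range of value `i` within the shared buffer.
       fn range(&self, i: usize) -> Range<usize> {
           self.offsets[i]..self.offsets[i + 1]
       }

       /// Borrows value `i` without allocating or copying.
       fn value(&self, i: usize) -> &[u8] {
           &self.data[self.range(i)]
       }
   }
   ```

   With this layout, `PackedValues::from_strs(&["hello", "there"])` makes one data allocation for both values, and `value(1)` is just a slice into it, rather than each value owning its own small buffer.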
   
   The upside of using Arrow's buffer is minimising/eliminating data copies. I was able to improve the Arrow side in https://github.com/apache/arrow-rs/pull/820; see also @alamb's comment there (https://github.com/apache/arrow-rs/pull/820#discussion_r724365907):
   
   > I have a long-term hope to eventually phase out MutableBuffer and replace it with a typed construction that is easier to use without unsafe. Something with a similar interface to the ScalarBuffer I added to parquet might be a candidate
   
   This would be great, as it seems that a lot of the safety (and some perf) issues lie there.

