You are viewing a plain text version of this content. The canonical link for it is here.
Posted to github@arrow.apache.org by "ShiKaiWi (via GitHub)" <gi...@apache.org> on 2023/03/28 11:48:18 UTC

[GitHub] [arrow-rs] ShiKaiWi commented on a diff in pull request #3967: Async writer tweaks

ShiKaiWi commented on code in PR #3967:
URL: https://github.com/apache/arrow-rs/pull/3967#discussion_r1150469408


##########
parquet/src/arrow/async_writer/mod.rs:
##########
@@ -88,22 +85,25 @@ pub struct AsyncArrowWriter<W> {
 impl<W: AsyncWrite + Unpin + Send> AsyncArrowWriter<W> {
     /// Try to create a new Async Arrow Writer.
     ///
-    /// `buffer_flush_threshold` will be used to trigger flush of the inner buffer.
+    /// `buffer_size` determines the size of the intermediate buffer
+    ///
+    /// Flush will automatically be called by [`Self::write`] if
+    /// the buffer is at least half full
     pub fn try_new(
         writer: W,
         arrow_schema: SchemaRef,
-        buffer_flush_threshold: usize,
+        buffer_size: usize,
         props: Option<WriterProperties>,
     ) -> Result<Self> {
-        let shared_buffer = SharedBuffer::default();
+        let shared_buffer = SharedBuffer::new(buffer_size);

Review Comment:
   Actually, in the #3957, `buffer_flush_threshold` is designed to be able to be `usize::MAX` in order to let the async writer not flush until all the encoding work is done. And for this reason, the buffer can't be pre-allocated at initialization.
   
   And I think it is good here because of its efficiency, and it may be a fake feature to let the writer do flush only when all encoded bytes are ready. :satisfied:



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: github-unsubscribe@arrow.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org