You are viewing a plain text version of this content. The canonical link for it is here.
Posted to github@arrow.apache.org by GitBox <gi...@apache.org> on 2022/05/02 11:32:44 UTC

[GitHub] [arrow-rs] tustvold commented on a diff in pull request #1634: expose row-group flush in public api

tustvold commented on code in PR #1634:
URL: https://github.com/apache/arrow-rs/pull/1634#discussion_r862770120


##########
parquet/src/arrow/arrow_writer.rs:
##########
@@ -113,22 +113,26 @@ impl<W: 'static + ParquetWriter> ArrowWriter<W> {
         }
 
         self.buffered_rows += batch.num_rows();
-
-        self.flush_completed()?;
+        self.flush_excess()?;
 
         Ok(())
     }
 
     /// Flushes buffered data until there are less than `max_row_group_size` rows buffered
-    fn flush_completed(&mut self) -> Result<()> {
+    fn flush_excess(&mut self) -> Result<()> {
         while self.buffered_rows >= self.max_row_group_size {
-            self.flush_row_group(self.max_row_group_size)?;
+            self.flush_batch_into_new_row_group(self.max_row_group_size)?;
         }
         Ok(())
     }
 
+    /// Flushes `buffered_rows` from the buffer into a new row group

Review Comment:
   ```suggestion
       /// Flushes all buffered rows into a new row group
   ```



##########
parquet/src/arrow/arrow_writer.rs:
##########
@@ -113,22 +113,26 @@ impl<W: 'static + ParquetWriter> ArrowWriter<W> {
         }
 
         self.buffered_rows += batch.num_rows();
-
-        self.flush_completed()?;
+        self.flush_excess()?;
 
         Ok(())
     }
 
     /// Flushes buffered data until there are less than `max_row_group_size` rows buffered
-    fn flush_completed(&mut self) -> Result<()> {
+    fn flush_excess(&mut self) -> Result<()> {
         while self.buffered_rows >= self.max_row_group_size {
-            self.flush_row_group(self.max_row_group_size)?;
+            self.flush_batch_into_new_row_group(self.max_row_group_size)?;
         }
         Ok(())
     }
 
+    /// Flushes `buffered_rows` from the buffer into a new row group
+    pub fn flush_row_group(&mut self) -> Result<()> {

Review Comment:
   ```suggestion
       pub fn flush(&mut self) -> Result<()> {
   ```
   



##########
parquet/src/arrow/arrow_writer.rs:
##########
@@ -113,22 +113,26 @@ impl<W: 'static + ParquetWriter> ArrowWriter<W> {
         }
 
         self.buffered_rows += batch.num_rows();
-
-        self.flush_completed()?;
+        self.flush_excess()?;
 
         Ok(())
     }
 
     /// Flushes buffered data until there are less than `max_row_group_size` rows buffered
-    fn flush_completed(&mut self) -> Result<()> {
+    fn flush_excess(&mut self) -> Result<()> {

Review Comment:
   I personally preferred the previous name



##########
parquet/src/arrow/arrow_writer.rs:
##########
@@ -113,22 +113,26 @@ impl<W: 'static + ParquetWriter> ArrowWriter<W> {
         }
 
         self.buffered_rows += batch.num_rows();
-
-        self.flush_completed()?;
+        self.flush_excess()?;
 
         Ok(())
     }
 
     /// Flushes buffered data until there are less than `max_row_group_size` rows buffered
-    fn flush_completed(&mut self) -> Result<()> {
+    fn flush_excess(&mut self) -> Result<()> {
         while self.buffered_rows >= self.max_row_group_size {
-            self.flush_row_group(self.max_row_group_size)?;
+            self.flush_batch_into_new_row_group(self.max_row_group_size)?;
         }
         Ok(())
     }
 
+    /// Flushes `buffered_rows` from the buffer into a new row group
+    pub fn flush_row_group(&mut self) -> Result<()> {
+        self.flush_batch_into_new_row_group(self.buffered_rows)
+    }
+
     /// Flushes `num_rows` from the buffer into a new row group
-    fn flush_row_group(&mut self, num_rows: usize) -> Result<()> {
+    fn flush_batch_into_new_row_group(&mut self, num_rows: usize) -> Result<()> {

Review Comment:
   ```suggestion
       fn flush_rows(&mut self, num_rows: usize) -> Result<()> {
   ```



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: github-unsubscribe@arrow.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org