You are viewing a plain text version of this content. The canonical link for it is here.
Posted to github@arrow.apache.org by GitBox <gi...@apache.org> on 2022/05/02 11:32:44 UTC
[GitHub] [arrow-rs] tustvold commented on a diff in pull request #1634: expose row-group flush in public api
tustvold commented on code in PR #1634:
URL: https://github.com/apache/arrow-rs/pull/1634#discussion_r862770120
##########
parquet/src/arrow/arrow_writer.rs:
##########
@@ -113,22 +113,26 @@ impl<W: 'static + ParquetWriter> ArrowWriter<W> {
}
self.buffered_rows += batch.num_rows();
-
- self.flush_completed()?;
+ self.flush_excess()?;
Ok(())
}
/// Flushes buffered data until there are less than `max_row_group_size` rows buffered
- fn flush_completed(&mut self) -> Result<()> {
+ fn flush_excess(&mut self) -> Result<()> {
while self.buffered_rows >= self.max_row_group_size {
- self.flush_row_group(self.max_row_group_size)?;
+ self.flush_batch_into_new_row_group(self.max_row_group_size)?;
}
Ok(())
}
+ /// Flushes `buffered_rows` from the buffer into a new row group
Review Comment:
```suggestion
/// Flushes all buffered rows into a new row group
```
##########
parquet/src/arrow/arrow_writer.rs:
##########
@@ -113,22 +113,26 @@ impl<W: 'static + ParquetWriter> ArrowWriter<W> {
}
self.buffered_rows += batch.num_rows();
-
- self.flush_completed()?;
+ self.flush_excess()?;
Ok(())
}
/// Flushes buffered data until there are less than `max_row_group_size` rows buffered
- fn flush_completed(&mut self) -> Result<()> {
+ fn flush_excess(&mut self) -> Result<()> {
while self.buffered_rows >= self.max_row_group_size {
- self.flush_row_group(self.max_row_group_size)?;
+ self.flush_batch_into_new_row_group(self.max_row_group_size)?;
}
Ok(())
}
+ /// Flushes `buffered_rows` from the buffer into a new row group
+ pub fn flush_row_group(&mut self) -> Result<()> {
Review Comment:
```suggestion
pub fn flush(&mut self) -> Result<()> {
```
##########
parquet/src/arrow/arrow_writer.rs:
##########
@@ -113,22 +113,26 @@ impl<W: 'static + ParquetWriter> ArrowWriter<W> {
}
self.buffered_rows += batch.num_rows();
-
- self.flush_completed()?;
+ self.flush_excess()?;
Ok(())
}
/// Flushes buffered data until there are less than `max_row_group_size` rows buffered
- fn flush_completed(&mut self) -> Result<()> {
+ fn flush_excess(&mut self) -> Result<()> {
Review Comment:
I personally preferred the previous name
##########
parquet/src/arrow/arrow_writer.rs:
##########
@@ -113,22 +113,26 @@ impl<W: 'static + ParquetWriter> ArrowWriter<W> {
}
self.buffered_rows += batch.num_rows();
-
- self.flush_completed()?;
+ self.flush_excess()?;
Ok(())
}
/// Flushes buffered data until there are less than `max_row_group_size` rows buffered
- fn flush_completed(&mut self) -> Result<()> {
+ fn flush_excess(&mut self) -> Result<()> {
while self.buffered_rows >= self.max_row_group_size {
- self.flush_row_group(self.max_row_group_size)?;
+ self.flush_batch_into_new_row_group(self.max_row_group_size)?;
}
Ok(())
}
+ /// Flushes `buffered_rows` from the buffer into a new row group
+ pub fn flush_row_group(&mut self) -> Result<()> {
+ self.flush_batch_into_new_row_group(self.buffered_rows)
+ }
+
/// Flushes `num_rows` from the buffer into a new row group
- fn flush_row_group(&mut self, num_rows: usize) -> Result<()> {
+ fn flush_batch_into_new_row_group(&mut self, num_rows: usize) -> Result<()> {
Review Comment:
```suggestion
fn flush_rows(&mut self, num_rows: usize) -> Result<()> {
```
--
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
To unsubscribe, e-mail: github-unsubscribe@arrow.apache.org
For queries about this service, please contact Infrastructure at:
users@infra.apache.org