You are viewing a plain text version of this content. The canonical link for it is here.
Posted to github@arrow.apache.org by GitBox <gi...@apache.org> on 2022/05/01 06:22:57 UTC

[GitHub] [arrow-rs] Cheappie opened a new pull request, #1634: expose row-group flush in public api

Cheappie opened a new pull request, #1634:
URL: https://github.com/apache/arrow-rs/pull/1634

   # Which issue does this PR close?
   
   Closes #1626 
   
   # Rationale for this change
    
   more power for the users to let them decide when to flush row group
   
   # Are there any user-facing changes?
   
   not much, just one more method with single line body


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: github-unsubscribe@arrow.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


[GitHub] [arrow-rs] tustvold commented on pull request #1634: expose row-group flush in public api

Posted by GitBox <gi...@apache.org>.
tustvold commented on PR #1634:
URL: https://github.com/apache/arrow-rs/pull/1634#issuecomment-1115131390

   Thanks again :+1: 


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: github-unsubscribe@arrow.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


[GitHub] [arrow-rs] Cheappie commented on a diff in pull request #1634: expose row-group flush in public api

Posted by GitBox <gi...@apache.org>.
Cheappie commented on code in PR #1634:
URL: https://github.com/apache/arrow-rs/pull/1634#discussion_r862821744


##########
parquet/src/arrow/arrow_writer.rs:
##########
@@ -192,8 +196,8 @@ impl<W: 'static + ParquetWriter> ArrowWriter<W> {
 
     /// Close and finalize the underlying Parquet writer
     pub fn close(&mut self) -> Result<parquet_format::FileMetaData> {
-        self.flush_completed()?;
-        self.flush_row_group(self.buffered_rows)?;
+        self.flush_excess()?;

Review Comment:
   yep, It is redundant, I didn't want to remove that on my own because it could be intentional to have defensive approach when closing writer, but right now that seems unnecessary unless more options for write popup in ArrowWriter



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: github-unsubscribe@arrow.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


[GitHub] [arrow-rs] Cheappie commented on a diff in pull request #1634: expose row-group flush in public api

Posted by GitBox <gi...@apache.org>.
Cheappie commented on code in PR #1634:
URL: https://github.com/apache/arrow-rs/pull/1634#discussion_r862818489


##########
parquet/src/arrow/arrow_writer.rs:
##########
@@ -113,22 +113,26 @@ impl<W: 'static + ParquetWriter> ArrowWriter<W> {
         }
 
         self.buffered_rows += batch.num_rows();
-
-        self.flush_completed()?;
+        self.flush_excess()?;
 
         Ok(())
     }
 
     /// Flushes buffered data until there are less than `max_row_group_size` rows buffered
-    fn flush_completed(&mut self) -> Result<()> {
+    fn flush_excess(&mut self) -> Result<()> {

Review Comment:
   sure thing



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: github-unsubscribe@arrow.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


[GitHub] [arrow-rs] tustvold commented on a diff in pull request #1634: expose row-group flush in public api

Posted by GitBox <gi...@apache.org>.
tustvold commented on code in PR #1634:
URL: https://github.com/apache/arrow-rs/pull/1634#discussion_r862770120


##########
parquet/src/arrow/arrow_writer.rs:
##########
@@ -113,22 +113,26 @@ impl<W: 'static + ParquetWriter> ArrowWriter<W> {
         }
 
         self.buffered_rows += batch.num_rows();
-
-        self.flush_completed()?;
+        self.flush_excess()?;
 
         Ok(())
     }
 
     /// Flushes buffered data until there are less than `max_row_group_size` rows buffered
-    fn flush_completed(&mut self) -> Result<()> {
+    fn flush_excess(&mut self) -> Result<()> {
         while self.buffered_rows >= self.max_row_group_size {
-            self.flush_row_group(self.max_row_group_size)?;
+            self.flush_batch_into_new_row_group(self.max_row_group_size)?;
         }
         Ok(())
     }
 
+    /// Flushes `buffered_rows` from the buffer into a new row group

Review Comment:
   ```suggestion
       /// Flushes all buffered rows into a new row group
   ```



##########
parquet/src/arrow/arrow_writer.rs:
##########
@@ -113,22 +113,26 @@ impl<W: 'static + ParquetWriter> ArrowWriter<W> {
         }
 
         self.buffered_rows += batch.num_rows();
-
-        self.flush_completed()?;
+        self.flush_excess()?;
 
         Ok(())
     }
 
     /// Flushes buffered data until there are less than `max_row_group_size` rows buffered
-    fn flush_completed(&mut self) -> Result<()> {
+    fn flush_excess(&mut self) -> Result<()> {
         while self.buffered_rows >= self.max_row_group_size {
-            self.flush_row_group(self.max_row_group_size)?;
+            self.flush_batch_into_new_row_group(self.max_row_group_size)?;
         }
         Ok(())
     }
 
+    /// Flushes `buffered_rows` from the buffer into a new row group
+    pub fn flush_row_group(&mut self) -> Result<()> {

Review Comment:
   ```suggestion
       pub fn flush(&mut self) -> Result<()> {
   ```
   



##########
parquet/src/arrow/arrow_writer.rs:
##########
@@ -113,22 +113,26 @@ impl<W: 'static + ParquetWriter> ArrowWriter<W> {
         }
 
         self.buffered_rows += batch.num_rows();
-
-        self.flush_completed()?;
+        self.flush_excess()?;
 
         Ok(())
     }
 
     /// Flushes buffered data until there are less than `max_row_group_size` rows buffered
-    fn flush_completed(&mut self) -> Result<()> {
+    fn flush_excess(&mut self) -> Result<()> {

Review Comment:
   I personally preferred the previous name



##########
parquet/src/arrow/arrow_writer.rs:
##########
@@ -113,22 +113,26 @@ impl<W: 'static + ParquetWriter> ArrowWriter<W> {
         }
 
         self.buffered_rows += batch.num_rows();
-
-        self.flush_completed()?;
+        self.flush_excess()?;
 
         Ok(())
     }
 
     /// Flushes buffered data until there are less than `max_row_group_size` rows buffered
-    fn flush_completed(&mut self) -> Result<()> {
+    fn flush_excess(&mut self) -> Result<()> {
         while self.buffered_rows >= self.max_row_group_size {
-            self.flush_row_group(self.max_row_group_size)?;
+            self.flush_batch_into_new_row_group(self.max_row_group_size)?;
         }
         Ok(())
     }
 
+    /// Flushes `buffered_rows` from the buffer into a new row group
+    pub fn flush_row_group(&mut self) -> Result<()> {
+        self.flush_batch_into_new_row_group(self.buffered_rows)
+    }
+
     /// Flushes `num_rows` from the buffer into a new row group
-    fn flush_row_group(&mut self, num_rows: usize) -> Result<()> {
+    fn flush_batch_into_new_row_group(&mut self, num_rows: usize) -> Result<()> {

Review Comment:
   ```suggestion
       fn flush_rows(&mut self, num_rows: usize) -> Result<()> {
   ```



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: github-unsubscribe@arrow.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


[GitHub] [arrow-rs] tustvold commented on a diff in pull request #1634: expose row-group flush in public api

Posted by GitBox <gi...@apache.org>.
tustvold commented on code in PR #1634:
URL: https://github.com/apache/arrow-rs/pull/1634#discussion_r862772438


##########
parquet/src/arrow/arrow_writer.rs:
##########
@@ -192,8 +196,8 @@ impl<W: 'static + ParquetWriter> ArrowWriter<W> {
 
     /// Close and finalize the underlying Parquet writer
     pub fn close(&mut self) -> Result<parquet_format::FileMetaData> {
-        self.flush_completed()?;
-        self.flush_row_group(self.buffered_rows)?;
+        self.flush_excess()?;

Review Comment:
   I know this existed before, but I think it is actually redundant as `write` calls `flush_excess`



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: github-unsubscribe@arrow.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


[GitHub] [arrow-rs] tustvold merged pull request #1634: expose row-group flush in public api

Posted by GitBox <gi...@apache.org>.
tustvold merged PR #1634:
URL: https://github.com/apache/arrow-rs/pull/1634


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: github-unsubscribe@arrow.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


[GitHub] [arrow-rs] codecov-commenter commented on pull request #1634: expose row-group flush in public api

Posted by GitBox <gi...@apache.org>.
codecov-commenter commented on PR #1634:
URL: https://github.com/apache/arrow-rs/pull/1634#issuecomment-1115092371

   # [Codecov](https://codecov.io/gh/apache/arrow-rs/pull/1634?src=pr&el=h1&utm_medium=referral&utm_source=github&utm_content=comment&utm_campaign=pr+comments&utm_term=The+Apache+Software+Foundation) Report
   > Merging [#1634](https://codecov.io/gh/apache/arrow-rs/pull/1634?src=pr&el=desc&utm_medium=referral&utm_source=github&utm_content=comment&utm_campaign=pr+comments&utm_term=The+Apache+Software+Foundation) (3e0dc99) into [master](https://codecov.io/gh/apache/arrow-rs/commit/7d00e3c3b124bf39ad214ad08d1a1f49a4465dab?el=desc&utm_medium=referral&utm_source=github&utm_content=comment&utm_campaign=pr+comments&utm_term=The+Apache+Software+Foundation) (7d00e3c) will **increase** coverage by `0.00%`.
   > The diff coverage is `100.00%`.
   
   ```diff
   @@           Coverage Diff           @@
   ##           master    #1634   +/-   ##
   =======================================
     Coverage   83.02%   83.03%           
   =======================================
     Files         193      193           
     Lines       55577    55578    +1     
   =======================================
   + Hits        46145    46147    +2     
   + Misses       9432     9431    -1     
   ```
   
   
   | [Impacted Files](https://codecov.io/gh/apache/arrow-rs/pull/1634?src=pr&el=tree&utm_medium=referral&utm_source=github&utm_content=comment&utm_campaign=pr+comments&utm_term=The+Apache+Software+Foundation) | Coverage Δ | |
   |---|---|---|
   | [parquet/src/arrow/arrow\_writer.rs](https://codecov.io/gh/apache/arrow-rs/pull/1634/diff?src=pr&el=tree&utm_medium=referral&utm_source=github&utm_content=comment&utm_campaign=pr+comments&utm_term=The+Apache+Software+Foundation#diff-cGFycXVldC9zcmMvYXJyb3cvYXJyb3dfd3JpdGVyLnJz) | `97.66% <100.00%> (+<0.01%)` | :arrow_up: |
   | [arrow/src/array/transform/mod.rs](https://codecov.io/gh/apache/arrow-rs/pull/1634/diff?src=pr&el=tree&utm_medium=referral&utm_source=github&utm_content=comment&utm_campaign=pr+comments&utm_term=The+Apache+Software+Foundation#diff-YXJyb3cvc3JjL2FycmF5L3RyYW5zZm9ybS9tb2QucnM=) | `86.68% <0.00%> (-0.12%)` | :arrow_down: |
   | [parquet\_derive/src/parquet\_field.rs](https://codecov.io/gh/apache/arrow-rs/pull/1634/diff?src=pr&el=tree&utm_medium=referral&utm_source=github&utm_content=comment&utm_campaign=pr+comments&utm_term=The+Apache+Software+Foundation#diff-cGFycXVldF9kZXJpdmUvc3JjL3BhcnF1ZXRfZmllbGQucnM=) | `66.21% <0.00%> (ø)` | |
   | [parquet/src/encodings/encoding.rs](https://codecov.io/gh/apache/arrow-rs/pull/1634/diff?src=pr&el=tree&utm_medium=referral&utm_source=github&utm_content=comment&utm_campaign=pr+comments&utm_term=The+Apache+Software+Foundation#diff-cGFycXVldC9zcmMvZW5jb2RpbmdzL2VuY29kaW5nLnJz) | `93.56% <0.00%> (+0.18%)` | :arrow_up: |
   | [arrow/src/datatypes/datatype.rs](https://codecov.io/gh/apache/arrow-rs/pull/1634/diff?src=pr&el=tree&utm_medium=referral&utm_source=github&utm_content=comment&utm_campaign=pr+comments&utm_term=The+Apache+Software+Foundation#diff-YXJyb3cvc3JjL2RhdGF0eXBlcy9kYXRhdHlwZS5ycw==) | `66.80% <0.00%> (+0.39%)` | :arrow_up: |
   
   ------
   
   [Continue to review full report at Codecov](https://codecov.io/gh/apache/arrow-rs/pull/1634?src=pr&el=continue&utm_medium=referral&utm_source=github&utm_content=comment&utm_campaign=pr+comments&utm_term=The+Apache+Software+Foundation).
   > **Legend** - [Click here to learn more](https://docs.codecov.io/docs/codecov-delta?utm_medium=referral&utm_source=github&utm_content=comment&utm_campaign=pr+comments&utm_term=The+Apache+Software+Foundation)
   > `Δ = absolute <relative> (impact)`, `ø = not affected`, `? = missing data`
   > Powered by [Codecov](https://codecov.io/gh/apache/arrow-rs/pull/1634?src=pr&el=footer&utm_medium=referral&utm_source=github&utm_content=comment&utm_campaign=pr+comments&utm_term=The+Apache+Software+Foundation). Last update [7d00e3c...3e0dc99](https://codecov.io/gh/apache/arrow-rs/pull/1634?src=pr&el=lastupdated&utm_medium=referral&utm_source=github&utm_content=comment&utm_campaign=pr+comments&utm_term=The+Apache+Software+Foundation). Read the [comment docs](https://docs.codecov.io/docs/pull-request-comments?utm_medium=referral&utm_source=github&utm_content=comment&utm_campaign=pr+comments&utm_term=The+Apache+Software+Foundation).
   


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: github-unsubscribe@arrow.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org