Posted to github@arrow.apache.org by GitBox <gi...@apache.org> on 2021/05/31 07:29:53 UTC

[GitHub] [arrow-rs] crepererum commented on a change in pull request #381: Respect max rowgroup size in Arrow writer

crepererum commented on a change in pull request #381:
URL: https://github.com/apache/arrow-rs/pull/381#discussion_r642270850



##########
File path: parquet/src/arrow/arrow_writer.rs
##########
@@ -87,17 +92,31 @@ impl<W: 'static + ParquetWriter> ArrowWriter<W> {
                 "Record batch schema does not match writer schema".to_string(),
             ));
         }
-        // compute the definition and repetition levels of the batch
-        let batch_level = LevelInfo::new_from_batch(batch);
-        let mut row_group_writer = self.writer.next_row_group()?;
-        for (array, field) in batch.columns().iter().zip(batch.schema().fields()) {
-            let mut levels = batch_level.calculate_array_levels(array, field);
-            // Reverse levels as we pop() them when writing arrays
-            levels.reverse();
-            write_leaves(&mut row_group_writer, array, &mut levels)?;
+        // Track the number of rows being written in the batch.
+        // We currently do not have a way of slicing nested arrays, thus we
+        // track this manually.
+        let num_rows = batch.num_rows();
+        let batches = (num_rows + self.max_row_group_size - 1) / self.max_row_group_size;

Review comment:
       Do we ensure that `max_row_group_size > 0`? If it is 0, the division on this line panics.
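
       To illustrate the concern, here is a minimal standalone sketch of the ceiling division from the diff with a guard for a zero group size. The function name `row_group_count` and the choice to return `Option` are assumptions made for this example, not code from the pull request; the writer could equally reject a zero value when its properties are built.

```rust
/// Hypothetical helper (not part of the PR): number of row groups needed
/// to hold `num_rows` rows when each group holds at most
/// `max_row_group_size` rows.
///
/// Returns `None` when `max_row_group_size` is zero, because the ceiling
/// division below would otherwise divide by zero and panic.
fn row_group_count(num_rows: usize, max_row_group_size: usize) -> Option<usize> {
    if max_row_group_size == 0 {
        return None;
    }
    // Same ceiling-division formula as in the diff.
    Some((num_rows + max_row_group_size - 1) / max_row_group_size)
}
```

       With this guard, `row_group_count(10, 4)` yields `Some(3)`, while `row_group_count(10, 0)` yields `None` instead of panicking.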



