You are viewing a plain text version of this content. The canonical link for it is here.
Posted to github@arrow.apache.org by GitBox <gi...@apache.org> on 2022/04/12 16:27:26 UTC

[GitHub] [arrow-rs] alamb commented on a diff in pull request #1552: Create RecordBatch With Non-Zero Row Count But No Columns (#1536)

alamb commented on code in PR #1552:
URL: https://github.com/apache/arrow-rs/pull/1552#discussion_r848632777


##########
arrow/src/record_batch.rs:
##########
@@ -41,6 +41,7 @@ use crate::error::{ArrowError, Result};
 pub struct RecordBatch {
     schema: SchemaRef,
     columns: Vec<Arc<dyn Array>>,
+    row_count: usize,

Review Comment:
   maybe leave a comment here about why `row_count` is added?



##########
arrow/src/record_batch.rs:
##########
@@ -128,7 +125,15 @@ impl RecordBatch {
         }
 
         // check that all columns have the same row count
-        let row_count = columns[0].data().len();
+        let row_count = options
+            .row_count
+            .or_else(|| columns.first().map(|col| col.len()))
+            .ok_or_else(|| {
+                ArrowError::InvalidArgumentError(
+                    "must either specify a row count or at least one column".to_string(),
+                )
+            })?;
+
         if columns.iter().any(|c| c.len() != row_count) {
             return Err(ArrowError::InvalidArgumentError(
                 "all columns in a record batch must have the same length".to_string(),

Review Comment:
   This error will now also happen when the specified row count doesn't match the array counts (so the wording might be good to update?)



##########
arrow/src/record_batch.rs:
##########
@@ -243,7 +248,7 @@ impl RecordBatch {
     /// # }
     /// ```
     pub fn num_rows(&self) -> usize {
-        self.columns[0].data().len()
+        self.row_count

Review Comment:
   Maybe we could avoid adding `row_count` with something like
   
   ```rust
   self
     .columns
     .get(0)
     .map(|col| col.data.len())
     .unwrap_or(0)
   ```
   ?



##########
arrow/src/record_batch.rs:
##########
@@ -402,15 +402,20 @@ impl RecordBatch {
 
 /// Options that control the behaviour used when creating a [`RecordBatch`].
 #[derive(Debug)]
+#[non_exhaustive]

Review Comment:
   Or maybe it is time to make these fields non `pub` and add `with_match_field_name(&mut self, val: bool)` type accessors?



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: github-unsubscribe@arrow.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org