Posted to github@arrow.apache.org by GitBox <gi...@apache.org> on 2021/04/15 21:57:23 UTC

[GitHub] [arrow] nevi-me commented on a change in pull request #10063: ARROW-12411: [Rust] Add Builder interface for adding Arrays to RecordBatches

nevi-me commented on a change in pull request #10063:
URL: https://github.com/apache/arrow/pull/10063#discussion_r614416779



##########
File path: rust/arrow/src/record_batch.rs
##########
@@ -103,6 +103,56 @@ impl RecordBatch {
         RecordBatch { schema, columns }
     }
 
+    /// Creates a new [`RecordBatch`] with no columns
+    ///
+    /// TODO add a code example using `append`
+    pub fn new() -> Self {
+        Self {
+            schema: Arc::new(Schema::empty()),
+            columns: Vec::new(),
+        }
+    }
+
+    /// Appends the `field_values` array to this `RecordBatch` as a
+    /// field named `field_name`.
+    ///
+    /// TODO: code example
+    ///
+    /// TODO: on error, can we return `Self` in some meaningful way?
+    pub fn append(self, field_name: &str, field_values: ArrayRef) -> Result<Self> {
+        if let Some(col) = self.columns.get(0) {
+            if col.len() != field_values.len() {
+                return Err(ArrowError::InvalidArgumentError(
+                    format!("all columns in a record batch must have the same length. expected {}, field {} had {} ",
+                            col.len(), field_name, field_values.len())
+                ));
+            }
+        }
+
+        let Self {
+            schema,
+            mut columns,
+        } = self;
+
+        // modify the schema we have if possible, otherwise copy
+        let mut schema = match Arc::try_unwrap(schema) {
+            Ok(schema) => schema,
+            Err(shared_schema) => shared_schema.as_ref().clone(),
+        };
+
+        let nullable = field_values.null_count() > 0;

Review comment:
   There's a limitation here: `nullable` is inferred from the null count of this one array. If the purpose is to create a single record batch that is used on its own throughout its lifetime, this is fine. Otherwise, for example when several batches are expected to share a schema, we might need to take a `nullable` parameter.

   In any case, I think that using this to create many individual record batches would be inefficient.
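
To illustrate the concern, here is a minimal sketch using simplified stand-in types. `Field`, `Batch`, `append_inferred` and `append_with_nullable` are hypothetical names invented for this example, not the arrow crate's API. The null check in `append_with_nullable` is also my assumption about how an explicit flag might be validated. The sketch shows how inferring `nullable` from the data can give two batches with the same column name different schemas, while an explicit flag keeps them equal.

```rust
// Simplified stand-ins for illustration only; these are NOT the arrow crate's types.
#[derive(Debug, Clone, PartialEq)]
struct Field {
    name: String,
    nullable: bool,
}

#[derive(Debug, Clone, PartialEq, Default)]
struct Batch {
    schema: Vec<Field>,
    columns: Vec<Vec<Option<i32>>>,
}

impl Batch {
    /// Mirrors the approach in the diff: nullability is inferred from the data,
    /// so the resulting schema depends on the values that happen to be present.
    fn append_inferred(self, name: &str, values: Vec<Option<i32>>) -> Result<Self, String> {
        let nullable = values.iter().any(|v| v.is_none());
        self.append_with_nullable(name, values, nullable)
    }

    /// Hypothetical alternative: the caller states nullability explicitly.
    fn append_with_nullable(
        mut self,
        name: &str,
        values: Vec<Option<i32>>,
        nullable: bool,
    ) -> Result<Self, String> {
        // Same length check as in the diff: compare against the first column.
        if let Some(col) = self.columns.first() {
            if col.len() != values.len() {
                return Err(format!(
                    "all columns must have the same length. expected {}, field {} had {}",
                    col.len(),
                    name,
                    values.len()
                ));
            }
        }
        // Assumption for this sketch: nulls in a non-nullable field are rejected.
        if !nullable && values.iter().any(|v| v.is_none()) {
            return Err(format!("field {} is not nullable but contains nulls", name));
        }
        self.schema.push(Field {
            name: name.to_string(),
            nullable,
        });
        self.columns.push(values);
        Ok(self)
    }
}
```

With `append_inferred`, a batch built from `[Some(1), Some(2)]` and one built from `[Some(3), None]` get different schemas for the same field name. With `append_with_nullable(..., true)` both get the same schema.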



