Posted to github@arrow.apache.org by "alamb (via GitHub)" <gi...@apache.org> on 2023/04/01 10:27:36 UTC

[GitHub] [arrow-rs] alamb commented on a diff in pull request #3953: Don't discard nulls converting StructArray to RecordBatch (#3951)

alamb commented on code in PR #3953:
URL: https://github.com/apache/arrow-rs/pull/3953#discussion_r1155090672


##########
arrow-array/src/array/struct_array.rs:
##########
@@ -24,6 +24,28 @@ use std::{any::Any, ops::Index};
 
 /// A nested array type where each child (called *field*) is represented by a separate
 /// array.
+///
+///
+/// # Comparison with [RecordBatch]
+///
+/// Both [`RecordBatch`] and [`StructArray`] represent a collection of columns / arrays with the
+/// same length.
+///
+/// However, there are a couple of key differences:
+///
+/// * [`StructArray`] can be nested within other [`Array`], including itself
+/// * [`RecordBatch`] can contain top-level metadata on its associated [`Schema`][arrow_schema::Schema]
+/// * [`StructArray`] can contain top-level nulls, i.e. `null`
+/// * [`RecordBatch`] can only represent nulls in its child columns, i.e. `{"field": null}`
+///
+/// [`StructArray`] is therefore a more general data container than [`RecordBatch`], and as such
+/// code that needs to handle both will typically share an implementation in terms of
+/// [`StructArray`] and convert to/from [`RecordBatch`] as necessary.
+///
+/// [`From`] implementations are provided to facilitate this conversion, however, converting

Review Comment:
   👍 thank you -- this is much clearer now
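
The distinction drawn in the doc comment above (top-level `null` versus `{"field": null}`) can be sketched with a std-only toy model. Everything here is hypothetical: `ToyBatch`, `ToyStruct`, and `Column` are illustrative stand-ins, not the arrow-rs types, and returning an error when whole-row nulls are present is only one possible policy, not a statement of what the arrow-rs `From` implementations do.

```rust
use std::convert::TryFrom;

// Toy model only: these types are hypothetical stand-ins, not the arrow-rs API.

/// A column of optional integers; `None` is a child-level null.
type Column = Vec<Option<i64>>;

/// Stand-in for a record batch: named columns, no place for top-level nulls.
#[derive(Debug, Clone, PartialEq)]
struct ToyBatch {
    columns: Vec<(String, Column)>,
}

/// Stand-in for a struct array: the same columns plus an optional validity mask.
/// `validity[i] == false` means row `i` is null as a whole.
#[derive(Debug, Clone, PartialEq)]
struct ToyStruct {
    columns: Vec<(String, Column)>,
    validity: Option<Vec<bool>>,
}

/// Batch to struct never loses information: there is simply no mask.
impl From<ToyBatch> for ToyStruct {
    fn from(batch: ToyBatch) -> Self {
        ToyStruct { columns: batch.columns, validity: None }
    }
}

/// Struct to batch is only lossless when no row is null as a whole.
/// (Assumed policy for this sketch: refuse rather than silently drop the mask.)
impl TryFrom<ToyStruct> for ToyBatch {
    type Error = String;

    fn try_from(s: ToyStruct) -> Result<Self, Self::Error> {
        let null_rows = s
            .validity
            .as_ref()
            .map_or(0, |mask| mask.iter().filter(|valid| !**valid).count());
        if null_rows > 0 {
            return Err(format!("{} top-level null row(s) cannot be represented", null_rows));
        }
        Ok(ToyBatch { columns: s.columns })
    }
}

fn main() {
    let columns = vec![("field".to_string(), vec![Some(1), None, Some(3)])];

    // Child-level null only (row 1 is `{"field": null}`): round-trips cleanly.
    let batch = ToyBatch { columns: columns.clone() };
    let as_struct = ToyStruct::from(batch.clone());
    assert_eq!(ToyBatch::try_from(as_struct), Ok(batch));

    // Row 2 is null as a whole (`null`): the batch model has nowhere to store it.
    let with_null_row = ToyStruct { columns, validity: Some(vec![true, true, false]) };
    assert!(ToyBatch::try_from(with_null_row).is_err());
}
```

In this model the struct-like container is the more general of the two, which is why shared code would be written against it and converted at the edges.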


