Posted to jira@arrow.apache.org by "Andrew Lamb (Jira)" <ji...@apache.org> on 2021/04/26 12:52:00 UTC

[jira] [Closed] (ARROW-12411) [Rust] Add Builder interface for adding Arrays to record batches

     [ https://issues.apache.org/jira/browse/ARROW-12411?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Andrew Lamb closed ARROW-12411.
-------------------------------
    Resolution: Invalid

> [Rust] Add Builder interface for adding Arrays to record batches
> ----------------------------------------------------------------
>
>                 Key: ARROW-12411
>                 URL: https://issues.apache.org/jira/browse/ARROW-12411
>             Project: Apache Arrow
>          Issue Type: Improvement
>          Components: Rust
>            Reporter: Andrew Lamb
>            Assignee: Andrew Lamb
>            Priority: Major
>              Labels: pull-request-available
>          Time Spent: 3h 40m
>  Remaining Estimate: 0h
>
> Use case:
> While writing tests (both in IOx and in DataFusion) where I need a single `RecordBatch`, I often find myself doing something like this:
> ```
>         let schema = Arc::new(Schema::new(vec![
>             ArrowField::new("float_field", ArrowDataType::Float64, true),
>             ArrowField::new("time", ArrowDataType::Int64, true),
>         ]));
>         let float_array: ArrayRef = Arc::new(Float64Array::from(vec![10.1, 20.1, 30.1, 40.1]));
>         let timestamp_array: ArrayRef = Arc::new(Int64Array::from(vec![1000, 2000, 3000, 4000]));
>         let batch = RecordBatch::try_new(schema, vec![float_array, timestamp_array])
>             .expect("created new record batch");
> ```
> This is annoying because the information that `float_field` is a float is encoded both in the Schema and in the `Float64Array`.
> I would much rather be able to construct RecordBatches in a builder style, to avoid the redundancy and reduce the amount of typing:
> ```
>         let float_array: ArrayRef = Arc::new(Float64Array::from(vec![10.1, 20.1, 30.1, 40.1]));
>         let timestamp_array: ArrayRef = Arc::new(Int64Array::from(vec![1000, 2000, 3000, 4000]));
>         let batch = RecordBatch::empty()
>           .append("float_field", float_array).unwrap()
>           .append("time", timestamp_array).unwrap();
> ```
> The proposal is to add a method to `RecordBatch` like
> ```
> impl RecordBatch {
> ...
>   fn append(self, field_name: &str, field_values: ArrayRef) -> Result<Self>
> }
> ```
> That would append the field name to the current schema, returning an error if field_name was already present.
> The nullability of the field would be set based on the actual null count of the field_values.
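
The behaviour described above can be sketched without depending on the arrow crate. This is only an illustrative model, not the arrow API: `Column`, `Field`, `Batch`, `AppendError` and `build_example` are hypothetical stand-ins for `ArrayRef`, the schema field, `RecordBatch` and the crate's error type. The length check is an added assumption, based on the fact that all columns of a record batch must have the same number of rows.

```rust
// Hypothetical stand-in for an arrow ArrayRef: a typed column whose
// entries may be null (None).
#[derive(Debug, Clone, PartialEq)]
enum Column {
    Float64(Vec<Option<f64>>),
    Int64(Vec<Option<i64>>),
}

impl Column {
    fn len(&self) -> usize {
        match self {
            Column::Float64(v) => v.len(),
            Column::Int64(v) => v.len(),
        }
    }

    // Number of null (None) entries in the column.
    fn null_count(&self) -> usize {
        match self {
            Column::Float64(v) => v.iter().filter(|x| x.is_none()).count(),
            Column::Int64(v) => v.iter().filter(|x| x.is_none()).count(),
        }
    }

    // The type name is derived from the column itself, so the caller
    // never has to repeat it in a separate schema definition.
    fn data_type(&self) -> &'static str {
        match self {
            Column::Float64(_) => "Float64",
            Column::Int64(_) => "Int64",
        }
    }
}

// Hypothetical stand-in for a schema field.
#[derive(Debug, Clone, PartialEq)]
struct Field {
    name: String,
    data_type: &'static str,
    nullable: bool,
}

// Hypothetical stand-in for RecordBatch: fields[i] describes columns[i].
#[derive(Debug, Default)]
struct Batch {
    fields: Vec<Field>,
    columns: Vec<Column>,
}

#[derive(Debug, PartialEq)]
enum AppendError {
    DuplicateField(String),
    LengthMismatch { expected: usize, actual: usize },
}

impl Batch {
    fn empty() -> Self {
        Self::default()
    }

    // Consumes the batch and returns it with one more column, so calls
    // can be chained in builder style.
    fn append(mut self, field_name: &str, values: Column) -> Result<Self, AppendError> {
        // Reject a name that is already present in the schema.
        if self.fields.iter().any(|f| f.name == field_name) {
            return Err(AppendError::DuplicateField(field_name.to_string()));
        }
        // Assumption: every column must have the same number of rows.
        if let Some(first) = self.columns.first() {
            if first.len() != values.len() {
                return Err(AppendError::LengthMismatch {
                    expected: first.len(),
                    actual: values.len(),
                });
            }
        }
        // Nullability follows the actual null count of the values.
        self.fields.push(Field {
            name: field_name.to_string(),
            data_type: values.data_type(),
            nullable: values.null_count() > 0,
        });
        self.columns.push(values);
        Ok(self)
    }
}

// Builds the two-column batch from the use case above.
fn build_example() -> Result<Batch, AppendError> {
    let floats = Column::Float64(vec![Some(10.1), Some(20.1), Some(30.1), Some(40.1)]);
    let times = Column::Int64(vec![Some(1000), Some(2000), Some(3000), Some(4000)]);
    Batch::empty()
        .append("float_field", floats)?
        .append("time", times)
}
```

Taking `self` by value is what makes the chained form possible; the trade-off in this sketch is that a failed `append` consumes the batch built so far.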


