You are viewing a plain text version of this content. The canonical link for it is here.
Posted to jira@arrow.apache.org by "Qingyou Meng (Jira)" <ji...@apache.org> on 2021/01/16 02:30:00 UTC

[jira] [Closed] (ARROW-11263) [Rust] problem of Field nullable

     [ https://issues.apache.org/jira/browse/ARROW-11263?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Qingyou Meng closed ARROW-11263.
--------------------------------
    Resolution: Incomplete

> [Rust] problem of Field nullable
> --------------------------------
>
>                 Key: ARROW-11263
>                 URL: https://issues.apache.org/jira/browse/ARROW-11263
>             Project: Apache Arrow
>          Issue Type: Bug
>            Reporter: Qingyou Meng
>            Priority: Major
>
> Quoting from section *Schema message*
>  [https://github.com/apache/arrow/blob/master/docs/source/format/Columnar.rst#schema-message]
> {noformat}
> Whether the field is semantically nullable. While this has no bearing on the 
> array's physical layout, many systems distinguish nullable and non-nullable 
> fields and we want to allow them to preserve this metadata to enable faithful 
> schema round trips.{noformat}
> This can be read as: for a field with nullable set as true, when encounters null array data from the field, data processor CAN continue or refuse to process.
> In current rust implementation, apart from read Fields from schema, we also construct `Field` with datafusion and`Field::new`in arrow::array::*StructArray*.
>  * in datafusion, the nullable is determined by DF schema
>  * in arrow::array::StructArray::try_from(values: Vec<(&str, ArrayRef)>) , the nullable is determined actual data. This is error-prone if ArrayRef's null buffer are all 1s (built by builder). The following test shows a bug:
> {noformat}
>     #[test]
>     fn test_struct_bug() {
>         let ints: ArrayRef = Arc::new(Int32Array::from(vec![
>             Some(1),
>             Some(2),
>             Some(3),
>         ]));
>         let array = StructArray::try_from(vec![("f1", ints.clone())])            .unwrap()
>             .data();
>         let arrays = vec![array.as_ref()];
>         let mut mutable = MutableArrayData::new(arrays, false, 0);
>         mutable.extend(0, 1, 3);
>         let data = mutable.freeze();
>         let array = StructArray::from(Arc::new(data));
>         let expected = StructArray::try_from(vec![
>             ("f1", ints.slice(1, 2)),
>         ])
>         .unwrap();
>         assert_eq!(array, expected);
>     }{noformat}
> Conclusions:
>  * It's questionable to set Field's nullable according to data.
>  * Perhaps builders should set null buffer back to None when the buffer has all bits set.
>  * StructArray::
> try_from(values: Vec<(&str, ArrayRef)>) incorrectly sets nullable when null buffer is Some with all bits set.
>  



--
This message was sent by Atlassian Jira
(v8.3.4#803005)