You are viewing a plain text version of this content. The canonical link for it is here.
Posted to jira@arrow.apache.org by "Qingyou Meng (Jira)" <ji...@apache.org> on 2021/01/16 02:14:00 UTC

[jira] [Updated] (ARROW-11263) [Rust] problem of Field nullable

     [ https://issues.apache.org/jira/browse/ARROW-11263?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Qingyou Meng updated ARROW-11263:
---------------------------------
    Description: 
Quoting from section *Schema message*

 [https://github.com/apache/arrow/blob/master/docs/source/format/Columnar.rst#schema-message]
{noformat}
Whether the field is semantically nullable. While this has no bearing on the 
array's physical layout, many systems distinguish nullable and non-nullable 
fields and we want to allow them to preserve this metadata to enable faithful 
schema round trips.{noformat}
This can be read as: for a field with nullable set as true, when encounters null array data from the field, data processor CAN continue or refuse to process.

In current rust implementation, apart from read Fields from schema, we also construct `Field` with datafusion and`Field::new`in arrow::array::*StructArray*.
 * in datafusion, the nullable is determined by DF schema
 * in arrow::array::StructArray::try_from(values: Vec<(&str, ArrayRef)>) , the nullable is determined actual data. This is error-prone if ArrayRef's null buffer are all 1s (built by builder). The following test shows a bug:

{noformat}
    #[test]
    fn test_struct_bug() {
        let ints: ArrayRef = Arc::new(Int32Array::from(vec![
            Some(1),
            Some(2),
            Some(3),
        ]));
        let array = StructArray::try_from(vec![("f1", ints.clone())])            .unwrap()
            .data();
        let arrays = vec![array.as_ref()];
        let mut mutable = MutableArrayData::new(arrays, false, 0);
        mutable.extend(0, 1, 3);
        let data = mutable.freeze();
        let array = StructArray::from(Arc::new(data));
        let expected = StructArray::try_from(vec![
            ("f1", ints.slice(1, 2)),
        ])
        .unwrap();
        assert_eq!(array, expected);
    }{noformat}
Conclusions:
 * It's questionable to set Field's nullable according to data.
 * Perhaps builders should set null buffer back to None when the buffer has all bits set.
 * StructArray::
 TryFrom<Vec<(&str, ArrayRef)>> sets wrong nullable when null buffer is Some with all bits set.

 

  was:
Quoting from section *Schema message*

 [https://github.com/apache/arrow/blob/master/docs/source/format/Columnar.rst#schema-message]

 
{noformat}
Whether the field is semantically nullable. While this has no bearing on the array's physical layout, many systems distinguish nullable and non-nullable fields and we want to allow them to preserve this metadata to enable faithful schema round trips.{noformat}
This can be read as: for a field with nullable set as true, when encounters null array data from the field, data processor CAN continue or refuse to process.

In current rust implementation, apart from read Fields from schema, we also construct `Field` with datafusion and`Field::new`in arrow::array::*StructArray*.
 * in datafusion, the nullable is determined by DF schema
 * in arrow::array::StructArray::

try_from(values: Vec<(&str, ArrayRef)>) , the nullable is determined actual data. This is error-prone if ArrayRef's null buffer are all 1s (built by builder). The following test shows a bug:
 
{noformat}
    #[test]
    fn test_struct_bug() {
        let ints: ArrayRef = Arc::new(Int32Array::from(vec![
            Some(1),
            Some(2),
            Some(3),
        ]));
        let array = StructArray::try_from(vec![("f1", ints.clone())])            .unwrap()
            .data();
        let arrays = vec![array.as_ref()];
        let mut mutable = MutableArrayData::new(arrays, false, 0);
        mutable.extend(0, 1, 3);
        let data = mutable.freeze();
        let array = StructArray::from(Arc::new(data));
        let expected = StructArray::try_from(vec![
            ("f1", ints.slice(1, 2)),
        ])
        .unwrap();
        assert_eq!(array, expected);
    }{noformat}
Conclusions:
 * It's questionable to set Field's nullable according to data.
 * Perhaps builders should set null buffer back to None when the buffer has all bits set.
 * StructArray::
TryFrom<Vec<(&str, ArrayRef)>> sets wrong nullable when null buffer is Some with all bits set.

 


> [Rust] problem of Field nullable
> --------------------------------
>
>                 Key: ARROW-11263
>                 URL: https://issues.apache.org/jira/browse/ARROW-11263
>             Project: Apache Arrow
>          Issue Type: Bug
>            Reporter: Qingyou Meng
>            Priority: Major
>
> Quoting from section *Schema message*
>  [https://github.com/apache/arrow/blob/master/docs/source/format/Columnar.rst#schema-message]
> {noformat}
> Whether the field is semantically nullable. While this has no bearing on the 
> array's physical layout, many systems distinguish nullable and non-nullable 
> fields and we want to allow them to preserve this metadata to enable faithful 
> schema round trips.{noformat}
> This can be read as: for a field with nullable set as true, when encounters null array data from the field, data processor CAN continue or refuse to process.
> In current rust implementation, apart from read Fields from schema, we also construct `Field` with datafusion and`Field::new`in arrow::array::*StructArray*.
>  * in datafusion, the nullable is determined by DF schema
>  * in arrow::array::StructArray::try_from(values: Vec<(&str, ArrayRef)>) , the nullable is determined actual data. This is error-prone if ArrayRef's null buffer are all 1s (built by builder). The following test shows a bug:
> {noformat}
>     #[test]
>     fn test_struct_bug() {
>         let ints: ArrayRef = Arc::new(Int32Array::from(vec![
>             Some(1),
>             Some(2),
>             Some(3),
>         ]));
>         let array = StructArray::try_from(vec![("f1", ints.clone())])            .unwrap()
>             .data();
>         let arrays = vec![array.as_ref()];
>         let mut mutable = MutableArrayData::new(arrays, false, 0);
>         mutable.extend(0, 1, 3);
>         let data = mutable.freeze();
>         let array = StructArray::from(Arc::new(data));
>         let expected = StructArray::try_from(vec![
>             ("f1", ints.slice(1, 2)),
>         ])
>         .unwrap();
>         assert_eq!(array, expected);
>     }{noformat}
> Conclusions:
>  * It's questionable to set Field's nullable according to data.
>  * Perhaps builders should set null buffer back to None when the buffer has all bits set.
>  * StructArray::
>  TryFrom<Vec<(&str, ArrayRef)>> sets wrong nullable when null buffer is Some with all bits set.
>  



--
This message was sent by Atlassian Jira
(v8.3.4#803005)