You are viewing a plain text version of this content. The canonical link for it is here.
Posted to jira@arrow.apache.org by "Qingyou Meng (Jira)" <ji...@apache.org> on 2021/01/16 02:13:00 UTC
[jira] [Created] (ARROW-11263) [Rust] problem of Field nullable
Qingyou Meng created ARROW-11263:
------------------------------------
Summary: [Rust] problem of Field nullable
Key: ARROW-11263
URL: https://issues.apache.org/jira/browse/ARROW-11263
Project: Apache Arrow
Issue Type: Bug
Reporter: Qingyou Meng
Quoting from section *Schema message*
[https://github.com/apache/arrow/blob/master/docs/source/format/Columnar.rst#schema-message]
{noformat}
Whether the field is semantically nullable. While this has no bearing on the array's physical layout, many systems distinguish nullable and non-nullable fields and we want to allow them to preserve this metadata to enable faithful schema round trips.{noformat}
This can be read as: for a field with nullable set as true, when encounters null array data from the field, data processor CAN continue or refuse to process.
In current rust implementation, apart from read Fields from schema, we also construct `Field` with datafusion and`Field::new`in arrow::array::*StructArray*.
* in datafusion, the nullable is determined by DF schema
* in arrow::array::StructArray::
try_from(values: Vec<(&str, ArrayRef)>) , the nullable is determined actual data. This is error-prone if ArrayRef's null buffer are all 1s (built by builder). The following test shows a bug:
{noformat}
#[test]
fn test_struct_bug() {
let ints: ArrayRef = Arc::new(Int32Array::from(vec![
Some(1),
Some(2),
Some(3),
]));
let array = StructArray::try_from(vec![("f1", ints.clone())]) .unwrap()
.data();
let arrays = vec![array.as_ref()];
let mut mutable = MutableArrayData::new(arrays, false, 0);
mutable.extend(0, 1, 3);
let data = mutable.freeze();
let array = StructArray::from(Arc::new(data));
let expected = StructArray::try_from(vec![
("f1", ints.slice(1, 2)),
])
.unwrap();
assert_eq!(array, expected);
}{noformat}
Conclusions:
* It's questionable to set Field's nullable according to data.
* Perhaps builders should set null buffer back to None when the buffer has all bits set.
* StructArray::
TryFrom<Vec<(&str, ArrayRef)>> sets wrong nullable when null buffer is Some with all bits set.
--
This message was sent by Atlassian Jira
(v8.3.4#803005)