Posted to github@arrow.apache.org by GitBox <gi...@apache.org> on 2022/05/01 20:53:14 UTC

[GitHub] [arrow-rs] tfeda opened a new issue, #1637: UnionBuilder produces incorrect Union DataType

tfeda opened a new issue, #1637:
URL: https://github.com/apache/arrow-rs/issues/1637

   **Describe the bug**
   The Union DataType produced by UnionBuilder has non-nullable children Fields after appending nulls in the builder.
   
   **To Reproduce**
   Steps to reproduce the behavior: run the following code.
   ```
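   // Imports assumed for the arrow crate as it was laid out when this issue was filed.
   use std::sync::Arc;

   use arrow::array::UnionBuilder;
   use arrow::datatypes::{DataType, Field, Float64Type, Int32Type, Schema, UnionMode};
   use arrow::record_batch::RecordBatch;

   let mut builder = UnionBuilder::new_dense(4);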
   builder.append::<Int32Type>("a", 1).unwrap();
   builder.append::<Float64Type>("b", 3.0).unwrap();
   builder.append_null::<Float64Type>("b").unwrap();
   builder.append_null::<Int32Type>("a").unwrap();
   let union = builder.build().unwrap();
   
   let schema = Schema::new(vec![
       Field::new(
           "Teamsters",
           DataType::Union(
               vec![
                   Field::new("a", DataType::Int32, true),
                   Field::new("b", DataType::Float64, true),
               ],
               UnionMode::Dense,
           ),
           false,
       ),
   ]); 
   
   let batch = RecordBatch::try_new(
       Arc::new(schema),
       vec![Arc::new(union)]
   ).unwrap();
   ```
   This code panics:
   
   InvalidArgumentError("column types must match schema types, expected 
   Union([
       Field {  name: \"a\", data_type: **Int32, nullable: true**, dict_id: 0, dict_is_ordered: false, metadata: None }, 
       Field { name: \"b\", data_type: **Float64, nullable: true**, dict_id: 0, dict_is_ordered: false, metadata: None }
        ], Dense
   ) but found Union([
       Field { name: \"a\", data_type: **Int32, nullable: false**, dict_id: 0, dict_is_ordered: false, metadata: None }, 
       Field { name: \"b\", data_type: **Float64, nullable: false**, dict_id: 0, dict_is_ordered: false, metadata: None }
       ], Dense) 
   at column index 0")
   
   **Expected behavior**
   
   **Depending on the interpretation of the specification, one of two things should happen:**
   *A `Union`'s children `Field`s should inherit its nullability (i.e. `nullable` is always false):*  Then I think this should error when executing `Field::new()` with a bad `DataType`.
   
   *A child should be nullable if it is capable of returning None to the parent when `unionArray.value(index)` is called*: This code should run just fine then.
   
   **Additional context**
   I ran into this when working on #1594. I think it's a simple fix: track the nullability of the `UnionBuilder` fields rather than always hardcoding the child `Field`s' nullability to false. That being said, I'm not sure if that's the correct understanding of the specification.
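
As an illustration of the "track the nullability" idea, here is a minimal, std-only sketch. `ToyField` and `ToyUnionBuilder` are hypothetical stand-ins invented for this example; they are not the arrow-rs types and do not reflect how `UnionBuilder` is actually implemented. The sketch only shows the bookkeeping: each child remembers whether a null was ever appended, and the emitted field carries that flag instead of a hardcoded `false`.

```rust
use std::collections::BTreeMap;

/// Hypothetical stand-in for an Arrow `Field`: a name plus a nullable flag.
#[derive(Debug, Clone, PartialEq)]
struct ToyField {
    name: String,
    nullable: bool,
}

/// Hypothetical builder that records, per child, whether a null was appended.
/// It stores no values; only the nullability bookkeeping is modelled.
#[derive(Default)]
struct ToyUnionBuilder {
    // child name -> "has at least one null been appended to this child?"
    children: BTreeMap<String, bool>,
}

impl ToyUnionBuilder {
    /// Register a non-null value for `child` without clearing an earlier null.
    fn append(&mut self, child: &str) {
        self.children.entry(child.to_string()).or_insert(false);
    }

    /// Register a null for `child`, which marks the child as nullable.
    fn append_null(&mut self, child: &str) {
        self.children.insert(child.to_string(), true);
    }

    /// Emit one field per child, carrying the tracked nullability.
    fn fields(&self) -> Vec<ToyField> {
        self.children
            .iter()
            .map(|(name, &nullable)| ToyField {
                name: name.clone(),
                nullable,
            })
            .collect()
    }
}

fn main() {
    // Same append sequence as the reproduction above.
    let mut builder = ToyUnionBuilder::default();
    builder.append("a");
    builder.append("b");
    builder.append_null("b");
    builder.append_null("a");

    for field in builder.fields() {
        println!("{:?}", field);
    }
}
```

With this bookkeeping, both children in the reproduction would be reported as nullable, which matches the hand-written schema.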
   


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: github-unsubscribe@arrow.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


[GitHub] [arrow-rs] tustvold commented on issue #1637: UnionBuilder produces incorrect Union DataType

Posted by GitBox <gi...@apache.org>.
tustvold commented on issue #1637:
URL: https://github.com/apache/arrow-rs/issues/1637#issuecomment-1116585759

   I think automatically determining the nullability of the field based on whether that child contains nulls makes a lot of sense. However, it could cause schema volatility, since the schema would depend on whether the written data happens to contain nulls, which seems sub-optimal.
   
   I think the safest thing is probably to do what `GenericListBuilder` does and always set nullable to true, but it does feel a bit off...
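
To make the trade-off concrete, here is a small std-only sketch of the two policies discussed above. `NullabilityPolicy` and `child_nullable` are hypothetical names made up for this example and are not part of arrow-rs. It computes the `nullable` flag for one child over two batches: under the data-dependent policy the flag differs between batches (the schema volatility concern), while the always-nullable policy gives the same answer every time.

```rust
/// Hypothetical policies for deciding a child field's `nullable` flag.
#[derive(Debug, Clone, Copy, PartialEq)]
enum NullabilityPolicy {
    /// Nullable only if the child actually contains a null (data-dependent).
    FromData,
    /// Always nullable, regardless of the data.
    AlwaysNullable,
}

/// Decide the `nullable` flag for one child given the values written to it.
fn child_nullable(policy: NullabilityPolicy, values: &[Option<i32>]) -> bool {
    match policy {
        NullabilityPolicy::FromData => values.iter().any(|v| v.is_none()),
        NullabilityPolicy::AlwaysNullable => true,
    }
}

fn main() {
    // Two batches written with the same logical schema; only the second has a null.
    let first_batch = [Some(1), Some(2)];
    let second_batch = [Some(3), None];

    for policy in [NullabilityPolicy::FromData, NullabilityPolicy::AlwaysNullable] {
        let first = child_nullable(policy, &first_batch);
        let second = child_nullable(policy, &second_batch);
        // `stable` is true when both batches yield the same nullable flag.
        println!(
            "{:?}: first={}, second={}, stable={}",
            policy,
            first,
            second,
            first == second
        );
    }
}
```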




[GitHub] [arrow-rs] tustvold commented on issue #1637: UnionBuilder produces incorrect Union DataType

Posted by GitBox <gi...@apache.org>.
tustvold commented on issue #1637:
URL: https://github.com/apache/arrow-rs/issues/1637#issuecomment-1118609085

   Further https://github.com/apache/arrow-rs/issues/1649

