Posted to issues@arrow.apache.org by "Mark Hildreth (Jira)" <ji...@apache.org> on 2020/04/22 01:45:00 UTC

[jira] [Commented] (ARROW-8508) [Rust] ListBuilder of FixedSizeListBuilder creates wrong offsets

    [ https://issues.apache.org/jira/browse/ARROW-8508?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17089198#comment-17089198 ] 

Mark Hildreth commented on ARROW-8508:
--------------------------------------

I believe there are a few things going on here:

1.) I wouldn't consider myself an expert on these APIs, but it seems like the builders are being used correctly.

2.) The debug output definitely appears broken. I opened a [PR to fix this|https://github.com/apache/arrow/pull/7006], which brings *FixedSizeListArray* more in line with how the variable-size *ListArray* works. It should fix the *value()* method on *FixedSizeListArray* so that it properly takes the array's offset into the child array into account.

3.) I'm less certain about the asserts that fail. The values they check come from the *values()* method, which seems to just return the underlying child array without taking offsets into account. Other arrays (including primitives) seem to work the same way, so my guess is that this is by design. I don't know of a better way to use the API here, so maybe someone else can provide input.
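
To illustrate points 2 and 3, here is a minimal sketch of the offset arithmetic involved. It is a toy model only: the *List* and *FixedSizeList* structs and the *visible_values()* helper are hypothetical names made up for this example and are not the arrow crate's types or API. It assumes the layout from the report (one flat Float64 child buffer, fixed size 2, list offsets 0, 2, 5).

```rust
// Toy model only: hypothetical types, NOT the arrow crate's API.

/// A (possibly sliced) view over fixed-size items stored in a flat child buffer.
struct FixedSizeList<'a> {
    child: &'a [f64], // the whole underlying child buffer
    size: usize,      // values per item (2 for a coordinate)
    offset: usize,    // index of the first item visible through this view
    len: usize,       // number of items visible through this view
}

impl<'a> FixedSizeList<'a> {
    /// Mirrors the behaviour described in point 3: returns the whole child
    /// buffer and ignores `offset`.
    fn values(&self) -> &'a [f64] {
        self.child
    }

    /// Offset-aware access to item `i`, as described in point 2.
    fn value(&self, i: usize) -> &'a [f64] {
        assert!(i < self.len, "item index out of bounds");
        let start = (self.offset + i) * self.size;
        &self.child[start..start + self.size]
    }

    /// Hypothetical helper: only the values that belong to this view.
    fn visible_values(&self) -> &'a [f64] {
        let start = self.offset * self.size;
        &self.child[start..start + self.len * self.size]
    }
}

/// A variable-size list whose entries are runs of fixed-size items.
struct List<'a> {
    offsets: Vec<usize>, // entry i covers items offsets[i]..offsets[i + 1]
    child: &'a [f64],
    size: usize,
}

impl<'a> List<'a> {
    /// Returns entry `i` as a view that shares the child buffer.
    fn value(&self, i: usize) -> FixedSizeList<'a> {
        FixedSizeList {
            child: self.child,
            size: self.size,
            offset: self.offsets[i],
            len: self.offsets[i + 1] - self.offsets[i],
        }
    }
}

fn main() {
    // Same data as in the report: five coordinates in two multi points.
    let child = [0.0, 0.1, 1.0, 1.1, 2.0, 2.1, 3.0, 3.1, 4.0, 4.1];
    let multi_points = List { offsets: vec![0, 2, 5], child: &child, size: 2 };
    let second = multi_points.value(1);

    // Reading from the start of `values()` yields the first multi point's data.
    assert_eq!(&second.values()[..6], &[0.0, 0.1, 1.0, 1.1, 2.0, 2.1]);
    // Applying the view's offset yields the expected coordinates.
    assert_eq!(second.visible_values(), &[2.0, 2.1, 3.0, 3.1, 4.0, 4.1]);
    assert_eq!(second.value(0), &[2.0, 2.1]);
}
```

In this model, the first assertion in *main* reproduces the value the report sees in its failing assert, and the offset-aware accessors return the coordinates the reporter expected. Whether the real *values()* is meant to behave this way is still the open question from point 3.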

> [Rust] ListBuilder of FixedSizeListBuilder creates wrong offsets
> ----------------------------------------------------------------
>
>                 Key: ARROW-8508
>                 URL: https://issues.apache.org/jira/browse/ARROW-8508
>             Project: Apache Arrow
>          Issue Type: Bug
>          Components: Rust
>    Affects Versions: 0.16.0
>            Reporter: Christian Beilschmidt
>            Priority: Major
>              Labels: pull-request-available
>
> I created an example of storing multi points with Arrow.
>  # A coordinate consists of two floats (Float64Builder)
>  # A multi point consists of one or more coordinates (FixedSizeListBuilder)
>  # A list of multi points consists of multiple multi points (ListBuilder)
> This is the corresponding code snippet:
> {code:java}
> let float_builder = arrow::array::Float64Builder::new(0);
> let coordinate_builder = arrow::array::FixedSizeListBuilder::new(float_builder, 2);
> let mut multi_point_builder = arrow::array::ListBuilder::new(coordinate_builder);
> multi_point_builder
>     .values()
>     .values()
>     .append_slice(&[0.0, 0.1])
>     .unwrap();
> multi_point_builder.values().append(true).unwrap();
> multi_point_builder
>     .values()
>     .values()
>     .append_slice(&[1.0, 1.1])
>     .unwrap();
> multi_point_builder.values().append(true).unwrap();
> multi_point_builder.append(true).unwrap(); // first multi point
> multi_point_builder
>     .values()
>     .values()
>     .append_slice(&[2.0, 2.1])
>     .unwrap();
> multi_point_builder.values().append(true).unwrap();
> multi_point_builder
>     .values()
>     .values()
>     .append_slice(&[3.0, 3.1])
>     .unwrap();
> multi_point_builder.values().append(true).unwrap();
> multi_point_builder
>     .values()
>     .values()
>     .append_slice(&[4.0, 4.1])
>     .unwrap();
> multi_point_builder.values().append(true).unwrap();
> multi_point_builder.append(true).unwrap(); // second multi point
> let multi_point = dbg!(multi_point_builder.finish());
> let first_multi_point_ref = multi_point.value(0);
> let first_multi_point: &arrow::array::FixedSizeListArray = first_multi_point_ref.as_any().downcast_ref().unwrap();
> let coordinates_ref = first_multi_point.values();
> let coordinates: &Float64Array = coordinates_ref.as_any().downcast_ref().unwrap();
> assert_eq!(coordinates.value_slice(0, 2 * 2), &[0.0, 0.1, 1.0, 1.1]);
> let second_multi_point_ref = multi_point.value(1);
> let second_multi_point: &arrow::array::FixedSizeListArray = second_multi_point_ref.as_any().downcast_ref().unwrap();
> let coordinates_ref = second_multi_point.values();
> let coordinates: &Float64Array = coordinates_ref.as_any().downcast_ref().unwrap();
> assert_eq!(coordinates.value_slice(0, 2 * 3), &[2.0, 2.1, 3.0, 3.1, 4.0, 4.1]);
> {code}
> The second assertion fails and the output is {{[0.0, 0.1, 1.0, 1.1, 2.0, 2.1]}}.
> Moreover, the debug output produced from {{dbg!}} confirms this:
> {noformat}
> [
>   FixedSizeListArray<2>
> [
>   PrimitiveArray<Float64>
> [
>   0.0,
>   0.1,
> ],
>   PrimitiveArray<Float64>
> [
>   1.0,
>   1.1,
> ],
> ],
>   FixedSizeListArray<2>
> [
>   PrimitiveArray<Float64>
> [
>   0.0,
>   0.1,
> ],
>   PrimitiveArray<Float64>
> [
>   1.0,
>   1.1,
> ],
>   PrimitiveArray<Float64>
> [
>   2.0,
>   2.1,
> ],
> ],
> ]{noformat}
> The second list should contain the values 2-4.
>
> So either I am using the builder wrong or there is a bug with the offsets. I used {{0.16}} as well as the current {{master}} from GitHub.


