Posted to jira@arrow.apache.org by "Kouhei Sutou (Jira)" <ji...@apache.org> on 2020/11/09 19:19:00 UTC

[jira] [Assigned] (ARROW-10411) [C++] Fix incorrect child array lengths for Concatenate of FixedSizeList

     [ https://issues.apache.org/jira/browse/ARROW-10411?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Kouhei Sutou reassigned ARROW-10411:
------------------------------------

    Assignee: Johan Peltenburg

> [C++] Fix incorrect child array lengths for Concatenate of FixedSizeList
> ------------------------------------------------------------------------
>
>                 Key: ARROW-10411
>                 URL: https://issues.apache.org/jira/browse/ARROW-10411
>             Project: Apache Arrow
>          Issue Type: Bug
>          Components: C++
>    Affects Versions: 2.0.0, 3.0.0
>            Reporter: Johan Peltenburg
>            Assignee: Johan Peltenburg
>            Priority: Major
>              Labels: pull-request-available
>          Time Spent: 20m
>  Remaining Estimate: 0h
>
> When calling CombineChunks() on an arrow::Table containing a FixedSizeList type Array, the child arrays of the FixedSizeLists are not properly concatenated: the lengths of the child arrays are set incorrectly. I ran into this when trying to ToString() the combined RecordBatch.
> This seems to be because this function in:
>  cpp/arrow/array/concatenate.cc
> {code:c++}
> Result<std::vector<std::shared_ptr<const ArrayData>>> ChildData(size_t index)
> {code}
> ... which is used to calculate offsets and slice lengths before the actual concatenation, doesn't take the list lengths into account.
> The bug can be reproduced by adding the following unit test to:
>  cpp/arrow/array/concatenate_test.cc
>  
> {code:c++}
> TEST_F(ConcatenateTest, FixedSizeListType) {
>   Check([this](int32_t size, double null_probability, std::shared_ptr<Array>* out) {
>     auto list_size = 3;
>     auto values_size = size * list_size;
>     auto values = this->GeneratePrimitive<Int8Type>(values_size, null_probability);
>     ASSERT_OK_AND_ASSIGN(*out, FixedSizeListArray::FromArrays(values, list_size));
>     ASSERT_OK((**out).ValidateFull());
>   });
> }
> {code}
> One possible fix would be to add another ChildData overload to ConcatenateImpl that takes a multiplier parameter and multiplies the offset and length of each slice by that multiplier. The FixedSizeList visitor can then call this overload, passing the list size as the multiplier.
> I have this fix ready but would like to know whether this is the right approach.
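
For illustration, the multiplier idea proposed in the description can be sketched without any Arrow dependency. Everything below is a simplified, hypothetical model: Slice, FixedSizeListView, ChildSliceUnscaled, ChildSliceScaled and ConcatenateChildren are names invented for this sketch and are not part of the Arrow API or of the actual patch. It assumes, as the description states, that the child slice is derived from the parent's offset and length without accounting for the list size.

```cpp
#include <cassert>
#include <cstdint>
#include <vector>

// Hypothetical stand-in for an array slice: a view given by offset and length.
struct Slice {
  int64_t offset;
  int64_t length;
};

// Hypothetical model of a FixedSizeList array: every list slot owns exactly
// `list_size` consecutive entries of the flat child `values` buffer.
struct FixedSizeListView {
  Slice lists;                        // offset/length counted in list slots
  int32_t list_size;                  // child values per list slot
  const std::vector<int8_t>* values;  // flat child values
};

// Child slice taken directly from the parent's offset/length. This models the
// behaviour the issue describes as incorrect for FixedSizeList.
Slice ChildSliceUnscaled(const FixedSizeListView& a) { return a.lists; }

// Child slice with the proposed multiplier applied to both offset and length.
Slice ChildSliceScaled(const FixedSizeListView& a, int64_t multiplier) {
  return Slice{a.lists.offset * multiplier, a.lists.length * multiplier};
}

// Concatenate the child values of all inputs, with or without scaling.
std::vector<int8_t> ConcatenateChildren(const std::vector<FixedSizeListView>& inputs,
                                        bool scale) {
  std::vector<int8_t> out;
  for (const auto& in : inputs) {
    const Slice s = scale ? ChildSliceScaled(in, in.list_size) : ChildSliceUnscaled(in);
    // The slice must stay inside the child buffer.
    assert(s.offset >= 0 && s.length >= 0);
    assert(s.offset + s.length <= static_cast<int64_t>(in.values->size()));
    out.insert(out.end(), in.values->begin() + s.offset,
               in.values->begin() + s.offset + s.length);
  }
  return out;
}
```

In this model, concatenating one input holding two lists of size 3 with another holding one list of size 3 yields nine child values when the multiplier is applied, but only three when it is not, which matches the kind of length mismatch reported above.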



--
This message was sent by Atlassian Jira
(v8.3.4#803005)