Posted to issues@arrow.apache.org by "Rasmus Johansen (Jira)" <ji...@apache.org> on 2022/09/14 21:29:00 UTC

[jira] [Created] (ARROW-17733) [C++] Concatenating dictionary arrays with nulls fills wrong parts of index buffer with 0.

Rasmus Johansen created ARROW-17733:
---------------------------------------

             Summary: [C++] Concatenating dictionary arrays with nulls fills wrong parts of index buffer with 0.
                 Key: ARROW-17733
                 URL: https://issues.apache.org/jira/browse/ARROW-17733
             Project: Apache Arrow
          Issue Type: Bug
          Components: C++
            Reporter: Rasmus Johansen


When concatenating dictionary arrays that contain nulls and whose index type is wider than 8 bits, the wrong bytes of the index buffer get zeroed out.

Example using pyarrow:
{code:python}
import pyarrow as pa
dictionary_type = pa.dictionary(pa.int16(), pa.string())
empty_array = pa.array([], dictionary_type)
array1 = pa.array(["a", "b", None], dictionary_type)
array2 = pa.concat_arrays([empty_array, array1])
print(array1.to_pylist())
print(array2.to_pylist()) {code}
We would expect array1 and array2 to be the same, but this prints:
{noformat}
['a', 'b', None]
['a', 'a', None] {noformat}
 

This bug happens because the index type is 2 bytes wide, so the null at position 2 should zero out bytes 4-5 (0-indexed) of the index buffer. Instead, the code zeroes out bytes 2-3, because it does not scale the position by the width of the index type when computing the offset here:

https://github.com/apache/arrow/blob/master/cpp/src/arrow/array/concatenate.cc#L314-L315
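
The offset arithmetic described above can be modelled in plain Python without pyarrow. This is only a sketch of the arithmetic, not the code in concatenate.cc: the helper zero_null_slots, the scale_by_width flag, and the placeholder value 7 in the null slot are all hypothetical names and values chosen for illustration.

```python
import struct

INDEX_WIDTH = 2  # bytes per int16 index, as in the example above


def zero_null_slots(indices, validity, scale_by_width):
    """Pack indices as little-endian int16, zero the null slots, unpack again.

    Hypothetical helper: it models the offset arithmetic only and is not
    Arrow's implementation.
    """
    buf = bytearray(struct.pack(f"<{len(indices)}h", *indices))
    for position, is_valid in enumerate(validity):
        if is_valid:
            continue
        # Correct: byte offset = position * width. Buggy: position used as-is.
        offset = position * INDEX_WIDTH if scale_by_width else position
        buf[offset:offset + INDEX_WIDTH] = bytes(INDEX_WIDTH)
    return list(struct.unpack(f"<{len(indices)}h", bytes(buf)))


indices = [0, 1, 7]              # 7 is an arbitrary value sitting in the null slot
validity = [True, True, False]   # the slot at position 2 is null

buggy = zero_null_slots(indices, validity, scale_by_width=False)
fixed = zero_null_slots(indices, validity, scale_by_width=True)
print(buggy)  # [0, 0, 7]: bytes 2-3 were zeroed, so index 1 ("b") became 0 ("a")
print(fixed)  # [0, 1, 0]: bytes 4-5 were zeroed, the valid indices are untouched
```

In the unscaled case the second index is overwritten with 0, which matches the ['a', 'a', None] output reported above; with the position scaled by the index width, only the null slot is cleared.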


