Posted to jira@arrow.apache.org by "Kirill Lykov (Jira)" <ji...@apache.org> on 2020/12/29 17:17:00 UTC

[jira] [Commented] (ARROW-10578) [C++] Comparison kernels crashing for string array with null string scalar

    [ https://issues.apache.org/jira/browse/ARROW-10578?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17256068#comment-17256068 ] 

Kirill Lykov commented on ARROW-10578:
--------------------------------------

The problem is still reproducible. It happens only for type == string.

I don't see any C++ tests for this use case: [https://github.com/apache/arrow/blob/52d615dc2cd64fbdbc10f2aeeb3b43ad5e879f3b/cpp/src/arrow/compute/kernels/scalar_compare_test.cc#L537]

Let me know if I am looking in the wrong place.
I will try to add a unit test for this particular case.

I also think it makes sense to add a test on the pyarrow side, something similar to [https://github.com/apache/arrow/blob/64f9b3fbe9ef4c718449a735435b53ab992ca852/python/pyarrow/tests/test_compute.py#L769]


The problem is that the scalar is invalid (`datum->is_valid == false`); see [https://github.com/apache/arrow/blob/ca685a0c08bb41f43a80e5605e4cc8f9efb77cca/cpp/src/arrow/compute/kernels/codegen_internal.h#L713]. However, we still dereference val at codegen_internal.h:275 and try to create a string_view from data_, which has address 0x10.

To fix the bug, I guess some additional checks should be added at [https://github.com/apache/arrow/blame/ca685a0c08bb41f43a80e5605e4cc8f9efb77cca/cpp/src/arrow/compute/kernels/codegen_internal.h#L273].
Something like: if the scalar is invalid, return a default string_view.
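
A minimal, self-contained sketch of the kind of guard I have in mind. Note that FakeStringScalar and UnboxString are hypothetical stand-ins made up for illustration; they are not Arrow's actual classes or the real code in codegen_internal.h. The only point is to check validity before touching the buffer.

```cpp
#include <cassert>
#include <memory>
#include <string>
#include <string_view>

// Hypothetical stand-in for a string scalar: a validity flag plus a buffer
// that may be missing. This is NOT Arrow's real class layout.
struct FakeStringScalar {
  bool is_valid = false;
  std::shared_ptr<const std::string> value;  // null when the scalar is null
};

// Hypothetical unboxing helper: return a default (empty) string_view for an
// invalid scalar instead of dereferencing a buffer that is not there.
inline std::string_view UnboxString(const FakeStringScalar& scalar) {
  if (!scalar.is_valid || scalar.value == nullptr) {
    return std::string_view();
  }
  return std::string_view(*scalar.value);
}
```

With this guard, unboxing a null scalar yields an empty view rather than reading from a bogus address. Whether the comparison result for a null scalar is then marked as null would presumably still have to be handled by the kernel's validity logic; that part is not shown here.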


> [C++] Comparison kernels crashing for string array with null string scalar
> --------------------------------------------------------------------------
>
>                 Key: ARROW-10578
>                 URL: https://issues.apache.org/jira/browse/ARROW-10578
>             Project: Apache Arrow
>          Issue Type: Bug
>          Components: C++
>            Reporter: Joris Van den Bossche
>            Priority: Major
>
> Comparing a string array with a string scalar works:
> {code}
> In [1]: import pyarrow as pa, pyarrow.compute as pc
> In [2]: pc.equal(pa.array(["a", None, "b"]), pa.scalar("a", type="string"))
> Out[2]: 
> <pyarrow.lib.BooleanArray object at 0x7f38d56e23a8>
> [
>   true,
>   null,
>   false
> ]
> {code}
> but if the scalar is a null (from the proper string type), it crashes:
> {code}
> In [4]: pc.equal(pa.array(["a", None, "b"]), pa.scalar(None, type="string"))
> Segmentation fault (core dumped)
> {code}
> (and not even debug messages ..)



--
This message was sent by Atlassian Jira
(v8.3.4#803005)