Posted to issues@arrow.apache.org by "Neal Richardson (Jira)" <ji...@apache.org> on 2021/08/30 21:36:00 UTC
[jira] [Created] (ARROW-13803) [C++] Segfault on filtering taxi dataset
Neal Richardson created ARROW-13803:
---------------------------------------
Summary: [C++] Segfault on filtering taxi dataset
Key: ARROW-13803
URL: https://issues.apache.org/jira/browse/ARROW-13803
Project: Apache Arrow
Issue Type: New Feature
Components: C++
Environment: macOS 11.2.1, MacBook Pro (13-inch, M1, 2020)
Reporter: Neal Richardson
Fix For: 6.0.0
Found this while testing ARROW-13740. Using the nyc-taxi dataset:
{code}
ds %>%
  filter(total_amount > 0, passenger_count > 0) %>%
  summarise(n = n()) %>%
  collect()
{code}
{code}
*** caught segfault ***
address 0x161784000, cause 'invalid permissions'
Traceback:
1: .Call(`_arrow_ExecPlan_run`, plan, final_node, sort_options)
...
{code}
lldb shows
{code}
* thread #11, stop reason = EXC_BAD_ACCESS (code=1, address=0x1631a8000)
frame #0: 0x000000013a79d9cc libarrow.600.dylib`arrow::BitUtil::SetBitmap(unsigned char*, long long, long long) + 296
libarrow.600.dylib`arrow::BitUtil::SetBitmap:
-> 0x13a79d9cc <+296>: ldrb w10, [x8]
0x13a79d9d0 <+300>: cmp w9, #0x8 ; =0x8
0x13a79d9d4 <+304>: cset w11, lo
0x13a79d9d8 <+308>: and w9, w9, #0x7
Target 0: (R) stopped.
(lldb)
{code}
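For readers unfamiliar with the faulting frame: the symbol {{arrow::BitUtil::SetBitmap(unsigned char*, long long, long long)}} suggests a routine that sets a run of bits given a buffer, a bit offset, and a bit length. That reading is an assumption from the signature alone, not something verified against the Arrow source. Under that assumption, an offset or length larger than the buffer actually holds would touch memory past its end, which is consistent with the EXC_BAD_ACCESS above but is not a confirmed diagnosis. The sketch below is a hypothetical bounds-checked helper ({{SetBitRangeChecked}} and {{CountSetBits}} are made-up names, not Arrow APIs) that shows the kind of range check that would turn such a call into an error instead of a crash.

```cpp
#include <cstdint>
#include <vector>

// Hypothetical helper, NOT Arrow's implementation.
// Sets `length` bits starting at bit `offset` in `data`, using LSB-first bit
// numbering within each byte. Returns false (and writes nothing) if the range
// would extend past `capacity_bits`.
bool SetBitRangeChecked(uint8_t* data, int64_t capacity_bits,
                        int64_t offset, int64_t length) {
  if (data == nullptr || offset < 0 || length < 0) return false;
  // Written to avoid overflow in `offset + length`.
  if (offset > capacity_bits || length > capacity_bits - offset) return false;
  for (int64_t i = offset; i < offset + length; ++i) {
    data[i / 8] |= static_cast<uint8_t>(1u << (i % 8));
  }
  return true;
}

// Counts the set bits among the first `num_bits` bits of `data`.
int64_t CountSetBits(const uint8_t* data, int64_t num_bits) {
  int64_t count = 0;
  for (int64_t i = 0; i < num_bits; ++i) {
    count += (data[i / 8] >> (i % 8)) & 1;
  }
  return count;
}
```

With a 2-byte buffer (16 bits), setting 7 bits from offset 3 succeeds, while setting 7 bits from offset 10 is rejected because it would need a 17th bit.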
Interestingly, each of those filter expressions evaluates fine on its own; the crash only seems to occur when both are provided. Grouping by both expressions and counting also works:
{code}
ds %>%
  group_by(total_amount > 0, passenger_count > 0) %>%
  summarize(n = n()) %>%
  collect()
# A tibble: 4 × 3
  `total_amount > 0` `passenger_count > 0`          n
  <lgl>              <lgl>                      <int>
1 FALSE              FALSE                        805
2 FALSE              TRUE                      368680
3 TRUE               FALSE                    5810556
4 TRUE               TRUE                  1541561340
{code}
--
This message was sent by Atlassian Jira
(v8.3.4#803005)