Posted to jira@arrow.apache.org by "ASF GitHub Bot (Jira)" <ji...@apache.org> on 2021/09/02 17:31:00 UTC
[jira] [Updated] (ARROW-13803) [C++] Segfault on filtering taxi dataset
[ https://issues.apache.org/jira/browse/ARROW-13803?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]
ASF GitHub Bot updated ARROW-13803:
-----------------------------------
Labels: pull-request-available query-engine (was: query-engine)
> [C++] Segfault on filtering taxi dataset
> ----------------------------------------
>
> Key: ARROW-13803
> URL: https://issues.apache.org/jira/browse/ARROW-13803
> Project: Apache Arrow
> Issue Type: New Feature
> Components: C++
> Environment: macOS 11.2.1, MacBook Pro (13-inch, M1, 2020)
> Reporter: Neal Richardson
> Priority: Major
> Labels: pull-request-available, query-engine
> Fix For: 6.0.0
>
> Time Spent: 10m
> Remaining Estimate: 0h
>
> Found this while testing ARROW-13740. Using the nyc-taxi dataset:
> {code}
> ds %>%
>   filter(total_amount > 0, passenger_count > 0) %>%
>   summarise(n = n()) %>%
>   collect()
> {code}
> {code}
> *** caught segfault ***
> address 0x161784000, cause 'invalid permissions'
> Traceback:
> 1: .Call(`_arrow_ExecPlan_run`, plan, final_node, sort_options)
> ...
> {code}
> lldb shows
> {code}
> * thread #11, stop reason = EXC_BAD_ACCESS (code=1, address=0x1631a8000)
> frame #0: 0x000000013a79d9cc libarrow.600.dylib`arrow::BitUtil::SetBitmap(unsigned char*, long long, long long) + 296
> libarrow.600.dylib`arrow::BitUtil::SetBitmap:
> -> 0x13a79d9cc <+296>: ldrb w10, [x8]
> 0x13a79d9d0 <+300>: cmp w9, #0x8 ; =0x8
> 0x13a79d9d4 <+304>: cset w11, lo
> 0x13a79d9d8 <+308>: and w9, w9, #0x7
> Target 0: (R) stopped.
> (lldb)
> {code}
> Interestingly, I can evaluate each of those filter expressions on its own just fine, and it only seems to crash when both are provided. I can also count over the data grouped by both:
> {code}
> ds %>%
>   group_by(total_amount > 0, passenger_count > 0) %>%
>   summarize(n=n()) %>%
>   collect()
> # A tibble: 4 × 3
>   `total_amount > 0` `passenger_count > 0`          n
>   <lgl>              <lgl>                      <int>
> 1 FALSE              FALSE                        805
> 2 FALSE              TRUE                      368680
> 3 TRUE               FALSE                    5810556
> 4 TRUE               TRUE                  1541561340
> {code}
> {code}
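For readers unfamiliar with the crashing frame: the lldb output above stops inside arrow::BitUtil::SetBitmap(unsigned char*, long long, long long), a routine that sets a run of bits in a raw bitmap buffer. The report does not identify the root cause, so the following is only a minimal sketch of that kind of operation, not Arrow's implementation. The name SetBitRangeChecked, its signature, and the bounds check are hypothetical and exist purely for illustration; the only Arrow fact assumed is that bitmaps number bits least-significant-bit first within each byte. It shows how a bit range that runs past the end of the buffer could be detected rather than written, which is the class of access that can surface as EXC_BAD_ACCESS.

```cpp
#include <cassert>
#include <cstdint>
#include <vector>

// Hypothetical helper, NOT Arrow's SetBitmap: sets `length` bits to 1
// starting at bit index `offset`, using LSB-first numbering within each byte.
// Returns false (and writes nothing) if the range does not fit in the buffer.
bool SetBitRangeChecked(std::vector<uint8_t>& bitmap, int64_t offset,
                        int64_t length) {
  if (offset < 0 || length < 0) return false;
  const int64_t capacity_bits = static_cast<int64_t>(bitmap.size()) * 8;
  // Written this way to avoid overflow in offset + length.
  if (offset > capacity_bits || length > capacity_bits - offset) return false;
  for (int64_t i = offset; i < offset + length; ++i) {
    bitmap[i / 8] |= static_cast<uint8_t>(1u << (i % 8));
  }
  return true;
}
```

With a 2-byte (16-bit) buffer, a call with offset 3 and length 7 sets bits 3 through 9, while offset 10 and length 7 is rejected because it would end at bit 17. An unchecked routine given the same arguments would write past the allocation.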
--
This message was sent by Atlassian Jira
(v8.3.4#803005)