Posted to issues@arrow.apache.org by "Neal Richardson (Jira)" <ji...@apache.org> on 2021/08/30 21:36:00 UTC

[jira] [Created] (ARROW-13803) [C++] Segfault on filtering taxi dataset

Neal Richardson created ARROW-13803:
---------------------------------------

             Summary: [C++] Segfault on filtering taxi dataset
                 Key: ARROW-13803
                 URL: https://issues.apache.org/jira/browse/ARROW-13803
             Project: Apache Arrow
          Issue Type: New Feature
          Components: C++
         Environment: macOS 11.2.1, MacBook Pro (13-inch, M1, 2020)
            Reporter: Neal Richardson
             Fix For: 6.0.0


Found this while testing ARROW-13740. Using the nyc-taxi dataset:

{code}
ds %>%
  filter(total_amount > 0, passenger_count > 0) %>%
  summarise(n = n()) %>%
  collect()
{code}

{code}
 *** caught segfault ***
address 0x161784000, cause 'invalid permissions'

Traceback:
 1: .Call(`_arrow_ExecPlan_run`, plan, final_node, sort_options)
...
{code}

lldb shows:

{code}
* thread #11, stop reason = EXC_BAD_ACCESS (code=1, address=0x1631a8000)
    frame #0: 0x000000013a79d9cc libarrow.600.dylib`arrow::BitUtil::SetBitmap(unsigned char*, long long, long long) + 296
libarrow.600.dylib`arrow::BitUtil::SetBitmap:
->  0x13a79d9cc <+296>: ldrb   w10, [x8]
    0x13a79d9d0 <+300>: cmp    w9, #0x8                  ; =0x8 
    0x13a79d9d4 <+304>: cset   w11, lo
    0x13a79d9d8 <+308>: and    w9, w9, #0x7
Target 0: (R) stopped.
(lldb) 
{code}

Interestingly, each of those filter expressions evaluates just fine on its own; the crash only seems to occur when both are provided. I can also count over the data grouped by both expressions:

{code}
ds %>%
  group_by(total_amount > 0, passenger_count > 0) %>%
  summarize(n = n()) %>%
  collect()

# A tibble: 4 × 3
  `total_amount > 0` `passenger_count > 0`          n
  <lgl>              <lgl>                      <int>
1 FALSE              FALSE                        805
2 FALSE              TRUE                      368680
3 TRUE               FALSE                    5810556
4 TRUE               TRUE                  1541561340
{code}



--
This message was sent by Atlassian Jira
(v8.3.4#803005)