Posted to jira@arrow.apache.org by "Jonathan Keane (Jira)" <ji...@apache.org> on 2022/04/21 19:29:00 UTC

[jira] [Closed] (ARROW-15664) [C++] parquet reader Segfaults with illegal SIMD instruction

     [ https://issues.apache.org/jira/browse/ARROW-15664?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Jonathan Keane closed ARROW-15664.
----------------------------------
    Resolution: Duplicate

> [C++] parquet reader Segfaults with illegal SIMD instruction 
> -------------------------------------------------------------
>
>                 Key: ARROW-15664
>                 URL: https://issues.apache.org/jira/browse/ARROW-15664
>             Project: Apache Arrow
>          Issue Type: Bug
>          Components: C++
>    Affects Versions: 7.0.0
>            Reporter: Jonathan Keane
>            Priority: Critical
>             Fix For: 7.0.1, 8.0.0
>
>
> When compiling with {{-Os}} (or with build type {{MinSizeRel}}) and running the parquet tests (in R at least, though I imagine pyarrow and C++ will have the same issue), we get a segfault with an illegal opcode when trying to read parquet files on systems that don't have BMI2 available. (It turns out the GitHub runners for macOS don't have BMI2, so this is easily testable there.)
> Somehow, the combination of this optimization level and the way our runtime detection code works causes the runtime detection we normally rely on to fail (it works just fine with {{-O2}}, {{-O3}}, etc.).
> When diagnosing this, I created a branch + PR that runs our R tests after installing from brew, which reliably reproduces the crash: https://github.com/apache/arrow/pull/12364
> Other test suites that exercise parquet reading (or even the C++ tests built with {{-Os}}) would probably have the same issue.
> Here's a coredump:
> {code}
> 2491 Thread_829819
> + 2491 thread_start  (in libsystem_pthread.dylib) + 15  [0x7ff801c3e00f]
> +   2491 _pthread_start  (in libsystem_pthread.dylib) + 125  [0x7ff801c424f4]
> +     2491 void* std::__1::__thread_proxy<std::__1::tuple<std::__1::unique_ptr<std::__1::__thread_struct, std::__1::default_delete<std::__1::__thread_struct> >, arrow::internal::ThreadPool::LaunchWorkersUnlocked(int)::$_3> >(void*)  (in arrow.so) + 380  [0x109203749]
> +       2491 arrow::internal::FnOnce<void ()>::operator()() &&  (in arrow.so) + 26  [0x109201f30]
> +         2491 arrow::internal::FnOnce<void ()>::FnImpl<std::__1::__bind<arrow::detail::ContinueFuture, arrow::Future<std::__1::shared_ptr<arrow::ChunkedArray> >&, parquet::arrow::(anonymous namespace)::FileReaderImpl::DecodeRowGroups(std::__1::shared_ptr<parquet::arrow::(anonymous namespace)::FileReaderImpl>, std::__1::vector<int, std::__1::allocator<int> > const&, std::__1::vector<int, std::__1::allocator<int> > const&, arrow::internal::Executor*)::$_4&, unsigned long&, std::__1::shared_ptr<parquet::arrow::ColumnReaderImpl> > >::invoke()  (in arrow.so) + 98  [0x108f125c2]
> +           2491 parquet::arrow::(anonymous namespace)::FileReaderImpl::DecodeRowGroups(std::__1::shared_ptr<parquet::arrow::(anonymous namespace)::FileReaderImpl>, std::__1::vector<int, std::__1::allocator<int> > const&, std::__1::vector<int, std::__1::allocator<int> > const&, arrow::internal::Executor*)::$_4::operator()(unsigned long, std::__1::shared_ptr<parquet::arrow::ColumnReaderImpl>) const  (in arrow.so) + 47  [0x108f11ed5]
> +             2491 parquet::arrow::(anonymous namespace)::FileReaderImpl::ReadColumn(int, std::__1::vector<int, std::__1::allocator<int> > const&, parquet::arrow::ColumnReader*, std::__1::shared_ptr<arrow::ChunkedArray>*)  (in arrow.so) + 273  [0x108f0c037]
> +               2491 parquet::arrow::ColumnReaderImpl::NextBatch(long long, std::__1::shared_ptr<arrow::ChunkedArray>*)  (in arrow.so) + 39  [0x108f0733b]
> +                 2491 parquet::arrow::(anonymous namespace)::LeafReader::LoadBatch(long long)  (in arrow.so) + 137  [0x108f0794b]
> +                   2491 parquet::internal::(anonymous namespace)::TypedRecordReader<parquet::PhysicalType<(parquet::Type::type)1> >::ReadRecords(long long)  (in arrow.so) + 442  [0x108f4f53e]
> +                     2491 parquet::internal::(anonymous namespace)::TypedRecordReader<parquet::PhysicalType<(parquet::Type::type)1> >::ReadRecordData(long long)  (in arrow.so) + 471  [0x108f50503]
> +                       2491 void parquet::internal::standard::DefLevelsToBitmapSimd<false>(short const*, long long, parquet::internal::LevelInfo, parquet::internal::ValidityBitmapInputOutput*)  (in arrow.so) + 250  [0x108fc2a5a]
> +                         2491 long long parquet::internal::standard::DefLevelsBatchToBitmap<false>(short const*, long long, long long, parquet::internal::LevelInfo, arrow::internal::FirstTimeBitmapWriter*)  (in arrow.so) + 63  [0x108fc34da]
> +                           2491 ???  (in <unknown binary>)  [0x600001354518]
> +                             2491 _sigtramp  (in libsystem_platform.dylib) + 29  [0x7ff801c57e2d]
> +                               2491 sigactionSegv  (in libR.dylib) + 649  [0x1042598c9]  main.c:625
> +                                 2491 Rstd_ReadConsole  (in libR.dylib) + 2042  [0x10435160a]  sys-std.c:1044
> +                                   2491 R_SelectEx  (in libR.dylib) + 308  [0x104350854]  sys-std.c:178
> +                                     2491 __select  (in libsystem_kernel.dylib) + 10  [0x7ff801c0de4a]
> {code}
> And then a disassembly (where you can see a SHLX that shouldn't be there):
> {code}
> Dump of assembler code from 0x13ac6db00 to 0x13ac6db99ff:
>  ...
>    0x000000013ac6db82:	mov    $0x8,%ecx
>    0x000000013ac6db87:	sub    %rax,%rcx
>    0x000000013ac6db8a:	lea    0xf1520b(%rip),%rdi        # 0x13bb82d9c
>    0x000000013ac6db91:	movzbl (%rcx,%rdi,1),%edi
>    0x000000013ac6db95:	mov    %esi,%ebx
>    0x000000013ac6db97:	and    %edi,%ebx
> => 0x000000013ac6db99:	shlx   %rax,%rbx,%rax
>    0x000000013ac6db9e:	or     0x18(%r15),%al
>    0x000000013ac6dba2:	mov    %al,0x18(%r15)
>    0x000000013ac6dba6:	cmp    %rdx,%rcx
>    0x000000013ac6dba9:	jg     0x13ac6dbf5
>    0x000000013ac6dbab:	mov    %al,(%r14)
>    0x000000013ac6dbae:	inc    %r14
>    0x000000013ac6dbb1:	shrx   %rcx,%rsi,%rax
>    0x000000013ac6dbb6:	mov    %rax,-0x20(%rbp)
>    0x000000013ac6dbba:	sub    %rcx,%rdx
>    0x000000013ac6dbbd:	mov    %rdx,%rbx
>    0x000000013ac6dbc0:	sar    $0x3,%rbx
>    0x000000013ac6dbc4:	and    $0x7,%edx
>    0x000000013ac6dbc7:	cmp    $0x1,%rdx
>    0x000000013ac6dbcb:	sbb    $0xffffffffffffffff,%rbx
>    0x000000013ac6dbcf:	lea    -0x20(%rbp),%rsi
>    0x000000013ac6dbd3:	mov    %r14,%rdi
> ...
> {code}
> We discovered this because Homebrew alters the default build flags and uses {{-Os}}: https://github.com/Homebrew/homebrew-core/issues/94724
> We should also add a CI job (at least as a nightly) that builds with {{-Os}} so that we catch this earlier.
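
For readers unfamiliar with the pattern, the kind of runtime dispatch the description refers to can be sketched roughly as follows. This is not Arrow's actual detection code: the names CpuHasBmi2, ExtractBitsPortable, ExtractBitsBmi2 and ChooseExtractBits are hypothetical, and the sketch assumes GCC or Clang on x86-64 (on any other target it simply falls back to the portable path). The idea is that BMI2 instructions may only appear in code that is reached after a CPUID check has succeeded.

```cpp
#include <cassert>
#include <cstdint>

// Assumption: GCC or Clang on x86-64. Elsewhere only the portable path exists.
#if defined(__x86_64__) && (defined(__GNUC__) || defined(__clang__))
#include <cpuid.h>
#include <immintrin.h>
#define SKETCH_HAVE_BMI2_PATH 1
#endif

// Hypothetical helper: true if the CPU reports BMI2 support.
// BMI2 is advertised in CPUID leaf 7, sub-leaf 0, EBX bit 8.
bool CpuHasBmi2() {
#ifdef SKETCH_HAVE_BMI2_PATH
  unsigned int eax = 0, ebx = 0, ecx = 0, edx = 0;
  if (!__get_cpuid_count(7, 0, &eax, &ebx, &ecx, &edx)) {
    return false;  // leaf 7 is not available on this CPU
  }
  return (ebx & (1u << 8)) != 0;
#else
  return false;
#endif
}

// Portable fallback: gather the bits of `value` selected by `mask`
// into the low bits of the result (the same operation PEXT performs).
uint64_t ExtractBitsPortable(uint64_t value, uint64_t mask) {
  uint64_t result = 0;
  for (uint64_t out_bit = 1; mask != 0; out_bit <<= 1) {
    const uint64_t lowest = mask & (~mask + 1);  // lowest set bit of mask
    if (value & lowest) result |= out_bit;
    mask &= mask - 1;  // clear that bit
  }
  return result;
}

#ifdef SKETCH_HAVE_BMI2_PATH
// Only this function is compiled with BMI2 enabled, so BMI2 opcodes
// are confined to code that the dispatcher guards.
__attribute__((target("bmi2")))
uint64_t ExtractBitsBmi2(uint64_t value, uint64_t mask) {
  return _pext_u64(value, mask);
}
#endif

using ExtractFn = uint64_t (*)(uint64_t, uint64_t);

// Pick an implementation once, based on what the running CPU supports.
ExtractFn ChooseExtractBits() {
#ifdef SKETCH_HAVE_BMI2_PATH
  if (CpuHasBmi2()) return &ExtractBitsBmi2;
#endif
  return &ExtractBitsPortable;
}
```

A caller would call ChooseExtractBits() once and reuse the returned pointer. The crash reported above corresponds to a BMI2 instruction (SHLX) being executed on a path that such a check was supposed to keep off non-BMI2 machines.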


