Posted to jira@arrow.apache.org by "Weston Pace (Jira)" <ji...@apache.org> on 2022/03/31 03:32:00 UTC

[jira] [Updated] (ARROW-15221) [C++] Occasional failure arrow-compute-hash-join-node-test

     [ https://issues.apache.org/jira/browse/ARROW-15221?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Weston Pace updated ARROW-15221:
--------------------------------
    Attachment: hash-join-node-test-failure.log

> [C++] Occasional failure arrow-compute-hash-join-node-test
> ----------------------------------------------------------
>
>                 Key: ARROW-15221
>                 URL: https://issues.apache.org/jira/browse/ARROW-15221
>             Project: Apache Arrow
>          Issue Type: Bug
>          Components: C++
>            Reporter: David Li
>            Assignee: Weston Pace
>            Priority: Major
>              Labels: query-engine
>         Attachments: hash-join-node-test-failure.log, log.txt
>
>
> The test seems to be flaky. [Full log|https://github.com/ursacomputing/crossbow/runs/4664466384?check_suite_focus=true]
> {noformat}
> 44/84 Test #35: arrow-compute-hash-join-node-test .........***Failed    8.63 sec
> Running arrow-compute-hash-join-node-test, redirecting output into /build/cpp/build/test-logs/arrow-compute-hash-join-node-test.txt (attempt 1/1)
> /arrow/cpp/build-support/run-test.sh: line 88: 19125 Segmentation fault      (core dumped) $TEST_EXECUTABLE "$@" > $LOGFILE.raw 2>&1
> Running main() from /build/cpp/googletest_ep-prefix/src/googletest_ep/googletest/src/gtest_main.cc
> [==========] Running 23 tests from 2 test suites.
> [----------] Global test environment set-up.
> [----------] 7 tests from HashJoin
> [ RUN      ] HashJoin.Random
> /arrow/cpp/src/arrow/compute/exec/hash_join_node_test.cc:934: Failure
> Failed
> '_error_or_value46.status()' failed with Cancelled: Scheduler cancelled
> Google Test trace:
> /arrow/cpp/src/arrow/compute/exec/hash_join_node_test.cc:1053: FULL_OUTER IS parallel = false
> /build/cpp/src/arrow/compute/exec
> {noformat}
> Another failure was observed in the AMD64 Conda C++ build. [Full Log|https://github.com/apache/arrow/runs/5055044211?check_suite_focus=true]
> {noformat}
> [----------] 7 tests from HashJoin
> [ RUN      ] HashJoin.Random
> Found core dump, printing backtrace:
> warning: core file may not match specified executable file.
> [New LWP 19309]
> [New LWP 19308]
> [New LWP 19306]
> [New LWP 19310]
> [New LWP 19307]
> [New LWP 19311]
> [Thread debugging using libthread_db enabled]
> Using host libthread_db library "/lib/x86_64-linux-gnu/libthread_db.so.1".
> Core was generated by `/build/cpp/debug/arrow-compute-hash-join-node-test'.
> Program terminated with signal SIGSEGV, Segmentation fault.
> #0  0x0000000000011479 in ?? ()
> [Current thread is 1 (Thread 0x7f8cfcb7d700 (LWP 19309))]
> Thread 6 (Thread 0x7f8cf9fff700 (LWP 19311)):
> #0  0x00007f8d01131065 in futex_abstimed_wait_cancelable (private=<optimized out>, abstime=0x7f8cf9ffd4a0, expected=0, futex_word=0x7f8cff40a790) at ../sysdeps/unix/sysv/linux/futex-internal.h:205
> #1  __pthread_cond_wait_common (abstime=0x7f8cf9ffd4a0, mutex=0x7f8cff40a7d8, cond=0x7f8cff40a768) at pthread_cond_wait.c:539
> #2  __pthread_cond_timedwait (cond=0x7f8cff40a768, mutex=0x7f8cff40a7d8, abstime=0x7f8cf9ffd4a0) at pthread_cond_wait.c:667
> #3  0x00007f8d04411496 in background_thread_sleep (tsdn=<optimized out>, interval=<optimized out>, info=0x7f8cff40a760) at src/background_thread.c:255
> #4  background_work_sleep_once (ind=<optimized out>, info=<optimized out>, tsdn=<optimized out>) at src/background_thread.c:307
> #5  background_work (ind=<optimized out>, tsd=<optimized out>) at src/background_thread.c:497
> #6  background_thread_entry () at src/background_thread.c:522
> #7  0x00007f8d0112a6db in start_thread (arg=0x7f8cf9fff700) at pthread_create.c:463
> #8  0x00007f8d0180171f in clone () at ../sysdeps/unix/sysv/linux/x86_64/clone.S:95
> Thread 5 (Thread 0x7f8cfedff700 (LWP 19307)):
> #0  0x00007f8d01131065 in futex_abstimed_wait_cancelable (private=<optimized out>, abstime=0x7f8cfedfd4a0, expected=0, futex_word=0x7f8cff40a5f0) at ../sysdeps/unix/sysv/linux/futex-internal.h:205
> #1  __pthread_cond_wait_common (abstime=0x7f8cfedfd4a0, mutex=0x7f8cff40a638, cond=0x7f8cff40a5c8) at pthread_cond_wait.c:539
> #2  __pthread_cond_timedwait (cond=0x7f8cff40a5c8, mutex=0x7f8cff40a638, abstime=0x7f8cfedfd4a0) at pthread_cond_wait.c:667
> #3  0x00007f8d04411bc6 in background_thread_sleep (tsdn=<optimized out>, interval=<optimized out>, info=<optimized out>) at src/background_thread.c:255
> #4  background_work_sleep_once (ind=0, info=<optimized out>, tsdn=<optimized out>) at src/background_thread.c:307
> #5  background_thread0_work (tsd=<optimized out>) at src/background_thread.c:452
> #6  background_work (ind=<optimized out>, tsd=<optimized out>) at src/background_thread.c:490
> #7  background_thread_entry () at src/background_thread.c:522
> #8  0x00007f8d0112a6db in start_thread (arg=0x7f8cfedff700) at pthread_create.c:463
> #9  0x00007f8d0180171f in clone () at ../sysdeps/unix/sysv/linux/x86_64/clone.S:95
> Thread 4 (Thread 0x7f8cfb5ff700 (LWP 19310)):
> #0  0x00007f8d01131065 in futex_abstimed_wait_cancelable (private=<optimized out>, abstime=0x7f8cfb5fd4a0, expected=0, futex_word=0x7f8cff40a6c0) at ../sysdeps/unix/sysv/linux/futex-internal.h:205
> #1  __pthread_cond_wait_common (abstime=0x7f8cfb5fd4a0, mutex=0x7f8cff40a708, cond=0x7f8cff40a698) at pthread_cond_wait.c:539
> #2  __pthread_cond_timedwait (cond=0x7f8cff40a698, mutex=0x7f8cff40a708, abstime=0x7f8cfb5fd4a0) at pthread_cond_wait.c:667
> #3  0x00007f8d04411496 in background_thread_sleep (tsdn=<optimized out>, interval=<optimized out>, info=0x7f8cff40a690) at src/background_thread.c:255
> #4  background_work_sleep_once (ind=<optimized out>, info=<optimized out>, tsdn=<optimized out>) at src/background_thread.c:307
> #5  background_work (ind=<optimized out>, tsd=<optimized out>) at src/background_thread.c:497
> #6  background_thread_entry () at src/background_thread.c:522
> #7  0x00007f8d0112a6db in start_thread (arg=0x7f8cfb5ff700) at pthread_create.c:463
> #8  0x00007f8d0180171f in clone () at ../sysdeps/unix/sysv/linux/x86_64/clone.S:95
> Thread 3 (Thread 0x7f8cff6db0c0 (LWP 19306)):
> #0  0x00005630dcb287d4 in __gnu_cxx::operator==<int const*, std::vector<int, std::allocator<int> > > (__lhs=..., __rhs=...) at /opt/conda/envs/arrow/x86_64-conda-linux-gnu/include/c++/9.4.0/bits/stl_iterator.h:890
> #1  0x00005630dcb1d3d1 in std::vector<int, std::allocator<int> >::empty (this=0x7ffc7ab72ae0) at /opt/conda/envs/arrow/x86_64-conda-linux-gnu/include/c++/9.4.0/bits/stl_vector.h:1005
> #2  0x00005630dcafd4ea in arrow::compute::HashJoinSimpleInt (join_type=arrow::compute::JoinType::FULL_OUTER, l=..., null_in_key_l=..., r=..., null_in_key_r=..., result_l=0x7ffc7ab72cb0, result_r=0x7ffc7ab72cd0, output_length_limit=100000, length_limit_reached=0x7ffc7ab72e77) at /arrow/cpp/src/arrow/compute/exec/hash_join_node_test.cc:781
> #3  0x00005630dcafe22c in arrow::compute::HashJoinSimple (ctx=0x5630de57a320, join_type=arrow::compute::JoinType::FULL_OUTER, cmp=..., num_key_fields=1, key_id_l=..., key_id_r=..., original_l=..., original_r=..., l=..., r=..., output_ids_l=..., output_ids_r=..., output_length_limit=100000, length_limit_reached=0x7ffc7ab72e77) at /arrow/cpp/src/arrow/compute/exec/hash_join_node_test.cc:887
> #4  0x00005630dcb011c0 in arrow::compute::HashJoin_Random_Test::TestBody (this=0x5630de47a300) at /arrow/cpp/src/arrow/compute/exec/hash_join_node_test.cc:1067
> #5  0x00007f8d056a3c9c in testing::internal::HandleSehExceptionsInMethodIfSupported<testing::Test, void> (object=0x5630de47a300, method=&virtual testing::Test::TestBody(), location=0x7f8d056b897b "the test body") at /build/cpp/googletest_ep-prefix/src/googletest_ep/googletest/src/gtest.cc:2607
> #6  0x00007f8d0569add2 in testing::internal::HandleExceptionsInMethodIfSupported<testing::Test, void> (object=0x5630de47a300, method=&virtual testing::Test::TestBody(), location=0x7f8d056b897b "the test body") at /build/cpp/googletest_ep-prefix/src/googletest_ep/googletest/src/gtest.cc:2643
> #7  0x00007f8d05675c03 in testing::Test::Run (this=0x5630de47a300) at /build/cpp/googletest_ep-prefix/src/googletest_ep/googletest/src/gtest.cc:2682
> #8  0x00007f8d0567663b in testing::TestInfo::Run (this=0x5630de476b50) at /build/cpp/googletest_ep-prefix/src/googletest_ep/googletest/src/gtest.cc:2861
> #9  0x00007f8d05677010 in testing::TestSuite::Run (this=0x5630de476c70) at /build/cpp/googletest_ep-prefix/src/googletest_ep/googletest/src/gtest.cc:3015
> #10 0x00007f8d0568731c in testing::internal::UnitTestImpl::RunAllTests (this=0x5630de4762e0) at /build/cpp/googletest_ep-prefix/src/googletest_ep/googletest/src/gtest.cc:5855
> #11 0x00007f8d056a4ce8 in testing::internal::HandleSehExceptionsInMethodIfSupported<testing::internal::UnitTestImpl, bool> (object=0x5630de4762e0, method=(bool (testing::internal::UnitTestImpl::*)(testing::internal::UnitTestImpl * const)) 0x7f8d05686ed8 <testing::internal::UnitTestImpl::RunAllTests()>, location=0x7f8d056b9468 "auxiliary test code (environments or event listeners)") at /build/cpp/googletest_ep-prefix/src/googletest_ep/googletest/src/gtest.cc:2607
> #12 0x00007f8d0569c064 in testing::internal::HandleExceptionsInMethodIfSupported<testing::internal::UnitTestImpl, bool> (object=0x5630de4762e0, method=(bool (testing::internal::UnitTestImpl::*)(testing::internal::UnitTestImpl * const)) 0x7f8d05686ed8 <testing::internal::UnitTestImpl::RunAllTests()>, location=0x7f8d056b9468 "auxiliary test code (environments or event listeners)") at /build/cpp/googletest_ep-prefix/src/googletest_ep/googletest/src/gtest.cc:2643
> #13 0x00007f8d056857b7 in testing::UnitTest::Run (this=0x7f8d056e5260 <testing::UnitTest::GetInstance()::instance>) at /build/cpp/googletest_ep-prefix/src/googletest_ep/googletest/src/gtest.cc:5438
> #14 0x00007f8d056e6919 in RUN_ALL_TESTS () at /build/cpp/googletest_ep-prefix/src/googletest_ep/googletest/include/gtest/gtest.h:2490
> #15 0x00007f8d056e695c in main (argc=1, argv=0x7ffc7ab739d8) at /build/cpp/googletest_ep-prefix/src/googletest_ep/googletest/src/gtest_main.cc:52
> #16 0x00007f8d01701bf7 in __libc_start_main (main=0x7f8d056e691b <main(int, char**)>, argc=1, argv=0x7ffc7ab739d8, init=<optimized out>, fini=<optimized out>, rtld_fini=<optimized out>, stack_end=0x7ffc7ab739c8) at ../csu/libc-start.c:310
> #17 0x00005630dcaedf49 in _start ()
> Thread 2 (Thread 0x7f8cfdb7e700 (LWP 19308)):
> #0  0x00007f8d01130ad3 in futex_wait_cancelable (private=<optimized out>, expected=0, futex_word=0x5630de565a80) at ../sysdeps/unix/sysv/linux/futex-internal.h:88
> #1  __pthread_cond_wait_common (abstime=0x0, mutex=0x5630de565a30, cond=0x5630de565a58) at pthread_cond_wait.c:502
> #2  __pthread_cond_wait (cond=0x5630de565a58, mutex=0x5630de565a30) at pthread_cond_wait.c:655
> #3  0x00007f8d01b994d1 in __gthread_cond_wait (__mutex=<error reading variable: dwarf2_find_location_expression: Corrupted DWARF expression.>, __cond=<optimized out>) at /home/conda/feedstock_root/build_artifacts/gcc_compilers_1643063175398/work/build/x86_64-conda-linux-gnu/libstdc++-v3/src/c++11/condition_variable.cc:865
> #4  std::__condvar::wait (__m=<error reading variable: dwarf2_find_location_expression: Corrupted DWARF expression.>, this=<optimized out>) at ../../../../../libstdc++-v3/src/c++11/gthr-default.h:155
> #5  std::condition_variable::wait (this=<optimized out>, __lock=...) at ../../../../../libstdc++-v3/src/c++11/condition_variable.cc:41
> #6  0x00007f8d02f8fbb7 in arrow::internal::WorkerLoop (state=..., it=...) at /arrow/cpp/src/arrow/util/thread_pool.cc:195
> #7  0x00007f8d02f90960 in arrow::internal::ThreadPool::<lambda()>::operator()(void) const (__closure=0x5630de561958) at /arrow/cpp/src/arrow/util/thread_pool.cc:344
> #8  0x00007f8d02f97498 in std::__invoke_impl<void, arrow::internal::ThreadPool::LaunchWorkersUnlocked(int)::<lambda()> >(std::__invoke_other, arrow::internal::ThreadPool::<lambda()> &&) (__f=...) at /opt/conda/envs/arrow/x86_64-conda-linux-gnu/include/c++/9.4.0/bits/invoke.h:60
> #9  0x00007f8d02f97438 in std::__invoke<arrow::internal::ThreadPool::LaunchWorkersUnlocked(int)::<lambda()> >(arrow::internal::ThreadPool::<lambda()> &&) (__fn=...) at /opt/conda/envs/arrow/x86_64-conda-linux-gnu/include/c++/9.4.0/bits/invoke.h:95
> #10 0x00007f8d02f973d6 in std::thread::_Invoker<std::tuple<arrow::internal::ThreadPool::LaunchWorkersUnlocked(int)::<lambda()> > >::_M_invoke<0>(std::_Index_tuple<0>) (this=0x5630de561958) at /opt/conda/envs/arrow/x86_64-conda-linux-gnu/include/c++/9.4.0/thread:244
> #11 0x00007f8d02f97293 in std::thread::_Invoker<std::tuple<arrow::internal::ThreadPool::LaunchWorkersUnlocked(int)::<lambda()> > >::operator()(void) (this=0x5630de561958) at /opt/conda/envs/arrow/x86_64-conda-linux-gnu/include/c++/9.4.0/thread:251
> #12 0x00007f8d02f971e4 in std::thread::_State_impl<std::thread::_Invoker<std::tuple<arrow::internal::ThreadPool::LaunchWorkersUnlocked(int)::<lambda()> > > >::_M_run(void) (this=0x5630de561950) at /opt/conda/envs/arrow/x86_64-conda-linux-gnu/include/c++/9.4.0/thread:195
> #13 0x00007f8d01b9d9d4 in std::execute_native_thread_routine (__p=<optimized out>) at /home/conda/feedstock_root/build_artifacts/gcc_compilers_1643063175398/work/build/x86_64-conda-linux-gnu/libstdc++-v3/include/bits/new_allocator.h:82
> #14 0x00007f8d0112a6db in start_thread (arg=0x7f8cfdb7e700) at pthread_create.c:463
> #15 0x00007f8d0180171f in clone () at ../sysdeps/unix/sysv/linux/x86_64/clone.S:95
> Thread 1 (Thread 0x7f8cfcb7d700 (LWP 19309)):
> #0  0x0000000000011479 in ?? ()
> #1  0x00007f8d0331fae3 in arrow::compute::TaskSchedulerImpl::ScheduleMore (this=0x5630de572960, thread_id=0, num_tasks_finished=0) at /arrow/cpp/src/arrow/compute/exec/task_util.cc:326
> #2  0x00007f8d0331e94c in arrow::compute::TaskSchedulerImpl::StartTaskGroup (this=0x5630de572960, thread_id=0, group_id=1, total_num_tasks=0) at /arrow/cpp/src/arrow/compute/exec/task_util.cc:153
> #3  0x00007f8d0327d952 in arrow::compute::HashJoinBasicImpl::ProbeQueuedBatches (this=0x7f8cec24aee0, thread_index=0) at /arrow/cpp/src/arrow/compute/exec/hash_join.cc:726
> #4  0x00007f8d0327d13b in arrow::compute::HashJoinBasicImpl::BuildHashTable_on_finished (this=0x7f8cec24aee0, thread_index=0) at /arrow/cpp/src/arrow/compute/exec/hash_join.cc:663
> #5  0x00007f8d0327d2db in arrow::compute::HashJoinBasicImpl::RegisterBuildHashTable()::{lambda(unsigned long)#2}::operator()(unsigned long) const (__closure=0x5630de654840, thread_index=0) at /arrow/cpp/src/arrow/compute/exec/hash_join.cc:674
> #6  0x00007f8d0328213c in std::_Function_handler<arrow::Status (unsigned long), arrow::compute::HashJoinBasicImpl::RegisterBuildHashTable()::{lambda(unsigned long)#2}>::_M_invoke(std::_Any_data const&, unsigned long&&) (__functor=..., __args#0=@0x7f8cfcb7b138: 0) at /opt/conda/envs/arrow/x86_64-conda-linux-gnu/include/c++/9.4.0/bits/std_function.h:286
> #7  0x00007f8d032aa81e in std::function<arrow::Status (unsigned long)>::operator()(unsigned long) const (this=0x5630de654840, __args#0=0) at /opt/conda/envs/arrow/x86_64-conda-linux-gnu/include/c++/9.4.0/bits/std_function.h:688
> #8  0x00007f8d0331f041 in arrow::compute::TaskSchedulerImpl::OnTaskGroupFinished (this=0x5630de572960, thread_id=0, group_id=0, all_task_groups_finished=0x7f8cfcb7b230) at /arrow/cpp/src/arrow/compute/exec/task_util.cc:244
> #9  0x00007f8d0331f934 in arrow::compute::TaskSchedulerImpl::<lambda(size_t)>::operator()(size_t) const (__closure=0x5630de6a1390, thread_id=0) at /arrow/cpp/src/arrow/compute/exec/task_util.cc:349
> #10 0x00007f8d0332152f in std::_Function_handler<arrow::Status(long unsigned int), arrow::compute::TaskSchedulerImpl::ScheduleMore(size_t, int)::<lambda(size_t)> >::_M_invoke(const std::_Any_data &, unsigned long &&) (__functor=..., __args#0=@0x7f8cfcb7b2b8: 0) at /opt/conda/envs/arrow/x86_64-conda-linux-gnu/include/c++/9.4.0/bits/std_function.h:286
> #11 0x00007f8d032aa81e in std::function<arrow::Status (unsigned long)>::operator()(unsigned long) const (this=0x5630de654f70, __args#0=0) at /opt/conda/envs/arrow/x86_64-conda-linux-gnu/include/c++/9.4.0/bits/std_function.h:688
> #12 0x00007f8d032a7f8c in arrow::compute::HashJoinNode::ScheduleTaskCallback(std::function<arrow::Status (unsigned long)>)::{lambda()#1}::operator()() const (__closure=0x5630de654f68) at /arrow/cpp/src/arrow/compute/exec/hash_join_node.cc:604
> #13 0x00007f8d032b9329 in arrow::internal::FnOnce<void ()>::FnImpl<arrow::compute::HashJoinNode::ScheduleTaskCallback(std::function<arrow::Status (unsigned long)>)::{lambda()#1}>::invoke() (this=0x5630de654f60) at /arrow/cpp/src/arrow/util/functional.h:152
> #14 0x00007f8d02f91ade in arrow::internal::FnOnce<void ()>::operator()() && (this=0x7f8cfcb7b3f0) at /arrow/cpp/src/arrow/util/functional.h:140
> #15 0x00007f8d02f8fa87 in arrow::internal::WorkerLoop (state=..., it=...) at /arrow/cpp/src/arrow/util/thread_pool.cc:177
> #16 0x00007f8d02f90960 in arrow::internal::ThreadPool::<lambda()>::operator()(void) const (__closure=0x5630de659468) at /arrow/cpp/src/arrow/util/thread_pool.cc:344
> #17 0x00007f8d02f97498 in std::__invoke_impl<void, arrow::internal::ThreadPool::LaunchWorkersUnlocked(int)::<lambda()> >(std::__invoke_other, arrow::internal::ThreadPool::<lambda()> &&) (__f=...) at /opt/conda/envs/arrow/x86_64-conda-linux-gnu/include/c++/9.4.0/bits/invoke.h:60
> #18 0x00007f8d02f97438 in std::__invoke<arrow::internal::ThreadPool::LaunchWorkersUnlocked(int)::<lambda()> >(arrow::internal::ThreadPool::<lambda()> &&) (__fn=...) at /opt/conda/envs/arrow/x86_64-conda-linux-gnu/include/c++/9.4.0/bits/invoke.h:95
> #19 0x00007f8d02f973d6 in std::thread::_Invoker<std::tuple<arrow::internal::ThreadPool::LaunchWorkersUnlocked(int)::<lambda()> > >::_M_invoke<0>(std::_Index_tuple<0>) (this=0x5630de659468) at /opt/conda/envs/arrow/x86_64-conda-linux-gnu/include/c++/9.4.0/thread:244
> #20 0x00007f8d02f97293 in std::thread::_Invoker<std::tuple<arrow::internal::ThreadPool::LaunchWorkersUnlocked(int)::<lambda()> > >::operator()(void) (this=0x5630de659468) at /opt/conda/envs/arrow/x86_64-conda-linux-gnu/include/c++/9.4.0/thread:251
> #21 0x00007f8d02f971e4 in std::thread::_State_impl<std::thread::_Invoker<std::tuple<arrow::internal::ThreadPool::LaunchWorkersUnlocked(int)::<lambda()> > > >::_M_run(void) (this=0x5630de659460) at /opt/conda/envs/arrow/x86_64-conda-linux-gnu/include/c++/9.4.0/thread:195
> #22 0x00007f8d01b9d9d4 in std::execute_native_thread_routine (__p=<optimized out>) at /home/conda/feedstock_root/build_artifacts/gcc_compilers_1643063175398/work/build/x86_64-conda-linux-gnu/libstdc++-v3/include/bits/new_allocator.h:82
> #23 0x00007f8d0112a6db in start_thread (arg=0x7f8cfcb7d700) at pthread_create.c:463
> #24 0x00007f8d0180171f in clone () at ../sysdeps/unix/sysv/linux/x86_64/clone.S:95
> /build/cpp/src/arrow/compute/exec
> {noformat}


