Posted to issues-all@impala.apache.org by "zhaorenhai (Jira)" <ji...@apache.org> on 2020/08/17 08:16:00 UTC

[jira] [Updated] (IMPALA-10088) DeadLock while run unifiedbetests on aarch64 platform

     [ https://issues.apache.org/jira/browse/IMPALA-10088?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

zhaorenhai updated IMPALA-10088:
--------------------------------
    Description: 
When running unifiedbetests and impalad on the aarch64 platform, a deadlock occurs while tcmalloc is initializing.

The stack trace is as follows:

 
{code:java}
(gdb) bt
#0  0x0000ffff83099544 in __GI___nanosleep (requested_time=0xffffffc71698, remaining=0x0) at ../sysdeps/unix/sysv/linux/nanosleep.c:28
#1  0x00000000054cf144 in base::internal::SpinLockDelay (w=0x77385b0 <tcmalloc::Static::pageheap_lock_>, value=2, loop=727956) at /home/impala/impala/be/src/gutil/spinlock_linux-inl.h:86
#2  0x0000000005529800 in SpinLock::SlowLock() ()
#3  0x00000000055fb5c4 in tcmalloc::ThreadCache::InitModule() ()
#4  0x0000000005743374 in tc_calloc ()
#5  0x0000ffff81c737f4 in _dlerror_run (operate=operate@entry=0xffff81c73158 <dlsym_doit>, args=0xffffffc717d8, args@entry=0xffffffc717f8) at dlerror.c:140
#6  0x0000ffff81c731f0 in __dlsym (handle=<optimized out>, name=<optimized out>) at dlsym.c:70
#7  0x000000000310ee04 in (anonymous namespace)::dlsym_or_die (sym=0x606b260 "dlopen") at /home/impala/impala/be/src/kudu/util/debug/unwind_safeness.cc:74
#8  0x000000000310ef1c in (anonymous namespace)::InitIfNecessary () at /home/impala/impala/be/src/kudu/util/debug/unwind_safeness.cc:100
#9  0x000000000310f0b4 in dl_iterate_phdr (callback=0xffff81620d18 <_Unwind_IteratePhdrCallback>, data=0xffffffc71900) at /home/impala/impala/be/src/kudu/util/debug/unwind_safeness.cc:158
#10 0x0000ffff816215b4 in _Unwind_Find_FDE (pc=0xffff8161f98f <_Unwind_Backtrace+79>, bases=bases@entry=0xffffffc72438) at ../../../gcc-7.5.0/libgcc/unwind-dw2-fde-dip.c:469
#11 0x0000ffff8161dfdc in uw_frame_state_for (context=context@entry=0xffffffc72110, fs=fs@entry=0xffffffc719f0) at ../../../gcc-7.5.0/libgcc/unwind-dw2.c:1249
#12 0x0000ffff8161ef3c in uw_init_context_1 (context=context@entry=0xffffffc72110, outer_cfa=0xffffffc72b50, outer_cfa@entry=0xffffffc72be0, outer_ra=0x55298d8 <GetStackTrace_libgcc(void**, int, int)+40>)
    at ../../../gcc-7.5.0/libgcc/unwind-dw2.c:1578
#13 0x0000ffff8161f990 in _Unwind_Backtrace (trace=0x5529a48 <libgcc_backtrace_helper(_Unwind_Context*, void*)>, trace_argument=0xffffffc72b68) at ../../../gcc-7.5.0/libgcc/unwind.inc:283
#14 0x00000000055298d8 in GetStackTrace_libgcc(void**, int, int) ()
#15 0x0000000005529db4 in GetStackTrace(void**, int, int) ()
#16 0x00000000055f891c in tcmalloc::PageHeap::GrowHeap(unsigned long) ()
{code}
I think this is the same issue as [https://github.com/gperftools/gperftools/issues/1184], because the deadlock happens whether I build gperftools with or without libunwind.
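
Reading the backtrace from the bottom up, the cycle appears to be as follows. tcmalloc::PageHeap::GrowHeap (frame #16) records a stack trace, presumably while still holding pageheap_lock_. The libgcc unwinder then calls dl_iterate_phdr, which is hooked by Kudu's unwind_safeness.cc. The hook's lazy InitIfNecessary() calls dlsym, glibc's _dlerror_run calls calloc, and tcmalloc::ThreadCache::InitModule spins on that same pageheap_lock_ (frames #0 to #3). The sketch below is a simplified, single-threaded model of that cycle, not real tcmalloc or Kudu code. Every name in it is hypothetical, and it uses TryLock() where the real SpinLock would spin, so the program reports the re-entrant acquisition instead of hanging.

{code:cpp}
#include <atomic>
#include <cstdio>
#include <cstdlib>

// Hypothetical non-recursive lock standing in for tcmalloc's SpinLock:
// a thread that already holds it and tries to take it again never gets it.
class ModelSpinLock {
 public:
  bool TryLock() { return !locked_.exchange(true, std::memory_order_acquire); }
  void Unlock() { locked_.store(false, std::memory_order_release); }

 private:
  std::atomic<bool> locked_{false};
};

ModelSpinLock pageheap_lock;    // stands in for tcmalloc::Static::pageheap_lock_
bool hook_initialized = false;  // stands in for the lazy state behind InitIfNecessary()
bool reentry_detected = false;

// Stands in for tc_calloc -> ThreadCache::InitModule(), which takes pageheap_lock_.
void ModelThreadCacheInit() {
  if (!pageheap_lock.TryLock()) {
    // The real SpinLock::SlowLock() would spin here forever (frames #0-#2).
    reentry_detected = true;
    return;
  }
  pageheap_lock.Unlock();
}

// Stands in for the hooked dl_iterate_phdr: its first call resolves symbols
// with dlsym(), and glibc's _dlerror_run() allocates with calloc().
void ModelHookedDlIteratePhdr() {
  if (!hook_initialized) {
    hook_initialized = true;
    ModelThreadCacheInit();
  }
}

// Stands in for PageHeap::GrowHeap(): it records a stack trace while the
// page heap lock is held, and the libgcc unwinder calls dl_iterate_phdr.
void ModelGrowHeap() {
  if (!pageheap_lock.TryLock()) std::abort();
  ModelHookedDlIteratePhdr();
  pageheap_lock.Unlock();
}

int main() {
  ModelGrowHeap();
  std::printf("re-entrant pageheap lock attempt: %s\n", reentry_detected ? "yes" : "no");
  return 0;
}
{code}

The model prints "yes": the only allocation on the path happens inside the hook's one-time initialization, which is exactly the point where the lock is already held.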

 

Kudu also has the same issue:

https://issues.apache.org/jira/browse/KUDU-3072

I think the solution in the following change is not correct:

[https://gerrit.cloudera.org/#/c/15420/]

On aarch64, stack traces are obtained differently than on (32-bit) ARM.

I think the correct way to get a stack trace on aarch64 should look like this:

[https://github.com/abseil/abseil-cpp/blob/master/absl/debugging/internal/stacktrace_aarch64-inl.inc]
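
As a rough illustration of the frame-pointer approach used by the abseil file linked above: on AArch64, when code is built with frame pointers, x29 points at a two-word frame record holding the caller's x29 and the return address (x30). A stack trace can therefore be collected by following that chain with a few sanity checks, without calling malloc, dlsym or dl_iterate_phdr. The sketch below walks a fake, in-memory chain rather than the real stack so that it is portable and testable. FrameRecord, NextFrame, GetStackTraceFromFp and the 100000-byte frame-size bound are assumptions for illustration, not abseil's actual code.

{code:cpp}
#include <cstdint>
#include <cstdio>

// Layout of an AArch64 frame record (AAPCS64): x29 points at two words,
// the caller's x29 followed by the saved return address (x30).
struct FrameRecord {
  const FrameRecord* prev_fp;
  std::uintptr_t return_address;
};

// Assumed sanity bound on the size of a single stack frame.
constexpr std::uintptr_t kMaxFrameBytes = 100000;

// Returns the caller's frame record, or nullptr if the chain ends or looks corrupt.
const FrameRecord* NextFrame(const FrameRecord* fp) {
  const FrameRecord* next = fp->prev_fp;
  if (next == nullptr) return nullptr;
  const auto cur = reinterpret_cast<std::uintptr_t>(fp);
  const auto nxt = reinterpret_cast<std::uintptr_t>(next);
  if (nxt <= cur) return nullptr;                       // stack grows down: callers live higher
  if (nxt - cur > kMaxFrameBytes) return nullptr;       // implausibly large frame
  if (nxt % alignof(FrameRecord) != 0) return nullptr;  // misaligned pointer
  return next;
}

// Collects up to max_depth return addresses. It never calls malloc, dlsym
// or dl_iterate_phdr, so it cannot re-enter the allocator.
int GetStackTraceFromFp(const FrameRecord* fp, std::uintptr_t* out, int max_depth) {
  int depth = 0;
  while (fp != nullptr && depth < max_depth) {
    out[depth++] = fp->return_address;
    fp = NextFrame(fp);
  }
  return depth;
}

int main() {
  // Fake stack: frame records at increasing addresses, innermost first.
  FrameRecord stack[4];
  stack[3] = {nullptr, 0x4000};  // outermost frame terminates the chain
  stack[2] = {&stack[3], 0x3000};
  stack[1] = {&stack[2], 0x2000};
  stack[0] = {&stack[1], 0x1000};

  std::uintptr_t pcs[8];
  const int n = GetStackTraceFromFp(&stack[0], pcs, 8);
  for (int i = 0; i < n; ++i) {
    std::printf("#%d 0x%lx\n", i, static_cast<unsigned long>(pcs[i]));
  }
  return 0;
}
{code}

A real implementation would start from __builtin_frame_address(0), and it only works reliably if every frame on the stack was compiled with -fno-omit-frame-pointer.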

 

However, I suspect gperftools is not the root cause of this issue, because both gperftools and libunwind now fully support aarch64.

Maybe this Kudu commit has a bug?

[https://github.com/apache/kudu/commit/b621f9c1a3949dc31ca4836b0767b2840fa73f29]

On x86, gperftools does not use libunwind or libgcc to get stack traces, so the issue does not occur there.

I tried changing:
{code:java}
#if !defined(THREAD_SANITIZER) && !defined(__APPLE__)
#define HOOK_DL_ITERATE_PHDR 1
#endif
{code}
to:
{code:java}
#if !defined(THREAD_SANITIZER) && !defined(__APPLE__) && !defined(__aarch64__)
#define HOOK_DL_ITERATE_PHDR 1
#endif
{code}
With this change, the deadlock no longer occurs.
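
The effect of the guard change can be checked in isolation with a small standalone program that reports whether the hook would be compiled in for the current target. It only reproduces the preprocessor logic from this issue; the macro presumably lives in be/src/kudu/util/debug/unwind_safeness.cc, the file that provides the hooked dl_iterate_phdr in frame #9. kHookEnabled is a hypothetical helper constant added for illustration.

{code:cpp}
#include <cstdio>

// Patched guard from this issue: the dl_iterate_phdr hook is no longer
// compiled on aarch64 (and, as before, not for TSAN builds or on macOS).
#if !defined(THREAD_SANITIZER) && !defined(__APPLE__) && !defined(__aarch64__)
#define HOOK_DL_ITERATE_PHDR 1
#endif

// Hypothetical helper: exposes the preprocessor decision as a constant.
#ifdef HOOK_DL_ITERATE_PHDR
constexpr bool kHookEnabled = true;
#else
constexpr bool kHookEnabled = false;
#endif

int main() {
  std::printf("dl_iterate_phdr hook: %s\n", kHookEnabled ? "enabled" : "disabled");
  return 0;
}
{code}

Note that this workaround disables the hook on aarch64 entirely rather than removing the allocation from the hook's lazy initialization, so whatever the hook provides on other platforms is also absent on aarch64.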

 

[~tarmstrong@cloudera.com] [~tlipcon] [~adar]

What do you think about this issue? How should we fix it? Any suggestions?

 

 

 


> DeadLock while run unifiedbetests on aarch64 platform
> -----------------------------------------------------
>
>                 Key: IMPALA-10088
>                 URL: https://issues.apache.org/jira/browse/IMPALA-10088
>             Project: IMPALA
>          Issue Type: Sub-task
>            Reporter: zhaorenhai
>            Priority: Major
>



--
This message was sent by Atlassian Jira
(v8.3.4#803005)
