Posted to issues@hawq.apache.org by "Lei Chang (JIRA)" <ji...@apache.org> on 2016/01/24 02:23:40 UTC

[jira] [Updated] (HAWQ-252) Coredump When RM Reconnect libyarn

     [ https://issues.apache.org/jira/browse/HAWQ-252?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Lei Chang updated HAWQ-252:
---------------------------
    Fix Version/s: 2.0.0

> Coredump When RM Reconnect libyarn
> ----------------------------------
>
>                 Key: HAWQ-252
>                 URL: https://issues.apache.org/jira/browse/HAWQ-252
>             Project: Apache HAWQ
>          Issue Type: Bug
>          Components: Resource Manager
>            Reporter: Lin Wen
>            Assignee: Lin Wen
>             Fix For: 2.0.0
>
>
> Coredump When RM Reconnect libyarn
> Missing separate debuginfos, use: debuginfo-install hawq-2.0.0.0_beta-19011.x86_64
> (gdb) bt
> #0  0x0000000000e661f8 in std::string::_Rep::_S_empty_rep_storage ()
> #1  0x00007f7f1f20947c in libyarn::LibYarnClient::dummyAllocate (this=<value optimized out>)
>     at /data1/pulse2-agent/agents/agent1/work/LIBYARN-main-opt/rhel5_x86_64/src/libyarnclient/LibYarnClient.cpp:330
> #2  0x00007f7f1f209988 in libyarn::heartbeatFunc (args=<value optimized out>)
>     at /data1/pulse2-agent/agents/agent1/work/LIBYARN-main-opt/rhel5_x86_64/src/libyarnclient/LibYarnClient.cpp:114
> #3  0x000000350b4079d1 in start_thread () from /lib64/libpthread.so.0
> #4  0x000000350b0e8b6d in clone () from /lib64/libc.so.6
> (gdb) info thread
>   4 Thread 0x7f7efc239700 (LWP 760442)  0x000000350b40b98e in pthread_cond_timedwait@@GLIBC_2.3.2 () from /lib64/libpthread.so.0
>   3 Thread 0x7f7f1a1758c0 (LWP 760441)  0x000000350b0accdd in nanosleep () from /lib64/libc.so.6
>   2 Thread 0x7f7efae37700 (LWP 760797)  0x000000350b0accdd in nanosleep () from /lib64/libc.so.6
> * 1 Thread 0x7f7efb838700 (LWP 760443)  0x0000000000e661f8 in std::string::_Rep::_S_empty_rep_storage ()
> (gdb) thread 2
> [Switching to thread 2 (Thread 0x7f7efae37700 (LWP 760797))]#0  0x000000350b0accdd in nanosleep () from /lib64/libc.so.6
> (gdb) bt
> #0  0x000000350b0accdd in nanosleep () from /lib64/libc.so.6
> #1  0x000000350b0e1e54 in usleep () from /lib64/libc.so.6
> #2  0x00007f7f1f209999 in libyarn::heartbeatFunc (args=<value optimized out>)
>     at /data1/pulse2-agent/agents/agent1/work/LIBYARN-main-opt/rhel5_x86_64/src/libyarnclient/LibYarnClient.cpp:131
> #3  0x000000350b4079d1 in start_thread () from /lib64/libpthread.so.0
> #4  0x000000350b0e8b6d in clone () from /lib64/libc.so.6
> (gdb) thread 3
> [Switching to thread 3 (Thread 0x7f7f1a1758c0 (LWP 760441))]#0  0x000000350b0accdd in nanosleep () from /lib64/libc.so.6
> (gdb) bt
> #0  0x000000350b0accdd in nanosleep () from /lib64/libc.so.6
> #1  0x000000350b0e1e54 in usleep () from /lib64/libc.so.6
> #2  0x00000000008dd8b9 in RB2YARN_registerYARNApplication () at resourcebroker_LIBYARN_proc.c:1354
> #3  0x00000000008df8ad in RB2YARN_initializeConnection () at resourcebroker_LIBYARN_proc.c:1270
> #4  0x00000000008dfc93 in ResBrokerMainInternal () at resourcebroker_LIBYARN_proc.c:202
> #5  0x00000000008dff79 in ResBrokerMain () at resourcebroker_LIBYARN_proc.c:157
> #6  0x00000000008dc246 in RB_LIBYARN_start (isforked=<value optimized out>) at resourcebroker_LIBYARN.c:153
> #7  0x0000000000903bda in MainHandlerLoop () at resourcemanager.c:531
> #8  0x00000000009041f1 in ResManagerMainServer2ndPhase () at resourcemanager.c:508
> #9  0x0000000000904624 in ResManagerMain (argc=<value optimized out>, argv=<value optimized out>) at resourcemanager.c:330
> #10 0x00000000009049b1 in ResManagerProcessStartup () at resourcemanager.c:402
> #11 0x0000000000764b08 in CommenceNormalOperations () at postmaster.c:3616
> #12 0x00000000007659c2 in do_reaper () at postmaster.c:3964
> #13 0x000000000076a01d in ServerLoop () at postmaster.c:2102
> #14 0x000000000076bb5e in PostmasterMain (argc=9, argv=0x32a15b0) at postmaster.c:1421
> #15 0x00000000006c691a in main (argc=9, argv=0x32a1570) at main.c:226
> There are two heartbeat threads at this moment (threads 1 and 2 are both in libyarn::heartbeatFunc), which means one heartbeat thread was not canceled when the RM reconnected to libyarn.
> In ResBrokerMainInternal(), starting at line 270, the heartbeat thread should be canceled before RB2YARN_disconnectFromYARN is called.
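
The ordering proposed above (stop the heartbeat thread first, then disconnect) can be illustrated with a minimal, self-contained C++11 sketch. This is not libyarn or HAWQ code: HeartbeatClient and all of its members are hypothetical names, and it uses a cooperative stop flag plus join() as a stand-in for whatever cancellation mechanism the real fix uses. The counters exist only to show that at most one heartbeat thread is ever alive across reconnects.

```cpp
#include <atomic>
#include <cassert>
#include <chrono>
#include <thread>

// Hypothetical stand-in for a client that owns a single heartbeat thread.
// None of these names come from libyarn or HAWQ.
class HeartbeatClient {
public:
    ~HeartbeatClient() { disconnect(); }

    // Start the heartbeat thread. At most one may exist at a time.
    void connect() {
        assert(!worker_.joinable());  // a leftover thread here is the bug
        stop_.store(false);
        worker_ = std::thread([this] { heartbeatLoop(); });
    }

    // Stop and join the heartbeat thread BEFORE releasing any state it uses.
    void disconnect() {
        stop_.store(true);
        if (worker_.joinable()) {
            worker_.join();
        }
        // Only at this point would connection state be torn down.
    }

    // Reconnect is a full disconnect (including thread shutdown) plus connect.
    void reconnect() {
        disconnect();
        connect();
    }

    int liveHeartbeatThreads() const { return live_.load(); }
    int peakHeartbeatThreads() const { return peak_.load(); }

private:
    void heartbeatLoop() {
        // Record how many heartbeat threads are alive, and the maximum seen.
        int now = live_.fetch_add(1) + 1;
        int prev = peak_.load();
        while (now > prev && !peak_.compare_exchange_weak(prev, now)) {
        }
        while (!stop_.load()) {
            // A real client would send a heartbeat to the server here.
            std::this_thread::sleep_for(std::chrono::milliseconds(5));
        }
        live_.fetch_sub(1);
    }

    std::atomic<bool> stop_{false};
    std::atomic<int> live_{0};
    std::atomic<int> peak_{0};
    std::thread worker_;
};
```

Because disconnect() joins the old thread before connect() starts a new one, the peak count stays at 1 no matter how many times reconnect() is called. Skipping the join would leave the old thread running against state that is being torn down, which is consistent with the two-heartbeat-thread situation shown in the backtraces.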



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)