Posted to dev@trafodion.apache.org by "Zhu, Wen-Jun" <we...@esgyn.cn> on 2018/09/26 10:09:17 UTC
command `shell -c node info` blocks
Hi,
Recently I found that the `shell` program blocks.
When I run
sqcheck
which invokes
shell -c node info
it blocks.
After some debugging, I found that there are two threads within `shell`.
The stack of one thread looks like this:
#0 0x0000007fb7e292fc in pthread_cond_wait@@GLIBC_2.17 () from /lib/aarch64-linux-gnu/libpthread.so.0
#1 0x000000000042ab54 in Local_IO_To_Monitor::wait_on_cv (this=0x600b40) at clio.cxx:2240
#2 0x0000000000429888 in Local_IO_To_Monitor::send_recv (this=0x600b40, pp_msg=0x7fb6bc95bc, pv_nw=false) at clio.cxx:1675
#3 0x000000000040aeec in attach (nid=0, name=0x5d0240 "SHELL", program=0x4e84c8 "shell") at shell.cxx:995
#4 0x0000000000421a58 in main (argc=4, argv=0x7fffff2d58) at shell.cxx:8849
This thread is waiting on `iv_sr_cv`.
The stack of the other thread:
#0 local_monitor_reader (pp_arg=0x63e3) at clio.cxx:285
#1 0x0000007fb7e22fb4 in start_thread () from /lib/aarch64-linux-gnu/libpthread.so.0
This thread is waiting for `monitor` to send the signal SQ_LIO_SIGNAL_REQUEST_REPLY.
If `monitor` sent the signal, `shell` would receive it, continue, and finish its job.
But `monitor` does not send the signal.
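The handshake described above can be sketched inside a single process. This is a minimal illustration, not the clio.cxx code: `ReplyGate`, `reader`, `send_recv_demo`, and the use of `SIGUSR1` in place of SQ_LIO_SIGNAL_REQUEST_REPLY are all assumptions, and timeouts are added so that the example terminates even when no signal arrives.

```cpp
#include <cassert>
#include <chrono>
#include <condition_variable>
#include <mutex>
#include <pthread.h>
#include <signal.h>
#include <thread>
#include <time.h>

// Hypothetical stand-in for SQ_LIO_SIGNAL_REQUEST_REPLY (real value not shown here).
static const int kReplySignal = SIGUSR1;

struct ReplyGate {
    std::mutex m;
    std::condition_variable cv;  // plays the role of iv_sr_cv
    bool reply_ready = false;
};

// Reader thread: waits up to timeout_ms for the reply signal, then wakes the waiter.
static void reader(ReplyGate* gate, int timeout_ms) {
    sigset_t set;
    sigemptyset(&set);
    sigaddset(&set, kReplySignal);
    timespec ts{timeout_ms / 1000, (timeout_ms % 1000) * 1000000L};
    if (sigtimedwait(&set, nullptr, &ts) == kReplySignal) {
        std::lock_guard<std::mutex> lk(gate->m);
        gate->reply_ready = true;
        gate->cv.notify_one();
    }
}

// Returns true if the waiting thread was woken by a reply, false if it timed out.
bool send_recv_demo(bool sender_sends_signal) {
    // Block the signal before creating the reader so it is only consumed by sigtimedwait.
    sigset_t set, old;
    sigemptyset(&set);
    sigaddset(&set, kReplySignal);
    pthread_sigmask(SIG_BLOCK, &set, &old);

    ReplyGate gate;
    std::thread r(reader, &gate, 200);
    if (sender_sends_signal) {
        // The sender is in the same process here; this only imitates the reply.
        pthread_kill(r.native_handle(), kReplySignal);
    }

    bool woken;
    {
        std::unique_lock<std::mutex> lk(gate.m);
        woken = gate.cv.wait_for(lk, std::chrono::seconds(1),
                                 [&] { return gate.reply_ready; });
    }
    r.join();
    pthread_sigmask(SIG_SETMASK, &old, nullptr);
    return woken;
}
```

With `send_recv_demo(false)` the waiter is never notified and only returns because of the added timeout, which is the situation the first backtrace shows without a timeout.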
After some searching, I found only one piece of code that sends the signal:
pthread_kill(iv_worker_thread_id, SQ_LIO_SIGNAL_REQUEST_REPLY);
It is at line 513 of core/sqf/monitor/linux/clio.cxx, in the function Local_IO_To_Monitor::~Local_IO_To_Monitor().
As I understand it, this function should be invoked in the `monitor` program. But when I attach to that `monitor`,
whose pid I got from the function `local_monitor_reader()`, and set a breakpoint on `Local_IO_To_Monitor::~Local_IO_To_Monitor()`,
it does not break there.
So, what should the normal procedure be? Is it incorrect for `monitor` not to invoke Local_IO_To_Monitor::~Local_IO_To_Monitor()?
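One point that may help here: pthread_kill() can only target a thread in the calling process, so a pthread_kill() call inside a destructor wakes the caller's own worker thread, not a thread in another process. One plausible reading (an assumption, not confirmed anywhere in this thread) is that the destructor uses the signal to release its own reader thread at shutdown. A minimal sketch of that pattern follows; `ReaderOwner`, `destructor_releases_worker`, and `SIGUSR2` are hypothetical names and values chosen for illustration.

```cpp
#include <atomic>
#include <cassert>
#include <pthread.h>
#include <signal.h>
#include <thread>

// Hypothetical stand-in for SQ_LIO_SIGNAL_REQUEST_REPLY.
static const int kWakeSignal = SIGUSR2;

class ReaderOwner {
public:
    explicit ReaderOwner(std::atomic<bool>* exited) : exited_(exited) {
        // Block the signal first; the worker inherits this mask and
        // receives the signal only through sigwait().
        sigemptyset(&set_);
        sigaddset(&set_, kWakeSignal);
        pthread_sigmask(SIG_BLOCK, &set_, nullptr);
        worker_ = std::thread([this] { run(); });
    }

    ~ReaderOwner() {
        stopping_ = true;
        // Same-process wakeup: release the worker from sigwait() so it can exit.
        pthread_kill(worker_.native_handle(), kWakeSignal);
        worker_.join();
    }

private:
    void run() {
        int sig = 0;
        while (!stopping_) {
            sigwait(&set_, &sig);  // blocks until kWakeSignal is delivered
        }
        *exited_ = true;
    }

    sigset_t set_;
    std::atomic<bool> stopping_{false};
    std::atomic<bool>* exited_;
    std::thread worker_;
};

// Returns true once the destructor has woken and joined the worker.
bool destructor_releases_worker() {
    std::atomic<bool> exited{false};
    {
        ReaderOwner owner(&exited);
    }  // destructor runs here
    return exited;
}
```

If the real destructor works this way, a breakpoint on it would only fire in the process that owns the Local_IO_To_Monitor object, and only when that object is destroyed.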
Thank you.
Wenjun Zhu
RE: command `shell -c node info` blocks
Posted by Selva Govindarajan <se...@esgyn.com>.
The backup command can block any SQL query that would make changes to the database to ensure that the database is backed up in a consistent manner.
My guess is that both the drop and the create were blocked and are waiting for the backup to complete.
Selva
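The blocking behaviour described in this reply can be pictured as a gate that the backup holds while it runs. The sketch below is only an analogy under stated assumptions: `Catalog`, `ddl_gate`, `try_ddl`, and the two demo functions are hypothetical and say nothing about how Trafodion implements backup locking.

```cpp
#include <cassert>
#include <chrono>
#include <mutex>
#include <thread>

// Hypothetical catalog with a single gate that backup and DDL both need.
struct Catalog {
    std::timed_mutex ddl_gate;
};

// Tries to run a DDL statement; gives up if the gate is not free within wait_ms.
bool try_ddl(Catalog& c, int wait_ms) {
    std::unique_lock<std::timed_mutex> lk(c.ddl_gate,
                                          std::chrono::milliseconds(wait_ms));
    return lk.owns_lock();
}

// DDL issued from another session while the backup still holds the gate.
bool ddl_during_backup(int wait_ms) {
    Catalog c;
    bool ok = true;
    std::unique_lock<std::timed_mutex> backup(c.ddl_gate);  // backup in progress
    std::thread session([&] { ok = try_ddl(c, wait_ms); });
    session.join();
    return ok;
}

// DDL issued after the backup has released the gate.
bool ddl_after_backup(int wait_ms) {
    Catalog c;
    bool ok = false;
    {
        std::lock_guard<std::timed_mutex> backup(c.ddl_gate);  // backup runs, then ends
    }
    std::thread session([&] { ok = try_ddl(c, wait_ms); });
    session.join();
    return ok;
}
```

In this picture a `drop table` or `create table` issued during the backup waits at the gate and proceeds once the backup finishes.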
Re: command `shell -c node info` blocks
Posted by "Zhu, Wen-Jun" <we...@esgyn.cn>.
Hi,
Here is another case of blocking:
trafodion@kylin:~$ offender -s active
EsgynDB Advanced Conversational Interface 2.4.5
Copyright (c) 2015-2018 Esgyn Corporation
Interpreter has not been linked in.EXITING FROM layoutNativeCode() -could not create function !!
>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>+>+>+>+>+>+>+>+>+>+>+>+>+>Interpreter has not been linked in.EXITING FROM layoutNativeCode() -could not create function !!
CURRENT_TIMESTAMP LAST_ACTIVITY_SECS QUERY_ID
EXECUTE_STATE SOURCE_TEXT
-------------------------- -------------------- -------------------- ------------------------------ --------------------
2018-09-30 17:33:57.186697 94105 MXID11000005682212404965134575421000000000206U3333308T150000000_1450_SQL_CUR_29
EXECUTE "drop table TESTBIGINT_SIGNED
2018-09-30 17:33:57.186697 83196 MXID11000002449212404945321083089000000000606U3333308T150000000_1210_SQL_CUR_28
EXECUTE "backup trafodion, tag 'test3_incbk2', table(tab3), incremental,override
2018-09-30 17:33:57.186697 83152 MXID11000002449212404945321083089000000000606U3333308T150000000_1228_1077
EXECUTE "create table if not exists TRAFODION."_BACKUP_test3_incbk2_".TABLE_CONSTRAINTS like TRAFODION."_MD_".TABLE_CONSTRAINTS;
2018-09-30 17:33:57.186697 83151 MXID11000006502212404976246567184000000000106U3333308T150000000_1871_1871
EXECUTE "create table TRAFODION."_BACKUP_test3_incbk2_".TABLE_CONSTRAINTS ( "TABLE_UID" LARGEINT NO DEFAULT NOT NULL NOT DROPPABLE
NOT SERIALIZED , "CONSTRAINT_UID" LARGEINT NO DEFAULT NOT NULL NOT DROPPABLE
--- 4 row(s) selected.
And the compiler processes:
trafodion@kylin:~$ ps aux|grep tdm_arkcmp
trafodi+ 6253 0.1 0.9 1097316 314744 ? SNl 9月29 2:53 tdm_arkcmp SQMON1.1 00000 00000 006253 $Z00053N 172.16.20.18:47936 00004 00000 00235 00001 -guardian
trafodi+ 6454 0.1 0.9 1090060 323544 ? SNl 9月29 2:29 tdm_arkcmp SQMON1.1 00000 00000 006454 $Z00059E 172.16.20.18:47936 00004 00000 00236 00001 -guardian
trafodi+ 6502 0.2 0.9 1094252 317376 ? SNl 9月29 2:49 tdm_arkcmp SQMON1.1 00000 00000 006502 $Z0005AS 172.16.20.18:47936 00004 00000 00246 00001 -guardian
trafodi+ 6616 0.1 0.8 1052036 265712 ? SNl 9月29 1:57 tdm_arkcmp SQMON1.1 00000 00000 006616 $Z0005E1 172.16.20.18:47936 00004 00000 00237 00001 -guardian
trafodi+ 6692 0.2 0.9 1098312 303212 ? SNl 9月29 3:04 tdm_arkcmp SQMON1.1 00000 00000 006692 $Z0005G7 172.16.20.18:47936 00004 00000 00247 00001 -guardian
trafodi+ 7252 0.1 0.9 1092792 299088 ? SNl 9月29 2:42 tdm_arkcmp SQMON1.1 00000 00000 007252 $Z0005X7 172.16.20.18:47936 00004 00000 00248 00001 -guardian
trafodi+ 10234 0.1 0.7 1054868 259388 ? SNl 9月29 1:53 tdm_arkcmp SQMON1.1 00000 00000 010234 $Z0008CE 172.16.20.18:47936 00004 00000 00239 00001 -guardian
trafodi+ 13950 0.1 0.7 1051052 254552 ? SNl 9月29 1:39 tdm_arkcmp SQMON1.1 00000 00000 013950 $Z000BDK 172.16.20.18:47936 00004 00000 00249 00001 -guardian
trafodi+ 17307 0.0 0.0 10980 592 pts/13 S+ 17:39 0:00 grep tdm_arkcmp
trafodi+ 19708 1.3 0.9 1107432 308124 ? SNl 16:14 1:10 tdm_arkcmp SQMON1.1 00000 00000 019708 $Z000G33 172.16.20.18:47936 00004 00000 00263 00001 -guardian
trafodi+ 20169 1.2 0.9 1092104 301936 ? SNl 16:14 1:04 tdm_arkcmp SQMON1.1 00000 00000 020169 $Z000GG9 172.16.20.18:47936 00004 00000 00264 00001 -guardian
trafodi+ 20314 0.4 0.7 1052780 258584 ? SNl 16:15 0:25 tdm_arkcmp SQMON1.1 00000 00000 020314 $Z000GKE 172.16.20.18:47936 00004 00000 00265 00001 -guardian
trafodi+ 22355 0.3 0.7 1052176 253904 ? SNl 16:19 0:19 tdm_arkcmp SQMON1.1 00000 00000 022355 $Z000I8Q 172.16.20.18:47936 00004 00000 00266 00001 -guardian
Then I attached to process 22355:
(gdb) bt
#0 0x0000007fa7f55120 in pthread_join () from /lib/aarch64-linux-gnu/libpthread.so.0
#1 0x0000000000408f0c in main (argc=2, argv=0x7ffdcbc178) at ../bin/arkcmp.cpp:388
(gdb) f 1
#1 0x0000000000408f0c in main (argc=2, argv=0x7ffdcbc178) at ../bin/arkcmp.cpp:388
388 s = pthread_join(gv_main_thread_id, &res);
(gdb) p /x gv_main_thread_id
$1 = 0x7fa1d1a320
(gdb) thread 5
[Switching to thread 5 (Thread 0x7fa1d1a320 (LWP 22359))]
#0 0x0000007fa7f5a2fc in pthread_cond_wait@@GLIBC_2.17 () from /lib/aarch64-linux-gnu/libpthread.so.0
(gdb) bt
#0 0x0000007fa7f5a2fc in pthread_cond_wait@@GLIBC_2.17 () from /lib/aarch64-linux-gnu/libpthread.so.0
#1 0x0000007fa6afcf84 in SB_Thread::CV::wait (this=0x3283ec68) at /work/esgyndb/core/sqf/export/include/seabed/int/thread.inl:555
#2 0x0000007fa6afd018 in SB_Thread::CV::wait (this=0x3283ec68, pv_lock=true) at /work/esgyndb/core/sqf/export/include/seabed/int/thread.inl:590
#3 0x0000007fa6fa99a8 in SB_Ms_Event_Mgr::wait (this=0x3283eb70, pv_us=-1) at mseventmgr.inl:346
#4 0x0000007fa6fd088c in XWAIT_com (pv_mask=256, pv_time=-1, pv_residual=true) at pctl.cpp:982
#5 0x0000007fa6fd05dc in XWAIT (pv_mask=256, pv_time=-1) at pctl.cpp:878
#6 0x0000007fa6f32258 in fs_int_fs_file_awaitiox (pp_filenum=0x3283fdbc, ppp_buf=0x7fa1d18878, pp_xfercount=0x7fa1d18874, pp_tag=0x7fa1d18880, pv_timeout=-1, pp_segid=0x7fa1d1886e, pv_int=false, pv_ts=false) at fsi.cpp:1426
#7 0x0000007fa6f2afdc in BAWAITIOX (pp_filenum=0x3283fdbc, ppp_buf=0x7fa1d18b40, pp_xfercount=0x7fa1d18b04, pp_tag=0x7fa1d18b48, pv_timeout=-1, pp_segid=0x0) at fs.cpp:563
#8 0x0000007fab6e5698 in GuaReceiveControlConnection::wait (this=0x3283fda0, timeout=-1, eventConsumed=0x0, ipcAwaitiox=0x0) at ../common/IpcGuardian.cpp:2773
#9 0x0000007fab6e4378 in GuaConnectionToClient::wait (this=0x3283f040, timeout=-1, eventConsumed=0x0, ipcAwaitiox=0x0) at ../common/IpcGuardian.cpp:2252
#10 0x0000007fab6c5e3c in IpcWaitableSetOfConnections::waitOnSet (this=0x7fa1d19738, timeout=-1, calledByESP=0, timedout=0x0) at ../common/Ipc.cpp:2006
#11 0x0000007fab6c9418 in IpcMessageStream::waitOnMsgStream (this=0x7fa1d19608, timeout=-1) at ../common/Ipc.cpp:3593
#12 0x0000007fab6c9380 in IpcMessageStream::receive (this=0x7fa1d19608, waited=1) at ../common/Ipc.cpp:3575
#13 0x0000000000408c5c in thread_main (p_arg=0x0) at ../bin/arkcmp.cpp:326
#14 0x0000007fa7f53fb4 in start_thread () from /lib/aarch64-linux-gnu/libpthread.so.0
#15 0x0000007fa6c1abd0 in ?? () from /lib/aarch64-linux-gnu/libc.so.6
It seems that thread 1 is waiting for thread 5, and thread 5 is waiting for something unknown.
As I understand it, a `tdm_arkcmp` process is short-lived and should not wait on anything for a long time, right?
So what's wrong here? How can I find which thread (or process) thread 5 is waiting for?
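The two backtraces have the shape of a server whose main thread joins a worker that waits, with no timeout, for the next message. Read that way (an assumption; nothing in this thread confirms it), thread 5 may simply be idle and waiting for its next request rather than stuck. The sketch below reproduces that shape; `Mailbox`, `serve`, and `run_server_demo` are hypothetical names, not Trafodion code.

```cpp
#include <cassert>
#include <chrono>
#include <condition_variable>
#include <deque>
#include <mutex>
#include <string>
#include <thread>

class Mailbox {
public:
    void send(const std::string& msg) {
        {
            std::lock_guard<std::mutex> lk(m_);
            q_.push_back(msg);
        }
        cv_.notify_one();
    }

    // timeout_ms < 0 waits indefinitely, like timeout=-1 in the backtrace.
    bool receive(std::string& out, int timeout_ms) {
        std::unique_lock<std::mutex> lk(m_);
        auto ready = [this] { return !q_.empty(); };
        if (timeout_ms < 0) {
            cv_.wait(lk, ready);
        } else if (!cv_.wait_for(lk, std::chrono::milliseconds(timeout_ms), ready)) {
            return false;  // timed out with nothing to read
        }
        out = q_.front();
        q_.pop_front();
        return true;
    }

private:
    std::mutex m_;
    std::condition_variable cv_;
    std::deque<std::string> q_;
};

// Worker loop: handles requests until it receives "quit".
int serve(Mailbox& box) {
    int handled = 0;
    std::string msg;
    while (box.receive(msg, -1) && msg != "quit") {
        ++handled;
    }
    return handled;
}

// Main thread: starts the worker, sends requests, then joins it.
int run_server_demo() {
    Mailbox box;
    int handled = 0;
    std::thread worker([&] { handled = serve(box); });
    box.send("compile 1");
    box.send("compile 2");
    box.send("quit");
    worker.join();  // returns only after the worker loop ends
    return handled;
}
```

Between requests the worker sits in the indefinite wait and the main thread sits in join, which is the same picture gdb shows for threads 5 and 1. To find the peer, one would need to identify which process holds the other end of the connection being waited on.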