You are viewing a plain text version of this content. The canonical link for it is here.
Posted to dev@trafodion.apache.org by Dave Birdsall <da...@esgyn.com> on 2017/04/06 15:50:18 UTC
Hang in hive/TEST007?
Hi,
I was running Trafodion regressions on a workstation overnight and came in this morning to find them hung in test hive/TEST007, at the following statement:
>>insert into trafodion.sch007.vhive1 values (3,4);
--- 1 row(s) inserted.
>>select * from thive1;
A B
----------- -----------
1 2
3 4
--- 2 row(s) selected.
Notice the "--- 2 rows(s) selected" and then no sqlci prompt. Just stuck there.
Checking dtmci, there were no transactions active or outstanding.
I put the sqlci process into gdb, and looked at the stack trace:
(gdb) bt
#0 0x00007f1e9be5b2ad in pthread_join () from /lib64/libpthread.so.0
#1 0x00007f1e9e2c8c5b in ExLobGlobals::~ExLobGlobals (this=0x3c4e640,
__in_chrg=<value optimized out>) at ../exp/ExpLOBaccess.cpp:3176
#2 0x00007f1e9e2c717f in ExLobsOper (lobName=0x7f1e9eb666d7 "dummy", handleIn=
0x0, handleInLen=0, hdfsServer=0x0, hdfsPort=0, handleOut=0x0,
handleOutLen=@0x7ffc34b26334, descNumIn=0, descNumOut=@0x7ffc34b26340,
retOperLen=@0x7ffc34b26340, requestTagIn=0, requestTagOut=@0x7ffc34b26340,
requestStatus=@0x7ffc34b26348, cliError=@0x7ffc34b26338,
lobStorageLocation=0x0, storage=Lob_HDFS_File, source=0x0, sourceLen=0,
cursorBytes=0, cursorId=0x0, operation=Lob_Cleanup, subOperation=Lob_None,
waited=1, globPtr=@0x7f1ea49a7de0, transId=0, blackBox=0x7f1ea4963ea0,
blackBoxLen=0, lobMaxSize=0, lobMaxChunkMemSize=0, lobGCLimit=0,
bufferSize=0, replication=0, blockSize=0, openType=0)
at ../exp/ExpLOBaccess.cpp:2670
#3 0x00007f1e9e2cfdcf in ExpLOBinterfaceCleanup (exLobGlob=@0x7f1ea49a7de0,
lobHeap=0x7f1ea4963ea0) at ../exp/ExpLOBinterface.cpp:201
#4 0x00007f1ea01480cf in ExHdfsScanTcb::freeResources (this=0x7f1ea49a7cd8)
at ../executor/ExHdfsScan.cpp:283
#5 0x00007f1ea0147b21 in ExHdfsScanTcb::~ExHdfsScanTcb (this=0x7f1ea49a7cd8,
__in_chrg=<value optimized out>) at ../executor/ExHdfsScan.cpp:224
#6 0x00007f1ea0147c50 in ExHdfsScanTcb::~ExHdfsScanTcb (this=0x7f1ea49a7cd8,
__in_chrg=<value optimized out>) at ../executor/ExHdfsScan.cpp:225
#7 0x00007f1e9fefcce5 in ex_globals::cleanupTcbs (this=0x7f1ea4983b48)
---Type <return> to continue, or q <return> to quit---
at ../executor/ex_globals.cpp:199
#8 0x00007f1e9fefcac1 in ex_globals::deleteMe (this=0x7f1ea4983b48,
fatalError=0) at ../executor/ex_globals.cpp:143
#9 0x00007f1e9fed9600 in ExExeStmtGlobals::deleteMe (this=0x7f1ea4983b48,
fatalError=0) at ../executor/ex_exe_stmt_globals.cpp:321
#10 0x00007f1e9feda06d in ExMasterStmtGlobals::deleteMe (this=0x7f1ea4983b48,
fatalError=0) at ../executor/ex_exe_stmt_globals.cpp:672
#11 0x00007f1e9ff41f42 in ex_root_tcb::deallocAndDelete (this=0x7f1ea49a8660,
glob=0x7f1ea4983b48, fragTable=0x7f1ea49a7980)
at ../executor/ex_root.cpp:2470
#12 0x00007f1ea1922345 in CliStatement::releaseTcbs (this=0x7f1ea4963de0,
closeAllOpens=0) at ../cli/Statement.cpp:4372
#13 0x00007f1ea1922506 in CliStatement::dealloc (this=0x7f1ea4963de0,
closeAllOpens=0) at ../cli/Statement.cpp:4420
#14 0x00007f1ea1917caa in CliStatement::~CliStatement (this=0x7f1ea4963de0,
__in_chrg=<value optimized out>) at ../cli/Statement.cpp:575
#15 0x00007f1ea19187bc in CliStatement::~CliStatement (this=0x7f1ea4963de0,
__in_chrg=<value optimized out>) at ../cli/Statement.cpp:745
#16 0x00007f1ea18cad3e in ContextCli::deallocStmt (this=0x7f1ea49c2120,
statement_id=0x3a14d70, deallocStaticStmt=0) at ../cli/Context.cpp:2348
#17 0x00007f1ea1890c89 in SQLCLI_DeallocStmt (cliGlobals=0x296bfe0,
statement_id=0x3a14d70) at ../cli/Cli.cpp:1660
#18 0x00007f1ea1935309 in SQL_EXEC_DeallocStmt (statement_id=0x3a14d70)
---Type <return> to continue, or q <return> to quit---
at ../cli/CliExtern.cpp:1832
#19 0x00007f1ea42622c9 in SqlCmd::deallocate (sqlci_env=0x2954a10, prep_stmt=
0x3bd67b0) at ../sqlci/SqlCmd.cpp:2854
#20 0x00007f1ea426283c in DML::process (this=0x489e460, sqlci_env=0x2954a10)
at ../sqlci/SqlCmd.cpp:2988
#21 0x00007f1ea424300c in Obey::process (this=0x623ca00, sqlci_env=0x2954a10)
at ../sqlci/Obey.cpp:267
#22 0x00007f1ea424300c in Obey::process (this=0x2939090, sqlci_env=0x2954a10)
at ../sqlci/Obey.cpp:267
#23 0x00007f1ea424bb43 in SqlciEnv::run (this=0x2954a10, in_filename=
0x7ffc34b291f3 "TEST007.tmp", input_string=0x0)
at ../sqlci/SqlciEnv.cpp:720
#24 0x000000000040272b in main (argc=2, argv=0x7ffc34b26f38)
at ../bin/SqlciMain.cpp:329
(gdb)
So, we are waiting for some other thread to complete which never completes. Looking at the threads in the process I see:
(gdb) info thread
37 Thread 0x7f1e93799700 (LWP 9140) 0x00007f1ea1c964fd in accept ()
from /lib64/libc.so.6
36 Thread 0x7f1e92d98700 (LWP 9141) 0x00007f1e9be5e63c in pthread_cond_wait@@GLIBC_2.3.2 () from /lib64/libpthread.so.0
35 Thread 0x7f1e9046c700 (LWP 9142) 0x00007f1ea1be071d in sigtimedwait ()
from /lib64/libc.so.6
34 Thread 0x7f1e8f96a700 (LWP 9157) 0x00007f1ea1c95f33 in epoll_wait ()
from /lib64/libc.so.6
33 Thread 0x7f1e8b5e7700 (LWP 9168) 0x00007f1e9be5e63c in pthread_cond_wait@@GLIBC_2.3.2 () from /lib64/libpthread.so.0
32 Thread 0x7f1e8b4e6700 (LWP 9169) 0x00007f1e9be5e63c in pthread_cond_wait@@GLIBC_2.3.2 () from /lib64/libpthread.so.0
31 Thread 0x7f1e8b3e5700 (LWP 9170) 0x00007f1e9be5e63c in pthread_cond_wait@@GLIBC_2.3.2 () from /lib64/libpthread.so.0
30 Thread 0x7f1e8b2e4700 (LWP 9171) 0x00007f1e9be5e63c in pthread_cond_wait@@GLIBC_2.3.2 () from /lib64/libpthread.so.0
29 Thread 0x7f1e8b1e3700 (LWP 9172) 0x00007f1e9be5e63c in pthread_cond_wait@@GLIBC_2.3.2 () from /lib64/libpthread.so.0
28 Thread 0x7f1e8b0e2700 (LWP 9173) 0x00007f1e9be5e63c in pthread_cond_wait@@GLIBC_2.3.2 () from /lib64/libpthread.so.0
27 Thread 0x7f1e8afe1700 (LWP 9174) 0x00007f1e9be5e63c in pthread_cond_wait@@GLIBC_2.3.2 () from /lib64/libpthread.so.0
26 Thread 0x7f1e8aee0700 (LWP 9175) 0x00007f1e9be5e63c in pthread_cond_wait@@---Type <return> to continue, or q <return> to quit---
GLIBC_2.3.2 () from /lib64/libpthread.so.0
25 Thread 0x7f1e87824700 (LWP 9177) 0x00007f1e9be5ea0e in pthread_cond_timedwait@@GLIBC_2.3.2 () from /lib64/libpthread.so.0
24 Thread 0x7f1e87723700 (LWP 9180) 0x00007f1e9be5e63c in pthread_cond_wait@@GLIBC_2.3.2 () from /lib64/libpthread.so.0
23 Thread 0x7f1e87622700 (LWP 9182) 0x00007f1e9be5e63c in pthread_cond_wait@@GLIBC_2.3.2 () from /lib64/libpthread.so.0
22 Thread 0x7f1e81690700 (LWP 9183) 0x00007f1e9be609b0 in sem_wait ()
from /lib64/libpthread.so.0
21 Thread 0x7f1e8158f700 (LWP 9184) 0x00007f1e9be5e63c in pthread_cond_wait@@GLIBC_2.3.2 () from /lib64/libpthread.so.0
20 Thread 0x7f1e8148e700 (LWP 9193) 0x00007f1e9be5e63c in pthread_cond_wait@@GLIBC_2.3.2 () from /lib64/libpthread.so.0
19 Thread 0x7f1e8138d700 (LWP 9195) 0x00007f1e9be5e63c in pthread_cond_wait@@GLIBC_2.3.2 () from /lib64/libpthread.so.0
18 Thread 0x7f1e8128c700 (LWP 9196) 0x00007f1e9be5ea0e in pthread_cond_timedwait@@GLIBC_2.3.2 () from /lib64/libpthread.so.0
17 Thread 0x7f1e7e3ba700 (LWP 9247) 0x00007f1ea1c95f33 in epoll_wait ()
from /lib64/libc.so.6
16 Thread 0x7f1e7e0b8700 (LWP 9248) 0x00007f1e9be5e63c in pthread_cond_wait@@GLIBC_2.3.2 () from /lib64/libpthread.so.0
15 Thread 0x7f1e7dfb7700 (LWP 9301) 0x00007f1ea1c8c113 in poll ()
from /lib64/libc.so.6
---Type <return> to continue, or q <return> to quit---
14 Thread 0x7f1e7a712700 (LWP 10456) 0x00007f1e9be5ea0e in pthread_cond_timedwait@@GLIBC_2.3.2 () from /lib64/libpthread.so.0
13 Thread 0x7f1e7a20f700 (LWP 10528) 0x00007f1e9be5e63c in pthread_cond_wait@@GLIBC_2.3.2 () from /lib64/libpthread.so.0
12 Thread 0x7f1e7a10e700 (LWP 10531) 0x00007f1e9be5ea0e in pthread_cond_timedwait@@GLIBC_2.3.2 () from /lib64/libpthread.so.0
11 Thread 0x7f1e7a00d700 (LWP 10532) 0x00007f1e9be5e63c in pthread_cond_wait@@GLIBC_2.3.2 () from /lib64/libpthread.so.0
10 Thread 0x7f1e79f0c700 (LWP 10722) 0x00007f1e9be5e63c in pthread_cond_wait@@GLIBC_2.3.2 () from /lib64/libpthread.so.0
9 Thread 0x7f1e79e0b700 (LWP 10723) 0x00007f1e9be5ea0e in pthread_cond_timedwait@@GLIBC_2.3.2 () from /lib64/libpthread.so.0
8 Thread 0x7f1e79d0a700 (LWP 10724) 0x00007f1e9be5e63c in pthread_cond_wait@@GLIBC_2.3.2 () from /lib64/libpthread.so.0
7 Thread 0x7f1e7ad18700 (LWP 14325) 0x00007f1ea1c95f33 in epoll_wait ()
from /lib64/libc.so.6
6 Thread 0x7f1e7b01b700 (LWP 14326) 0x00007f1e9be5e63c in pthread_cond_wait@@GLIBC_2.3.2 () from /lib64/libpthread.so.0
5 Thread 0x7f1e7b21d700 (LWP 14328) 0x00007f1e9be5ea0e in pthread_cond_timedwait@@GLIBC_2.3.2 () from /lib64/libpthread.so.0
4 Thread 0x7f1e7b31e700 (LWP 14329) 0x00007f1e9be5ea0e in pthread_cond_timedwait@@GLIBC_2.3.2 () from /lib64/libpthread.so.0
3 Thread 0x7f1e7ba26700 (LWP 14336) 0x00007f1e9be5ea0e in pthread_cond_timedw---Type <return> to continue, or q <return> to quit---
ait@@GLIBC_2.3.2 () from /lib64/libpthread.so.0
2 Thread 0x7f1e7cea0700 (LWP 16913) 0x00007f1e9be612e4 in __lll_lock_wait ()
from /lib64/libpthread.so.0
* 1 Thread 0x7f1ea4aa42c0 (LWP 9113) 0x00007f1e9be5b2ad in pthread_join ()
from /lib64/libpthread.so.0
(gdb)
Everything is waiting. I'm not sure what thread is interesting to look at but thread 2 is different. Looking at that one:
(gdb) thread 2
[Switching to thread 2 (Thread 0x7f1e7cea0700 (LWP 16913))]#0 0x00007f1e9be612e4 in __lll_lock_wait () from /lib64/libpthread.so.0
(gdb) bt
#0 0x00007f1e9be612e4 in __lll_lock_wait () from /lib64/libpthread.so.0
#1 0x00007f1e9be5c588 in _L_lock_854 () from /lib64/libpthread.so.0
#2 0x00007f1e9be5c457 in pthread_mutex_lock () from /lib64/libpthread.so.0
#3 0x00007f1e9e2c913c in ExLobLock::lock (this=0x3c52480)
at ../exp/ExpLOBaccess.cpp:3281
#4 0x00007f1e9e2c80e6 in ExLobGlobals::performRequest (this=0x3c4e640,
request=0x3c53a90) at ../exp/ExpLOBaccess.cpp:2963
#5 0x00007f1e9e2c96fb in ExLobGlobals::doWorkInThread (this=0x3c4e640)
at ../exp/ExpLOBaccess.cpp:3441
#6 0x00007f1e9e2c8fc9 in workerThreadMain (arg=0x3c4e640)
at ../exp/ExpLOBaccess.cpp:3235
#7 0x00007f1e9be5aa51 in start_thread () from /lib64/libpthread.so.0
#8 0x00007f1ea1c9593d in clone () from /lib64/libc.so.6
At this point I killed the process and let the tests continue.
Is this a known bug?
Thanks,
Dave
RE: Hang in hive/TEST007?
Posted by Sandhya Sundaresan <sa...@esgyn.com>.
The stck that Dave sees has been showing up on official builds once in a while. There are 2 threads doing hdfs access and one of them seems to be accessing resources that the other has destructed. It used to show up often during hdfsReads but I had made a fix so it can never happen during the active reading phase. This needs to be investigated by looking at the code ot see if there is a window where this can happen during the destructor. Perhaps we need to let each thread finish it's work or something. But from a quick look at the ExLobGlobals destructor, it seems to be doing that today. So I am not sure...haven't had the chance to spend time on it yet but will do so in a few days.
Thanks
Sandhya
-----Original Message-----
From: Roberta Marton [mailto:roberta.marton@esgyn.com]
Sent: Thursday, April 6, 2017 9:31 AM
To: dev@trafodion.incubator.apache.org
Subject: RE: Hang in hive/TEST007?
This test failed for me in a slightly different way but maybe they are related.
The hive constructor for NATable is not initializing the variable (prototype_) so once in a while the call to the destructor cores.
I noticed that you were trying to call the destructor for a different object.
If the NATable destructor actually did memory deletes for prototype_ on an invalid address perhaps it caused some kind of memory corruption. Later, when then destructor for ExLobGlobals is called, memory was invalid.
Roberta
-----Original Message-----
From: Dave Birdsall [mailto:dave.birdsall@esgyn.com]
Sent: Thursday, April 6, 2017 8:50 AM
To: dev@trafodion.incubator.apache.org
Subject: Hang in hive/TEST007?
Hi,
I was running Trafodion regressions on a workstation overnight and came in this morning to find them hung in test hive/TEST007, at the following statement:
>>insert into trafodion.sch007.vhive1 values (3,4);
--- 1 row(s) inserted.
>>select * from thive1;
A B
----------- -----------
1 2
3 4
--- 2 row(s) selected.
Notice the "--- 2 rows(s) selected" and then no sqlci prompt. Just stuck there.
Checking dtmci, there were no transactions active or outstanding.
I put the sqlci process into gdb, and looked at the stack trace:
(gdb) bt
#0 0x00007f1e9be5b2ad in pthread_join () from /lib64/libpthread.so.0
#1 0x00007f1e9e2c8c5b in ExLobGlobals::~ExLobGlobals (this=0x3c4e640,
__in_chrg=<value optimized out>) at ../exp/ExpLOBaccess.cpp:3176
#2 0x00007f1e9e2c717f in ExLobsOper (lobName=0x7f1e9eb666d7 "dummy", handleIn=
0x0, handleInLen=0, hdfsServer=0x0, hdfsPort=0, handleOut=0x0,
handleOutLen=@0x7ffc34b26334, descNumIn=0, descNumOut=@0x7ffc34b26340,
retOperLen=@0x7ffc34b26340, requestTagIn=0, requestTagOut=@0x7ffc34b26340,
requestStatus=@0x7ffc34b26348, cliError=@0x7ffc34b26338,
lobStorageLocation=0x0, storage=Lob_HDFS_File, source=0x0, sourceLen=0,
cursorBytes=0, cursorId=0x0, operation=Lob_Cleanup, subOperation=Lob_None,
waited=1, globPtr=@0x7f1ea49a7de0, transId=0, blackBox=0x7f1ea4963ea0,
blackBoxLen=0, lobMaxSize=0, lobMaxChunkMemSize=0, lobGCLimit=0,
bufferSize=0, replication=0, blockSize=0, openType=0)
at ../exp/ExpLOBaccess.cpp:2670
#3 0x00007f1e9e2cfdcf in ExpLOBinterfaceCleanup (exLobGlob=@0x7f1ea49a7de0,
lobHeap=0x7f1ea4963ea0) at ../exp/ExpLOBinterface.cpp:201
#4 0x00007f1ea01480cf in ExHdfsScanTcb::freeResources (this=0x7f1ea49a7cd8)
at ../executor/ExHdfsScan.cpp:283
#5 0x00007f1ea0147b21 in ExHdfsScanTcb::~ExHdfsScanTcb (this=0x7f1ea49a7cd8,
__in_chrg=<value optimized out>) at ../executor/ExHdfsScan.cpp:224
#6 0x00007f1ea0147c50 in ExHdfsScanTcb::~ExHdfsScanTcb (this=0x7f1ea49a7cd8,
__in_chrg=<value optimized out>) at ../executor/ExHdfsScan.cpp:225
#7 0x00007f1e9fefcce5 in ex_globals::cleanupTcbs (this=0x7f1ea4983b48) ---Type <return> to continue, or q <return> to quit---
at ../executor/ex_globals.cpp:199
#8 0x00007f1e9fefcac1 in ex_globals::deleteMe (this=0x7f1ea4983b48,
fatalError=0) at ../executor/ex_globals.cpp:143
#9 0x00007f1e9fed9600 in ExExeStmtGlobals::deleteMe (this=0x7f1ea4983b48,
fatalError=0) at ../executor/ex_exe_stmt_globals.cpp:321
#10 0x00007f1e9feda06d in ExMasterStmtGlobals::deleteMe (this=0x7f1ea4983b48,
fatalError=0) at ../executor/ex_exe_stmt_globals.cpp:672
#11 0x00007f1e9ff41f42 in ex_root_tcb::deallocAndDelete (this=0x7f1ea49a8660,
glob=0x7f1ea4983b48, fragTable=0x7f1ea49a7980)
at ../executor/ex_root.cpp:2470
#12 0x00007f1ea1922345 in CliStatement::releaseTcbs (this=0x7f1ea4963de0,
closeAllOpens=0) at ../cli/Statement.cpp:4372
#13 0x00007f1ea1922506 in CliStatement::dealloc (this=0x7f1ea4963de0,
closeAllOpens=0) at ../cli/Statement.cpp:4420
#14 0x00007f1ea1917caa in CliStatement::~CliStatement (this=0x7f1ea4963de0,
__in_chrg=<value optimized out>) at ../cli/Statement.cpp:575
#15 0x00007f1ea19187bc in CliStatement::~CliStatement (this=0x7f1ea4963de0,
__in_chrg=<value optimized out>) at ../cli/Statement.cpp:745
#16 0x00007f1ea18cad3e in ContextCli::deallocStmt (this=0x7f1ea49c2120,
statement_id=0x3a14d70, deallocStaticStmt=0) at ../cli/Context.cpp:2348
#17 0x00007f1ea1890c89 in SQLCLI_DeallocStmt (cliGlobals=0x296bfe0,
statement_id=0x3a14d70) at ../cli/Cli.cpp:1660
#18 0x00007f1ea1935309 in SQL_EXEC_DeallocStmt (statement_id=0x3a14d70) ---Type <return> to continue, or q <return> to quit---
at ../cli/CliExtern.cpp:1832
#19 0x00007f1ea42622c9 in SqlCmd::deallocate (sqlci_env=0x2954a10, prep_stmt=
0x3bd67b0) at ../sqlci/SqlCmd.cpp:2854
#20 0x00007f1ea426283c in DML::process (this=0x489e460, sqlci_env=0x2954a10)
at ../sqlci/SqlCmd.cpp:2988
#21 0x00007f1ea424300c in Obey::process (this=0x623ca00, sqlci_env=0x2954a10)
at ../sqlci/Obey.cpp:267
#22 0x00007f1ea424300c in Obey::process (this=0x2939090, sqlci_env=0x2954a10)
at ../sqlci/Obey.cpp:267
#23 0x00007f1ea424bb43 in SqlciEnv::run (this=0x2954a10, in_filename=
0x7ffc34b291f3 "TEST007.tmp", input_string=0x0)
at ../sqlci/SqlciEnv.cpp:720
#24 0x000000000040272b in main (argc=2, argv=0x7ffc34b26f38)
at ../bin/SqlciMain.cpp:329
(gdb)
So, we are waiting for some other thread to complete which never completes. Looking at the threads in the process I see:
(gdb) info thread
37 Thread 0x7f1e93799700 (LWP 9140) 0x00007f1ea1c964fd in accept ()
from /lib64/libc.so.6
36 Thread 0x7f1e92d98700 (LWP 9141) 0x00007f1e9be5e63c in pthread_cond_wait@@GLIBC_2.3.2 () from /lib64/libpthread.so.0
35 Thread 0x7f1e9046c700 (LWP 9142) 0x00007f1ea1be071d in sigtimedwait ()
from /lib64/libc.so.6
34 Thread 0x7f1e8f96a700 (LWP 9157) 0x00007f1ea1c95f33 in epoll_wait ()
from /lib64/libc.so.6
33 Thread 0x7f1e8b5e7700 (LWP 9168) 0x00007f1e9be5e63c in pthread_cond_wait@@GLIBC_2.3.2 () from /lib64/libpthread.so.0
32 Thread 0x7f1e8b4e6700 (LWP 9169) 0x00007f1e9be5e63c in pthread_cond_wait@@GLIBC_2.3.2 () from /lib64/libpthread.so.0
31 Thread 0x7f1e8b3e5700 (LWP 9170) 0x00007f1e9be5e63c in pthread_cond_wait@@GLIBC_2.3.2 () from /lib64/libpthread.so.0
30 Thread 0x7f1e8b2e4700 (LWP 9171) 0x00007f1e9be5e63c in pthread_cond_wait@@GLIBC_2.3.2 () from /lib64/libpthread.so.0
29 Thread 0x7f1e8b1e3700 (LWP 9172) 0x00007f1e9be5e63c in pthread_cond_wait@@GLIBC_2.3.2 () from /lib64/libpthread.so.0
28 Thread 0x7f1e8b0e2700 (LWP 9173) 0x00007f1e9be5e63c in pthread_cond_wait@@GLIBC_2.3.2 () from /lib64/libpthread.so.0
27 Thread 0x7f1e8afe1700 (LWP 9174) 0x00007f1e9be5e63c in pthread_cond_wait@@GLIBC_2.3.2 () from /lib64/libpthread.so.0
26 Thread 0x7f1e8aee0700 (LWP 9175) 0x00007f1e9be5e63c in pthread_cond_wait@@---Type <return> to continue, or q <return> to quit---
GLIBC_2.3.2 () from /lib64/libpthread.so.0
25 Thread 0x7f1e87824700 (LWP 9177) 0x00007f1e9be5ea0e in pthread_cond_timedwait@@GLIBC_2.3.2 () from /lib64/libpthread.so.0
24 Thread 0x7f1e87723700 (LWP 9180) 0x00007f1e9be5e63c in pthread_cond_wait@@GLIBC_2.3.2 () from /lib64/libpthread.so.0
23 Thread 0x7f1e87622700 (LWP 9182) 0x00007f1e9be5e63c in pthread_cond_wait@@GLIBC_2.3.2 () from /lib64/libpthread.so.0
22 Thread 0x7f1e81690700 (LWP 9183) 0x00007f1e9be609b0 in sem_wait ()
from /lib64/libpthread.so.0
21 Thread 0x7f1e8158f700 (LWP 9184) 0x00007f1e9be5e63c in pthread_cond_wait@@GLIBC_2.3.2 () from /lib64/libpthread.so.0
20 Thread 0x7f1e8148e700 (LWP 9193) 0x00007f1e9be5e63c in pthread_cond_wait@@GLIBC_2.3.2 () from /lib64/libpthread.so.0
19 Thread 0x7f1e8138d700 (LWP 9195) 0x00007f1e9be5e63c in pthread_cond_wait@@GLIBC_2.3.2 () from /lib64/libpthread.so.0
18 Thread 0x7f1e8128c700 (LWP 9196) 0x00007f1e9be5ea0e in pthread_cond_timedwait@@GLIBC_2.3.2 () from /lib64/libpthread.so.0
17 Thread 0x7f1e7e3ba700 (LWP 9247) 0x00007f1ea1c95f33 in epoll_wait ()
from /lib64/libc.so.6
16 Thread 0x7f1e7e0b8700 (LWP 9248) 0x00007f1e9be5e63c in pthread_cond_wait@@GLIBC_2.3.2 () from /lib64/libpthread.so.0
15 Thread 0x7f1e7dfb7700 (LWP 9301) 0x00007f1ea1c8c113 in poll ()
from /lib64/libc.so.6
---Type <return> to continue, or q <return> to quit---
14 Thread 0x7f1e7a712700 (LWP 10456) 0x00007f1e9be5ea0e in pthread_cond_timedwait@@GLIBC_2.3.2 () from /lib64/libpthread.so.0
13 Thread 0x7f1e7a20f700 (LWP 10528) 0x00007f1e9be5e63c in pthread_cond_wait@@GLIBC_2.3.2 () from /lib64/libpthread.so.0
12 Thread 0x7f1e7a10e700 (LWP 10531) 0x00007f1e9be5ea0e in pthread_cond_timedwait@@GLIBC_2.3.2 () from /lib64/libpthread.so.0
11 Thread 0x7f1e7a00d700 (LWP 10532) 0x00007f1e9be5e63c in pthread_cond_wait@@GLIBC_2.3.2 () from /lib64/libpthread.so.0
10 Thread 0x7f1e79f0c700 (LWP 10722) 0x00007f1e9be5e63c in pthread_cond_wait@@GLIBC_2.3.2 () from /lib64/libpthread.so.0
9 Thread 0x7f1e79e0b700 (LWP 10723) 0x00007f1e9be5ea0e in pthread_cond_timedwait@@GLIBC_2.3.2 () from /lib64/libpthread.so.0
8 Thread 0x7f1e79d0a700 (LWP 10724) 0x00007f1e9be5e63c in pthread_cond_wait@@GLIBC_2.3.2 () from /lib64/libpthread.so.0
7 Thread 0x7f1e7ad18700 (LWP 14325) 0x00007f1ea1c95f33 in epoll_wait ()
from /lib64/libc.so.6
6 Thread 0x7f1e7b01b700 (LWP 14326) 0x00007f1e9be5e63c in pthread_cond_wait@@GLIBC_2.3.2 () from /lib64/libpthread.so.0
5 Thread 0x7f1e7b21d700 (LWP 14328) 0x00007f1e9be5ea0e in pthread_cond_timedwait@@GLIBC_2.3.2 () from /lib64/libpthread.so.0
4 Thread 0x7f1e7b31e700 (LWP 14329) 0x00007f1e9be5ea0e in pthread_cond_timedwait@@GLIBC_2.3.2 () from /lib64/libpthread.so.0
3 Thread 0x7f1e7ba26700 (LWP 14336) 0x00007f1e9be5ea0e in pthread_cond_timedw---Type <return> to continue, or q <return> to quit---
ait@@GLIBC_2.3.2 () from /lib64/libpthread.so.0
2 Thread 0x7f1e7cea0700 (LWP 16913) 0x00007f1e9be612e4 in __lll_lock_wait ()
from /lib64/libpthread.so.0
* 1 Thread 0x7f1ea4aa42c0 (LWP 9113) 0x00007f1e9be5b2ad in pthread_join ()
from /lib64/libpthread.so.0
(gdb)
Everything is waiting. I'm not sure what thread is interesting to look at but thread 2 is different. Looking at that one:
(gdb) thread 2
[Switching to thread 2 (Thread 0x7f1e7cea0700 (LWP 16913))]#0 0x00007f1e9be612e4 in __lll_lock_wait () from /lib64/libpthread.so.0
(gdb) bt
#0 0x00007f1e9be612e4 in __lll_lock_wait () from /lib64/libpthread.so.0
#1 0x00007f1e9be5c588 in _L_lock_854 () from /lib64/libpthread.so.0
#2 0x00007f1e9be5c457 in pthread_mutex_lock () from /lib64/libpthread.so.0
#3 0x00007f1e9e2c913c in ExLobLock::lock (this=0x3c52480)
at ../exp/ExpLOBaccess.cpp:3281
#4 0x00007f1e9e2c80e6 in ExLobGlobals::performRequest (this=0x3c4e640,
request=0x3c53a90) at ../exp/ExpLOBaccess.cpp:2963
#5 0x00007f1e9e2c96fb in ExLobGlobals::doWorkInThread (this=0x3c4e640)
at ../exp/ExpLOBaccess.cpp:3441
#6 0x00007f1e9e2c8fc9 in workerThreadMain (arg=0x3c4e640)
at ../exp/ExpLOBaccess.cpp:3235
#7 0x00007f1e9be5aa51 in start_thread () from /lib64/libpthread.so.0
#8 0x00007f1ea1c9593d in clone () from /lib64/libc.so.6
At this point I killed the process and let the tests continue.
Is this a known bug?
Thanks,
Dave
RE: Hang in hive/TEST007?
Posted by Roberta Marton <ro...@esgyn.com>.
This test failed for me in a slightly different way but maybe they are related.
The hive constructor for NATable is not initializing the variable (prototype_) so once in a while the call to the destructor cores.
I noticed that you were trying to call the destructor for a different object.
If the NATable destructor actually did memory deletes for prototype_ on an invalid address perhaps it caused some kind of memory corruption. Later, when then destructor for ExLobGlobals is called, memory was invalid.
Roberta
-----Original Message-----
From: Dave Birdsall [mailto:dave.birdsall@esgyn.com]
Sent: Thursday, April 6, 2017 8:50 AM
To: dev@trafodion.incubator.apache.org
Subject: Hang in hive/TEST007?
Hi,
I was running Trafodion regressions on a workstation overnight and came in this morning to find them hung in test hive/TEST007, at the following statement:
>>insert into trafodion.sch007.vhive1 values (3,4);
--- 1 row(s) inserted.
>>select * from thive1;
A B
----------- -----------
1 2
3 4
--- 2 row(s) selected.
Notice the "--- 2 rows(s) selected" and then no sqlci prompt. Just stuck there.
Checking dtmci, there were no transactions active or outstanding.
I put the sqlci process into gdb, and looked at the stack trace:
(gdb) bt
#0 0x00007f1e9be5b2ad in pthread_join () from /lib64/libpthread.so.0
#1 0x00007f1e9e2c8c5b in ExLobGlobals::~ExLobGlobals (this=0x3c4e640,
__in_chrg=<value optimized out>) at ../exp/ExpLOBaccess.cpp:3176
#2 0x00007f1e9e2c717f in ExLobsOper (lobName=0x7f1e9eb666d7 "dummy", handleIn=
0x0, handleInLen=0, hdfsServer=0x0, hdfsPort=0, handleOut=0x0,
handleOutLen=@0x7ffc34b26334, descNumIn=0, descNumOut=@0x7ffc34b26340,
retOperLen=@0x7ffc34b26340, requestTagIn=0, requestTagOut=@0x7ffc34b26340,
requestStatus=@0x7ffc34b26348, cliError=@0x7ffc34b26338,
lobStorageLocation=0x0, storage=Lob_HDFS_File, source=0x0, sourceLen=0,
cursorBytes=0, cursorId=0x0, operation=Lob_Cleanup, subOperation=Lob_None,
waited=1, globPtr=@0x7f1ea49a7de0, transId=0, blackBox=0x7f1ea4963ea0,
blackBoxLen=0, lobMaxSize=0, lobMaxChunkMemSize=0, lobGCLimit=0,
bufferSize=0, replication=0, blockSize=0, openType=0)
at ../exp/ExpLOBaccess.cpp:2670
#3 0x00007f1e9e2cfdcf in ExpLOBinterfaceCleanup (exLobGlob=@0x7f1ea49a7de0,
lobHeap=0x7f1ea4963ea0) at ../exp/ExpLOBinterface.cpp:201
#4 0x00007f1ea01480cf in ExHdfsScanTcb::freeResources (this=0x7f1ea49a7cd8)
at ../executor/ExHdfsScan.cpp:283
#5 0x00007f1ea0147b21 in ExHdfsScanTcb::~ExHdfsScanTcb (this=0x7f1ea49a7cd8,
__in_chrg=<value optimized out>) at ../executor/ExHdfsScan.cpp:224
#6 0x00007f1ea0147c50 in ExHdfsScanTcb::~ExHdfsScanTcb (this=0x7f1ea49a7cd8,
__in_chrg=<value optimized out>) at ../executor/ExHdfsScan.cpp:225
#7 0x00007f1e9fefcce5 in ex_globals::cleanupTcbs (this=0x7f1ea4983b48)
---Type <return> to continue, or q <return> to quit---
at ../executor/ex_globals.cpp:199
#8 0x00007f1e9fefcac1 in ex_globals::deleteMe (this=0x7f1ea4983b48,
fatalError=0) at ../executor/ex_globals.cpp:143
#9 0x00007f1e9fed9600 in ExExeStmtGlobals::deleteMe (this=0x7f1ea4983b48,
fatalError=0) at ../executor/ex_exe_stmt_globals.cpp:321
#10 0x00007f1e9feda06d in ExMasterStmtGlobals::deleteMe (this=0x7f1ea4983b48,
fatalError=0) at ../executor/ex_exe_stmt_globals.cpp:672
#11 0x00007f1e9ff41f42 in ex_root_tcb::deallocAndDelete (this=0x7f1ea49a8660,
glob=0x7f1ea4983b48, fragTable=0x7f1ea49a7980)
at ../executor/ex_root.cpp:2470
#12 0x00007f1ea1922345 in CliStatement::releaseTcbs (this=0x7f1ea4963de0,
closeAllOpens=0) at ../cli/Statement.cpp:4372
#13 0x00007f1ea1922506 in CliStatement::dealloc (this=0x7f1ea4963de0,
closeAllOpens=0) at ../cli/Statement.cpp:4420
#14 0x00007f1ea1917caa in CliStatement::~CliStatement (this=0x7f1ea4963de0,
__in_chrg=<value optimized out>) at ../cli/Statement.cpp:575
#15 0x00007f1ea19187bc in CliStatement::~CliStatement (this=0x7f1ea4963de0,
__in_chrg=<value optimized out>) at ../cli/Statement.cpp:745
#16 0x00007f1ea18cad3e in ContextCli::deallocStmt (this=0x7f1ea49c2120,
statement_id=0x3a14d70, deallocStaticStmt=0) at ../cli/Context.cpp:2348
#17 0x00007f1ea1890c89 in SQLCLI_DeallocStmt (cliGlobals=0x296bfe0,
statement_id=0x3a14d70) at ../cli/Cli.cpp:1660
#18 0x00007f1ea1935309 in SQL_EXEC_DeallocStmt (statement_id=0x3a14d70)
---Type <return> to continue, or q <return> to quit---
at ../cli/CliExtern.cpp:1832
#19 0x00007f1ea42622c9 in SqlCmd::deallocate (sqlci_env=0x2954a10, prep_stmt=
0x3bd67b0) at ../sqlci/SqlCmd.cpp:2854
#20 0x00007f1ea426283c in DML::process (this=0x489e460, sqlci_env=0x2954a10)
at ../sqlci/SqlCmd.cpp:2988
#21 0x00007f1ea424300c in Obey::process (this=0x623ca00, sqlci_env=0x2954a10)
at ../sqlci/Obey.cpp:267
#22 0x00007f1ea424300c in Obey::process (this=0x2939090, sqlci_env=0x2954a10)
at ../sqlci/Obey.cpp:267
#23 0x00007f1ea424bb43 in SqlciEnv::run (this=0x2954a10, in_filename=
0x7ffc34b291f3 "TEST007.tmp", input_string=0x0)
at ../sqlci/SqlciEnv.cpp:720
#24 0x000000000040272b in main (argc=2, argv=0x7ffc34b26f38)
at ../bin/SqlciMain.cpp:329
(gdb)
So, we are waiting for some other thread to complete which never completes. Looking at the threads in the process I see:
(gdb) info thread
37 Thread 0x7f1e93799700 (LWP 9140) 0x00007f1ea1c964fd in accept ()
from /lib64/libc.so.6
36 Thread 0x7f1e92d98700 (LWP 9141) 0x00007f1e9be5e63c in pthread_cond_wait@@GLIBC_2.3.2 () from /lib64/libpthread.so.0
35 Thread 0x7f1e9046c700 (LWP 9142) 0x00007f1ea1be071d in sigtimedwait ()
from /lib64/libc.so.6
34 Thread 0x7f1e8f96a700 (LWP 9157) 0x00007f1ea1c95f33 in epoll_wait ()
from /lib64/libc.so.6
33 Thread 0x7f1e8b5e7700 (LWP 9168) 0x00007f1e9be5e63c in pthread_cond_wait@@GLIBC_2.3.2 () from /lib64/libpthread.so.0
32 Thread 0x7f1e8b4e6700 (LWP 9169) 0x00007f1e9be5e63c in pthread_cond_wait@@GLIBC_2.3.2 () from /lib64/libpthread.so.0
31 Thread 0x7f1e8b3e5700 (LWP 9170) 0x00007f1e9be5e63c in pthread_cond_wait@@GLIBC_2.3.2 () from /lib64/libpthread.so.0
30 Thread 0x7f1e8b2e4700 (LWP 9171) 0x00007f1e9be5e63c in pthread_cond_wait@@GLIBC_2.3.2 () from /lib64/libpthread.so.0
29 Thread 0x7f1e8b1e3700 (LWP 9172) 0x00007f1e9be5e63c in pthread_cond_wait@@GLIBC_2.3.2 () from /lib64/libpthread.so.0
28 Thread 0x7f1e8b0e2700 (LWP 9173) 0x00007f1e9be5e63c in pthread_cond_wait@@GLIBC_2.3.2 () from /lib64/libpthread.so.0
27 Thread 0x7f1e8afe1700 (LWP 9174) 0x00007f1e9be5e63c in pthread_cond_wait@@GLIBC_2.3.2 () from /lib64/libpthread.so.0
26 Thread 0x7f1e8aee0700 (LWP 9175) 0x00007f1e9be5e63c in pthread_cond_wait@@---Type <return> to continue, or q <return> to quit---
GLIBC_2.3.2 () from /lib64/libpthread.so.0
25 Thread 0x7f1e87824700 (LWP 9177) 0x00007f1e9be5ea0e in pthread_cond_timedwait@@GLIBC_2.3.2 () from /lib64/libpthread.so.0
24 Thread 0x7f1e87723700 (LWP 9180) 0x00007f1e9be5e63c in pthread_cond_wait@@GLIBC_2.3.2 () from /lib64/libpthread.so.0
23 Thread 0x7f1e87622700 (LWP 9182) 0x00007f1e9be5e63c in pthread_cond_wait@@GLIBC_2.3.2 () from /lib64/libpthread.so.0
22 Thread 0x7f1e81690700 (LWP 9183) 0x00007f1e9be609b0 in sem_wait ()
from /lib64/libpthread.so.0
21 Thread 0x7f1e8158f700 (LWP 9184) 0x00007f1e9be5e63c in pthread_cond_wait@@GLIBC_2.3.2 () from /lib64/libpthread.so.0
20 Thread 0x7f1e8148e700 (LWP 9193) 0x00007f1e9be5e63c in pthread_cond_wait@@GLIBC_2.3.2 () from /lib64/libpthread.so.0
19 Thread 0x7f1e8138d700 (LWP 9195) 0x00007f1e9be5e63c in pthread_cond_wait@@GLIBC_2.3.2 () from /lib64/libpthread.so.0
18 Thread 0x7f1e8128c700 (LWP 9196) 0x00007f1e9be5ea0e in pthread_cond_timedwait@@GLIBC_2.3.2 () from /lib64/libpthread.so.0
17 Thread 0x7f1e7e3ba700 (LWP 9247) 0x00007f1ea1c95f33 in epoll_wait ()
from /lib64/libc.so.6
16 Thread 0x7f1e7e0b8700 (LWP 9248) 0x00007f1e9be5e63c in pthread_cond_wait@@GLIBC_2.3.2 () from /lib64/libpthread.so.0
15 Thread 0x7f1e7dfb7700 (LWP 9301) 0x00007f1ea1c8c113 in poll ()
from /lib64/libc.so.6
---Type <return> to continue, or q <return> to quit---
14 Thread 0x7f1e7a712700 (LWP 10456) 0x00007f1e9be5ea0e in pthread_cond_timedwait@@GLIBC_2.3.2 () from /lib64/libpthread.so.0
13 Thread 0x7f1e7a20f700 (LWP 10528) 0x00007f1e9be5e63c in pthread_cond_wait@@GLIBC_2.3.2 () from /lib64/libpthread.so.0
12 Thread 0x7f1e7a10e700 (LWP 10531) 0x00007f1e9be5ea0e in pthread_cond_timedwait@@GLIBC_2.3.2 () from /lib64/libpthread.so.0
11 Thread 0x7f1e7a00d700 (LWP 10532) 0x00007f1e9be5e63c in pthread_cond_wait@@GLIBC_2.3.2 () from /lib64/libpthread.so.0
10 Thread 0x7f1e79f0c700 (LWP 10722) 0x00007f1e9be5e63c in pthread_cond_wait@@GLIBC_2.3.2 () from /lib64/libpthread.so.0
9 Thread 0x7f1e79e0b700 (LWP 10723) 0x00007f1e9be5ea0e in pthread_cond_timedwait@@GLIBC_2.3.2 () from /lib64/libpthread.so.0
8 Thread 0x7f1e79d0a700 (LWP 10724) 0x00007f1e9be5e63c in pthread_cond_wait@@GLIBC_2.3.2 () from /lib64/libpthread.so.0
7 Thread 0x7f1e7ad18700 (LWP 14325) 0x00007f1ea1c95f33 in epoll_wait ()
from /lib64/libc.so.6
6 Thread 0x7f1e7b01b700 (LWP 14326) 0x00007f1e9be5e63c in pthread_cond_wait@@GLIBC_2.3.2 () from /lib64/libpthread.so.0
5 Thread 0x7f1e7b21d700 (LWP 14328) 0x00007f1e9be5ea0e in pthread_cond_timedwait@@GLIBC_2.3.2 () from /lib64/libpthread.so.0
4 Thread 0x7f1e7b31e700 (LWP 14329) 0x00007f1e9be5ea0e in pthread_cond_timedwait@@GLIBC_2.3.2 () from /lib64/libpthread.so.0
3 Thread 0x7f1e7ba26700 (LWP 14336) 0x00007f1e9be5ea0e in pthread_cond_timedw---Type <return> to continue, or q <return> to quit---
ait@@GLIBC_2.3.2 () from /lib64/libpthread.so.0
2 Thread 0x7f1e7cea0700 (LWP 16913) 0x00007f1e9be612e4 in __lll_lock_wait ()
from /lib64/libpthread.so.0
* 1 Thread 0x7f1ea4aa42c0 (LWP 9113) 0x00007f1e9be5b2ad in pthread_join ()
from /lib64/libpthread.so.0
(gdb)
Everything is waiting. I'm not sure what thread is interesting to look at but thread 2 is different. Looking at that one:
(gdb) thread 2
[Switching to thread 2 (Thread 0x7f1e7cea0700 (LWP 16913))]#0 0x00007f1e9be612e4 in __lll_lock_wait () from /lib64/libpthread.so.0
(gdb) bt
#0 0x00007f1e9be612e4 in __lll_lock_wait () from /lib64/libpthread.so.0
#1 0x00007f1e9be5c588 in _L_lock_854 () from /lib64/libpthread.so.0
#2 0x00007f1e9be5c457 in pthread_mutex_lock () from /lib64/libpthread.so.0
#3 0x00007f1e9e2c913c in ExLobLock::lock (this=0x3c52480)
at ../exp/ExpLOBaccess.cpp:3281
#4 0x00007f1e9e2c80e6 in ExLobGlobals::performRequest (this=0x3c4e640,
request=0x3c53a90) at ../exp/ExpLOBaccess.cpp:2963
#5 0x00007f1e9e2c96fb in ExLobGlobals::doWorkInThread (this=0x3c4e640)
at ../exp/ExpLOBaccess.cpp:3441
#6 0x00007f1e9e2c8fc9 in workerThreadMain (arg=0x3c4e640)
at ../exp/ExpLOBaccess.cpp:3235
#7 0x00007f1e9be5aa51 in start_thread () from /lib64/libpthread.so.0
#8 0x00007f1ea1c9593d in clone () from /lib64/libc.so.6
At this point I killed the process and let the tests continue.
Is this a known bug?
Thanks,
Dave