You are viewing a plain text version of this content. The canonical link for it is here.
Posted to dev@trafodion.apache.org by Dave Birdsall <da...@esgyn.com> on 2017/04/06 15:50:18 UTC

Hang in hive/TEST007?

Hi,

I was running Trafodion regressions on a workstation overnight and came in this morning to find them hung in test hive/TEST007, at the following statement:

>>insert into trafodion.sch007.vhive1 values (3,4);

--- 1 row(s) inserted.
>>select * from thive1;

A            B
-----------  -----------

          1            2
          3            4

--- 2 row(s) selected.

Notice the "--- 2 rows(s) selected" and then no sqlci prompt. Just stuck there.

Checking dtmci, there were no transactions active or outstanding.

I put the sqlci process into gdb, and looked at the stack trace:

(gdb) bt
#0  0x00007f1e9be5b2ad in pthread_join () from /lib64/libpthread.so.0
#1  0x00007f1e9e2c8c5b in ExLobGlobals::~ExLobGlobals (this=0x3c4e640,
    __in_chrg=<value optimized out>) at ../exp/ExpLOBaccess.cpp:3176
#2  0x00007f1e9e2c717f in ExLobsOper (lobName=0x7f1e9eb666d7 "dummy", handleIn=
    0x0, handleInLen=0, hdfsServer=0x0, hdfsPort=0, handleOut=0x0,
    handleOutLen=@0x7ffc34b26334, descNumIn=0, descNumOut=@0x7ffc34b26340,
    retOperLen=@0x7ffc34b26340, requestTagIn=0, requestTagOut=@0x7ffc34b26340,
    requestStatus=@0x7ffc34b26348, cliError=@0x7ffc34b26338,
    lobStorageLocation=0x0, storage=Lob_HDFS_File, source=0x0, sourceLen=0,
    cursorBytes=0, cursorId=0x0, operation=Lob_Cleanup, subOperation=Lob_None,
    waited=1, globPtr=@0x7f1ea49a7de0, transId=0, blackBox=0x7f1ea4963ea0,
    blackBoxLen=0, lobMaxSize=0, lobMaxChunkMemSize=0, lobGCLimit=0,
    bufferSize=0, replication=0, blockSize=0, openType=0)
    at ../exp/ExpLOBaccess.cpp:2670
#3  0x00007f1e9e2cfdcf in ExpLOBinterfaceCleanup (exLobGlob=@0x7f1ea49a7de0,
    lobHeap=0x7f1ea4963ea0) at ../exp/ExpLOBinterface.cpp:201
#4  0x00007f1ea01480cf in ExHdfsScanTcb::freeResources (this=0x7f1ea49a7cd8)
    at ../executor/ExHdfsScan.cpp:283
#5  0x00007f1ea0147b21 in ExHdfsScanTcb::~ExHdfsScanTcb (this=0x7f1ea49a7cd8,
    __in_chrg=<value optimized out>) at ../executor/ExHdfsScan.cpp:224
#6  0x00007f1ea0147c50 in ExHdfsScanTcb::~ExHdfsScanTcb (this=0x7f1ea49a7cd8,
    __in_chrg=<value optimized out>) at ../executor/ExHdfsScan.cpp:225
#7  0x00007f1e9fefcce5 in ex_globals::cleanupTcbs (this=0x7f1ea4983b48)
---Type <return> to continue, or q <return> to quit---
    at ../executor/ex_globals.cpp:199
#8  0x00007f1e9fefcac1 in ex_globals::deleteMe (this=0x7f1ea4983b48,
    fatalError=0) at ../executor/ex_globals.cpp:143
#9  0x00007f1e9fed9600 in ExExeStmtGlobals::deleteMe (this=0x7f1ea4983b48,
    fatalError=0) at ../executor/ex_exe_stmt_globals.cpp:321
#10 0x00007f1e9feda06d in ExMasterStmtGlobals::deleteMe (this=0x7f1ea4983b48,
    fatalError=0) at ../executor/ex_exe_stmt_globals.cpp:672
#11 0x00007f1e9ff41f42 in ex_root_tcb::deallocAndDelete (this=0x7f1ea49a8660,
    glob=0x7f1ea4983b48, fragTable=0x7f1ea49a7980)
    at ../executor/ex_root.cpp:2470
#12 0x00007f1ea1922345 in CliStatement::releaseTcbs (this=0x7f1ea4963de0,
    closeAllOpens=0) at ../cli/Statement.cpp:4372
#13 0x00007f1ea1922506 in CliStatement::dealloc (this=0x7f1ea4963de0,
    closeAllOpens=0) at ../cli/Statement.cpp:4420
#14 0x00007f1ea1917caa in CliStatement::~CliStatement (this=0x7f1ea4963de0,
    __in_chrg=<value optimized out>) at ../cli/Statement.cpp:575
#15 0x00007f1ea19187bc in CliStatement::~CliStatement (this=0x7f1ea4963de0,
    __in_chrg=<value optimized out>) at ../cli/Statement.cpp:745
#16 0x00007f1ea18cad3e in ContextCli::deallocStmt (this=0x7f1ea49c2120,
    statement_id=0x3a14d70, deallocStaticStmt=0) at ../cli/Context.cpp:2348
#17 0x00007f1ea1890c89 in SQLCLI_DeallocStmt (cliGlobals=0x296bfe0,
    statement_id=0x3a14d70) at ../cli/Cli.cpp:1660
#18 0x00007f1ea1935309 in SQL_EXEC_DeallocStmt (statement_id=0x3a14d70)
---Type <return> to continue, or q <return> to quit---
    at ../cli/CliExtern.cpp:1832
#19 0x00007f1ea42622c9 in SqlCmd::deallocate (sqlci_env=0x2954a10, prep_stmt=
    0x3bd67b0) at ../sqlci/SqlCmd.cpp:2854
#20 0x00007f1ea426283c in DML::process (this=0x489e460, sqlci_env=0x2954a10)
    at ../sqlci/SqlCmd.cpp:2988
#21 0x00007f1ea424300c in Obey::process (this=0x623ca00, sqlci_env=0x2954a10)
    at ../sqlci/Obey.cpp:267
#22 0x00007f1ea424300c in Obey::process (this=0x2939090, sqlci_env=0x2954a10)
    at ../sqlci/Obey.cpp:267
#23 0x00007f1ea424bb43 in SqlciEnv::run (this=0x2954a10, in_filename=
    0x7ffc34b291f3 "TEST007.tmp", input_string=0x0)
    at ../sqlci/SqlciEnv.cpp:720
#24 0x000000000040272b in main (argc=2, argv=0x7ffc34b26f38)
    at ../bin/SqlciMain.cpp:329
(gdb)

So, we are waiting for some other thread to complete which never completes. Looking at the threads in the process I see:

(gdb) info thread
  37 Thread 0x7f1e93799700 (LWP 9140)  0x00007f1ea1c964fd in accept ()
   from /lib64/libc.so.6
  36 Thread 0x7f1e92d98700 (LWP 9141)  0x00007f1e9be5e63c in pthread_cond_wait@@GLIBC_2.3.2 () from /lib64/libpthread.so.0
  35 Thread 0x7f1e9046c700 (LWP 9142)  0x00007f1ea1be071d in sigtimedwait ()
   from /lib64/libc.so.6
  34 Thread 0x7f1e8f96a700 (LWP 9157)  0x00007f1ea1c95f33 in epoll_wait ()
   from /lib64/libc.so.6
  33 Thread 0x7f1e8b5e7700 (LWP 9168)  0x00007f1e9be5e63c in pthread_cond_wait@@GLIBC_2.3.2 () from /lib64/libpthread.so.0
  32 Thread 0x7f1e8b4e6700 (LWP 9169)  0x00007f1e9be5e63c in pthread_cond_wait@@GLIBC_2.3.2 () from /lib64/libpthread.so.0
  31 Thread 0x7f1e8b3e5700 (LWP 9170)  0x00007f1e9be5e63c in pthread_cond_wait@@GLIBC_2.3.2 () from /lib64/libpthread.so.0
  30 Thread 0x7f1e8b2e4700 (LWP 9171)  0x00007f1e9be5e63c in pthread_cond_wait@@GLIBC_2.3.2 () from /lib64/libpthread.so.0
  29 Thread 0x7f1e8b1e3700 (LWP 9172)  0x00007f1e9be5e63c in pthread_cond_wait@@GLIBC_2.3.2 () from /lib64/libpthread.so.0
  28 Thread 0x7f1e8b0e2700 (LWP 9173)  0x00007f1e9be5e63c in pthread_cond_wait@@GLIBC_2.3.2 () from /lib64/libpthread.so.0
  27 Thread 0x7f1e8afe1700 (LWP 9174)  0x00007f1e9be5e63c in pthread_cond_wait@@GLIBC_2.3.2 () from /lib64/libpthread.so.0
  26 Thread 0x7f1e8aee0700 (LWP 9175)  0x00007f1e9be5e63c in pthread_cond_wait@@---Type <return> to continue, or q <return> to quit---
GLIBC_2.3.2 () from /lib64/libpthread.so.0
  25 Thread 0x7f1e87824700 (LWP 9177)  0x00007f1e9be5ea0e in pthread_cond_timedwait@@GLIBC_2.3.2 () from /lib64/libpthread.so.0
  24 Thread 0x7f1e87723700 (LWP 9180)  0x00007f1e9be5e63c in pthread_cond_wait@@GLIBC_2.3.2 () from /lib64/libpthread.so.0
  23 Thread 0x7f1e87622700 (LWP 9182)  0x00007f1e9be5e63c in pthread_cond_wait@@GLIBC_2.3.2 () from /lib64/libpthread.so.0
  22 Thread 0x7f1e81690700 (LWP 9183)  0x00007f1e9be609b0 in sem_wait ()
   from /lib64/libpthread.so.0
  21 Thread 0x7f1e8158f700 (LWP 9184)  0x00007f1e9be5e63c in pthread_cond_wait@@GLIBC_2.3.2 () from /lib64/libpthread.so.0
  20 Thread 0x7f1e8148e700 (LWP 9193)  0x00007f1e9be5e63c in pthread_cond_wait@@GLIBC_2.3.2 () from /lib64/libpthread.so.0
  19 Thread 0x7f1e8138d700 (LWP 9195)  0x00007f1e9be5e63c in pthread_cond_wait@@GLIBC_2.3.2 () from /lib64/libpthread.so.0
  18 Thread 0x7f1e8128c700 (LWP 9196)  0x00007f1e9be5ea0e in pthread_cond_timedwait@@GLIBC_2.3.2 () from /lib64/libpthread.so.0
  17 Thread 0x7f1e7e3ba700 (LWP 9247)  0x00007f1ea1c95f33 in epoll_wait ()
   from /lib64/libc.so.6
  16 Thread 0x7f1e7e0b8700 (LWP 9248)  0x00007f1e9be5e63c in pthread_cond_wait@@GLIBC_2.3.2 () from /lib64/libpthread.so.0
  15 Thread 0x7f1e7dfb7700 (LWP 9301)  0x00007f1ea1c8c113 in poll ()
   from /lib64/libc.so.6
---Type <return> to continue, or q <return> to quit---
  14 Thread 0x7f1e7a712700 (LWP 10456)  0x00007f1e9be5ea0e in pthread_cond_timedwait@@GLIBC_2.3.2 () from /lib64/libpthread.so.0
  13 Thread 0x7f1e7a20f700 (LWP 10528)  0x00007f1e9be5e63c in pthread_cond_wait@@GLIBC_2.3.2 () from /lib64/libpthread.so.0
  12 Thread 0x7f1e7a10e700 (LWP 10531)  0x00007f1e9be5ea0e in pthread_cond_timedwait@@GLIBC_2.3.2 () from /lib64/libpthread.so.0
  11 Thread 0x7f1e7a00d700 (LWP 10532)  0x00007f1e9be5e63c in pthread_cond_wait@@GLIBC_2.3.2 () from /lib64/libpthread.so.0
  10 Thread 0x7f1e79f0c700 (LWP 10722)  0x00007f1e9be5e63c in pthread_cond_wait@@GLIBC_2.3.2 () from /lib64/libpthread.so.0
  9 Thread 0x7f1e79e0b700 (LWP 10723)  0x00007f1e9be5ea0e in pthread_cond_timedwait@@GLIBC_2.3.2 () from /lib64/libpthread.so.0
  8 Thread 0x7f1e79d0a700 (LWP 10724)  0x00007f1e9be5e63c in pthread_cond_wait@@GLIBC_2.3.2 () from /lib64/libpthread.so.0
  7 Thread 0x7f1e7ad18700 (LWP 14325)  0x00007f1ea1c95f33 in epoll_wait ()
   from /lib64/libc.so.6
  6 Thread 0x7f1e7b01b700 (LWP 14326)  0x00007f1e9be5e63c in pthread_cond_wait@@GLIBC_2.3.2 () from /lib64/libpthread.so.0
  5 Thread 0x7f1e7b21d700 (LWP 14328)  0x00007f1e9be5ea0e in pthread_cond_timedwait@@GLIBC_2.3.2 () from /lib64/libpthread.so.0
  4 Thread 0x7f1e7b31e700 (LWP 14329)  0x00007f1e9be5ea0e in pthread_cond_timedwait@@GLIBC_2.3.2 () from /lib64/libpthread.so.0
  3 Thread 0x7f1e7ba26700 (LWP 14336)  0x00007f1e9be5ea0e in pthread_cond_timedw---Type <return> to continue, or q <return> to quit---
ait@@GLIBC_2.3.2 () from /lib64/libpthread.so.0
  2 Thread 0x7f1e7cea0700 (LWP 16913)  0x00007f1e9be612e4 in __lll_lock_wait ()
   from /lib64/libpthread.so.0
* 1 Thread 0x7f1ea4aa42c0 (LWP 9113)  0x00007f1e9be5b2ad in pthread_join ()
   from /lib64/libpthread.so.0
(gdb)

Everything is waiting. I'm not sure what thread is interesting to look at but thread 2 is different.  Looking at that one:

(gdb) thread 2
[Switching to thread 2 (Thread 0x7f1e7cea0700 (LWP 16913))]#0  0x00007f1e9be612e4 in __lll_lock_wait () from /lib64/libpthread.so.0
(gdb) bt
#0  0x00007f1e9be612e4 in __lll_lock_wait () from /lib64/libpthread.so.0
#1  0x00007f1e9be5c588 in _L_lock_854 () from /lib64/libpthread.so.0
#2  0x00007f1e9be5c457 in pthread_mutex_lock () from /lib64/libpthread.so.0
#3  0x00007f1e9e2c913c in ExLobLock::lock (this=0x3c52480)
    at ../exp/ExpLOBaccess.cpp:3281
#4  0x00007f1e9e2c80e6 in ExLobGlobals::performRequest (this=0x3c4e640,
    request=0x3c53a90) at ../exp/ExpLOBaccess.cpp:2963
#5  0x00007f1e9e2c96fb in ExLobGlobals::doWorkInThread (this=0x3c4e640)
    at ../exp/ExpLOBaccess.cpp:3441
#6  0x00007f1e9e2c8fc9 in workerThreadMain (arg=0x3c4e640)
    at ../exp/ExpLOBaccess.cpp:3235
#7  0x00007f1e9be5aa51 in start_thread () from /lib64/libpthread.so.0
#8  0x00007f1ea1c9593d in clone () from /lib64/libc.so.6

At this point I killed the process and let the tests continue.

Is this a known bug?

Thanks,

Dave

RE: Hang in hive/TEST007?

Posted by Sandhya Sundaresan <sa...@esgyn.com>.
The stck that Dave sees has been showing up on official builds once in a while. There are 2 threads doing hdfs access and one of them seems to be accessing resources that the other has destructed. It used to show up often during hdfsReads but I had made a fix so it can never happen during the active reading phase. This needs to be investigated by looking at the code ot see if there is a window where this can happen during the destructor. Perhaps we need to let each thread finish it's work or something. But from a quick look at the ExLobGlobals destructor, it seems to be doing that today. So I am not sure...haven't had the chance to spend time on it yet but will do so in a few days. 
Thanks
Sandhya

-----Original Message-----
From: Roberta Marton [mailto:roberta.marton@esgyn.com] 
Sent: Thursday, April 6, 2017 9:31 AM
To: dev@trafodion.incubator.apache.org
Subject: RE: Hang in hive/TEST007?

This test failed for me in a slightly different way but maybe they are related.
The hive constructor for NATable is not initializing the variable (prototype_) so once in a while the call to the destructor cores.
I noticed that you were trying to call the destructor for a different object. 
 If the NATable destructor actually did memory deletes for prototype_ on an invalid address perhaps it caused some kind of memory corruption.  Later, when then destructor for ExLobGlobals is called, memory was invalid.

     Roberta 
 
-----Original Message-----
From: Dave Birdsall [mailto:dave.birdsall@esgyn.com]
Sent: Thursday, April 6, 2017 8:50 AM
To: dev@trafodion.incubator.apache.org
Subject: Hang in hive/TEST007?

Hi,

I was running Trafodion regressions on a workstation overnight and came in this morning to find them hung in test hive/TEST007, at the following statement:

>>insert into trafodion.sch007.vhive1 values (3,4);

--- 1 row(s) inserted.
>>select * from thive1;

A            B
-----------  -----------

          1            2
          3            4

--- 2 row(s) selected.

Notice the "--- 2 rows(s) selected" and then no sqlci prompt. Just stuck there.

Checking dtmci, there were no transactions active or outstanding.

I put the sqlci process into gdb, and looked at the stack trace:

(gdb) bt
#0  0x00007f1e9be5b2ad in pthread_join () from /lib64/libpthread.so.0
#1  0x00007f1e9e2c8c5b in ExLobGlobals::~ExLobGlobals (this=0x3c4e640,
    __in_chrg=<value optimized out>) at ../exp/ExpLOBaccess.cpp:3176
#2  0x00007f1e9e2c717f in ExLobsOper (lobName=0x7f1e9eb666d7 "dummy", handleIn=
    0x0, handleInLen=0, hdfsServer=0x0, hdfsPort=0, handleOut=0x0,
    handleOutLen=@0x7ffc34b26334, descNumIn=0, descNumOut=@0x7ffc34b26340,
    retOperLen=@0x7ffc34b26340, requestTagIn=0, requestTagOut=@0x7ffc34b26340,
    requestStatus=@0x7ffc34b26348, cliError=@0x7ffc34b26338,
    lobStorageLocation=0x0, storage=Lob_HDFS_File, source=0x0, sourceLen=0,
    cursorBytes=0, cursorId=0x0, operation=Lob_Cleanup, subOperation=Lob_None,
    waited=1, globPtr=@0x7f1ea49a7de0, transId=0, blackBox=0x7f1ea4963ea0,
    blackBoxLen=0, lobMaxSize=0, lobMaxChunkMemSize=0, lobGCLimit=0,
    bufferSize=0, replication=0, blockSize=0, openType=0)
    at ../exp/ExpLOBaccess.cpp:2670
#3  0x00007f1e9e2cfdcf in ExpLOBinterfaceCleanup (exLobGlob=@0x7f1ea49a7de0,
    lobHeap=0x7f1ea4963ea0) at ../exp/ExpLOBinterface.cpp:201
#4  0x00007f1ea01480cf in ExHdfsScanTcb::freeResources (this=0x7f1ea49a7cd8)
    at ../executor/ExHdfsScan.cpp:283
#5  0x00007f1ea0147b21 in ExHdfsScanTcb::~ExHdfsScanTcb (this=0x7f1ea49a7cd8,
    __in_chrg=<value optimized out>) at ../executor/ExHdfsScan.cpp:224
#6  0x00007f1ea0147c50 in ExHdfsScanTcb::~ExHdfsScanTcb (this=0x7f1ea49a7cd8,
    __in_chrg=<value optimized out>) at ../executor/ExHdfsScan.cpp:225
#7  0x00007f1e9fefcce5 in ex_globals::cleanupTcbs (this=0x7f1ea4983b48) ---Type <return> to continue, or q <return> to quit---
    at ../executor/ex_globals.cpp:199
#8  0x00007f1e9fefcac1 in ex_globals::deleteMe (this=0x7f1ea4983b48,
    fatalError=0) at ../executor/ex_globals.cpp:143
#9  0x00007f1e9fed9600 in ExExeStmtGlobals::deleteMe (this=0x7f1ea4983b48,
    fatalError=0) at ../executor/ex_exe_stmt_globals.cpp:321
#10 0x00007f1e9feda06d in ExMasterStmtGlobals::deleteMe (this=0x7f1ea4983b48,
    fatalError=0) at ../executor/ex_exe_stmt_globals.cpp:672
#11 0x00007f1e9ff41f42 in ex_root_tcb::deallocAndDelete (this=0x7f1ea49a8660,
    glob=0x7f1ea4983b48, fragTable=0x7f1ea49a7980)
    at ../executor/ex_root.cpp:2470
#12 0x00007f1ea1922345 in CliStatement::releaseTcbs (this=0x7f1ea4963de0,
    closeAllOpens=0) at ../cli/Statement.cpp:4372
#13 0x00007f1ea1922506 in CliStatement::dealloc (this=0x7f1ea4963de0,
    closeAllOpens=0) at ../cli/Statement.cpp:4420
#14 0x00007f1ea1917caa in CliStatement::~CliStatement (this=0x7f1ea4963de0,
    __in_chrg=<value optimized out>) at ../cli/Statement.cpp:575
#15 0x00007f1ea19187bc in CliStatement::~CliStatement (this=0x7f1ea4963de0,
    __in_chrg=<value optimized out>) at ../cli/Statement.cpp:745
#16 0x00007f1ea18cad3e in ContextCli::deallocStmt (this=0x7f1ea49c2120,
    statement_id=0x3a14d70, deallocStaticStmt=0) at ../cli/Context.cpp:2348
#17 0x00007f1ea1890c89 in SQLCLI_DeallocStmt (cliGlobals=0x296bfe0,
    statement_id=0x3a14d70) at ../cli/Cli.cpp:1660
#18 0x00007f1ea1935309 in SQL_EXEC_DeallocStmt (statement_id=0x3a14d70) ---Type <return> to continue, or q <return> to quit---
    at ../cli/CliExtern.cpp:1832
#19 0x00007f1ea42622c9 in SqlCmd::deallocate (sqlci_env=0x2954a10, prep_stmt=
    0x3bd67b0) at ../sqlci/SqlCmd.cpp:2854
#20 0x00007f1ea426283c in DML::process (this=0x489e460, sqlci_env=0x2954a10)
    at ../sqlci/SqlCmd.cpp:2988
#21 0x00007f1ea424300c in Obey::process (this=0x623ca00, sqlci_env=0x2954a10)
    at ../sqlci/Obey.cpp:267
#22 0x00007f1ea424300c in Obey::process (this=0x2939090, sqlci_env=0x2954a10)
    at ../sqlci/Obey.cpp:267
#23 0x00007f1ea424bb43 in SqlciEnv::run (this=0x2954a10, in_filename=
    0x7ffc34b291f3 "TEST007.tmp", input_string=0x0)
    at ../sqlci/SqlciEnv.cpp:720
#24 0x000000000040272b in main (argc=2, argv=0x7ffc34b26f38)
    at ../bin/SqlciMain.cpp:329
(gdb)

So, we are waiting for some other thread to complete which never completes. Looking at the threads in the process I see:

(gdb) info thread
  37 Thread 0x7f1e93799700 (LWP 9140)  0x00007f1ea1c964fd in accept ()
   from /lib64/libc.so.6
  36 Thread 0x7f1e92d98700 (LWP 9141)  0x00007f1e9be5e63c in pthread_cond_wait@@GLIBC_2.3.2 () from /lib64/libpthread.so.0
  35 Thread 0x7f1e9046c700 (LWP 9142)  0x00007f1ea1be071d in sigtimedwait ()
   from /lib64/libc.so.6
  34 Thread 0x7f1e8f96a700 (LWP 9157)  0x00007f1ea1c95f33 in epoll_wait ()
   from /lib64/libc.so.6
  33 Thread 0x7f1e8b5e7700 (LWP 9168)  0x00007f1e9be5e63c in pthread_cond_wait@@GLIBC_2.3.2 () from /lib64/libpthread.so.0
  32 Thread 0x7f1e8b4e6700 (LWP 9169)  0x00007f1e9be5e63c in pthread_cond_wait@@GLIBC_2.3.2 () from /lib64/libpthread.so.0
  31 Thread 0x7f1e8b3e5700 (LWP 9170)  0x00007f1e9be5e63c in pthread_cond_wait@@GLIBC_2.3.2 () from /lib64/libpthread.so.0
  30 Thread 0x7f1e8b2e4700 (LWP 9171)  0x00007f1e9be5e63c in pthread_cond_wait@@GLIBC_2.3.2 () from /lib64/libpthread.so.0
  29 Thread 0x7f1e8b1e3700 (LWP 9172)  0x00007f1e9be5e63c in pthread_cond_wait@@GLIBC_2.3.2 () from /lib64/libpthread.so.0
  28 Thread 0x7f1e8b0e2700 (LWP 9173)  0x00007f1e9be5e63c in pthread_cond_wait@@GLIBC_2.3.2 () from /lib64/libpthread.so.0
  27 Thread 0x7f1e8afe1700 (LWP 9174)  0x00007f1e9be5e63c in pthread_cond_wait@@GLIBC_2.3.2 () from /lib64/libpthread.so.0
  26 Thread 0x7f1e8aee0700 (LWP 9175)  0x00007f1e9be5e63c in pthread_cond_wait@@---Type <return> to continue, or q <return> to quit---
GLIBC_2.3.2 () from /lib64/libpthread.so.0
  25 Thread 0x7f1e87824700 (LWP 9177)  0x00007f1e9be5ea0e in pthread_cond_timedwait@@GLIBC_2.3.2 () from /lib64/libpthread.so.0
  24 Thread 0x7f1e87723700 (LWP 9180)  0x00007f1e9be5e63c in pthread_cond_wait@@GLIBC_2.3.2 () from /lib64/libpthread.so.0
  23 Thread 0x7f1e87622700 (LWP 9182)  0x00007f1e9be5e63c in pthread_cond_wait@@GLIBC_2.3.2 () from /lib64/libpthread.so.0
  22 Thread 0x7f1e81690700 (LWP 9183)  0x00007f1e9be609b0 in sem_wait ()
   from /lib64/libpthread.so.0
  21 Thread 0x7f1e8158f700 (LWP 9184)  0x00007f1e9be5e63c in pthread_cond_wait@@GLIBC_2.3.2 () from /lib64/libpthread.so.0
  20 Thread 0x7f1e8148e700 (LWP 9193)  0x00007f1e9be5e63c in pthread_cond_wait@@GLIBC_2.3.2 () from /lib64/libpthread.so.0
  19 Thread 0x7f1e8138d700 (LWP 9195)  0x00007f1e9be5e63c in pthread_cond_wait@@GLIBC_2.3.2 () from /lib64/libpthread.so.0
  18 Thread 0x7f1e8128c700 (LWP 9196)  0x00007f1e9be5ea0e in pthread_cond_timedwait@@GLIBC_2.3.2 () from /lib64/libpthread.so.0
  17 Thread 0x7f1e7e3ba700 (LWP 9247)  0x00007f1ea1c95f33 in epoll_wait ()
   from /lib64/libc.so.6
  16 Thread 0x7f1e7e0b8700 (LWP 9248)  0x00007f1e9be5e63c in pthread_cond_wait@@GLIBC_2.3.2 () from /lib64/libpthread.so.0
  15 Thread 0x7f1e7dfb7700 (LWP 9301)  0x00007f1ea1c8c113 in poll ()
   from /lib64/libc.so.6
---Type <return> to continue, or q <return> to quit---
  14 Thread 0x7f1e7a712700 (LWP 10456)  0x00007f1e9be5ea0e in pthread_cond_timedwait@@GLIBC_2.3.2 () from /lib64/libpthread.so.0
  13 Thread 0x7f1e7a20f700 (LWP 10528)  0x00007f1e9be5e63c in pthread_cond_wait@@GLIBC_2.3.2 () from /lib64/libpthread.so.0
  12 Thread 0x7f1e7a10e700 (LWP 10531)  0x00007f1e9be5ea0e in pthread_cond_timedwait@@GLIBC_2.3.2 () from /lib64/libpthread.so.0
  11 Thread 0x7f1e7a00d700 (LWP 10532)  0x00007f1e9be5e63c in pthread_cond_wait@@GLIBC_2.3.2 () from /lib64/libpthread.so.0
  10 Thread 0x7f1e79f0c700 (LWP 10722)  0x00007f1e9be5e63c in pthread_cond_wait@@GLIBC_2.3.2 () from /lib64/libpthread.so.0
  9 Thread 0x7f1e79e0b700 (LWP 10723)  0x00007f1e9be5ea0e in pthread_cond_timedwait@@GLIBC_2.3.2 () from /lib64/libpthread.so.0
  8 Thread 0x7f1e79d0a700 (LWP 10724)  0x00007f1e9be5e63c in pthread_cond_wait@@GLIBC_2.3.2 () from /lib64/libpthread.so.0
  7 Thread 0x7f1e7ad18700 (LWP 14325)  0x00007f1ea1c95f33 in epoll_wait ()
   from /lib64/libc.so.6
  6 Thread 0x7f1e7b01b700 (LWP 14326)  0x00007f1e9be5e63c in pthread_cond_wait@@GLIBC_2.3.2 () from /lib64/libpthread.so.0
  5 Thread 0x7f1e7b21d700 (LWP 14328)  0x00007f1e9be5ea0e in pthread_cond_timedwait@@GLIBC_2.3.2 () from /lib64/libpthread.so.0
  4 Thread 0x7f1e7b31e700 (LWP 14329)  0x00007f1e9be5ea0e in pthread_cond_timedwait@@GLIBC_2.3.2 () from /lib64/libpthread.so.0
  3 Thread 0x7f1e7ba26700 (LWP 14336)  0x00007f1e9be5ea0e in pthread_cond_timedw---Type <return> to continue, or q <return> to quit---
ait@@GLIBC_2.3.2 () from /lib64/libpthread.so.0
  2 Thread 0x7f1e7cea0700 (LWP 16913)  0x00007f1e9be612e4 in __lll_lock_wait ()
   from /lib64/libpthread.so.0
* 1 Thread 0x7f1ea4aa42c0 (LWP 9113)  0x00007f1e9be5b2ad in pthread_join ()
   from /lib64/libpthread.so.0
(gdb)

Everything is waiting. I'm not sure what thread is interesting to look at but thread 2 is different.  Looking at that one:

(gdb) thread 2
[Switching to thread 2 (Thread 0x7f1e7cea0700 (LWP 16913))]#0  0x00007f1e9be612e4 in __lll_lock_wait () from /lib64/libpthread.so.0
(gdb) bt
#0  0x00007f1e9be612e4 in __lll_lock_wait () from /lib64/libpthread.so.0
#1  0x00007f1e9be5c588 in _L_lock_854 () from /lib64/libpthread.so.0
#2  0x00007f1e9be5c457 in pthread_mutex_lock () from /lib64/libpthread.so.0
#3  0x00007f1e9e2c913c in ExLobLock::lock (this=0x3c52480)
    at ../exp/ExpLOBaccess.cpp:3281
#4  0x00007f1e9e2c80e6 in ExLobGlobals::performRequest (this=0x3c4e640,
    request=0x3c53a90) at ../exp/ExpLOBaccess.cpp:2963
#5  0x00007f1e9e2c96fb in ExLobGlobals::doWorkInThread (this=0x3c4e640)
    at ../exp/ExpLOBaccess.cpp:3441
#6  0x00007f1e9e2c8fc9 in workerThreadMain (arg=0x3c4e640)
    at ../exp/ExpLOBaccess.cpp:3235
#7  0x00007f1e9be5aa51 in start_thread () from /lib64/libpthread.so.0
#8  0x00007f1ea1c9593d in clone () from /lib64/libc.so.6

At this point I killed the process and let the tests continue.

Is this a known bug?

Thanks,

Dave

RE: Hang in hive/TEST007?

Posted by Roberta Marton <ro...@esgyn.com>.
This test failed for me in a slightly different way but maybe they are related.
The hive constructor for NATable is not initializing the variable (prototype_) so once in a while the call to the destructor cores.
I noticed that you were trying to call the destructor for a different object. 
 If the NATable destructor actually did memory deletes for prototype_ on an invalid address perhaps it caused some kind of memory corruption.  Later, when then destructor for ExLobGlobals is called, memory was invalid.

     Roberta 
 
-----Original Message-----
From: Dave Birdsall [mailto:dave.birdsall@esgyn.com] 
Sent: Thursday, April 6, 2017 8:50 AM
To: dev@trafodion.incubator.apache.org
Subject: Hang in hive/TEST007?

Hi,

I was running Trafodion regressions on a workstation overnight and came in this morning to find them hung in test hive/TEST007, at the following statement:

>>insert into trafodion.sch007.vhive1 values (3,4);

--- 1 row(s) inserted.
>>select * from thive1;

A            B
-----------  -----------

          1            2
          3            4

--- 2 row(s) selected.

Notice the "--- 2 rows(s) selected" and then no sqlci prompt. Just stuck there.

Checking dtmci, there were no transactions active or outstanding.

I put the sqlci process into gdb, and looked at the stack trace:

(gdb) bt
#0  0x00007f1e9be5b2ad in pthread_join () from /lib64/libpthread.so.0
#1  0x00007f1e9e2c8c5b in ExLobGlobals::~ExLobGlobals (this=0x3c4e640,
    __in_chrg=<value optimized out>) at ../exp/ExpLOBaccess.cpp:3176
#2  0x00007f1e9e2c717f in ExLobsOper (lobName=0x7f1e9eb666d7 "dummy", handleIn=
    0x0, handleInLen=0, hdfsServer=0x0, hdfsPort=0, handleOut=0x0,
    handleOutLen=@0x7ffc34b26334, descNumIn=0, descNumOut=@0x7ffc34b26340,
    retOperLen=@0x7ffc34b26340, requestTagIn=0, requestTagOut=@0x7ffc34b26340,
    requestStatus=@0x7ffc34b26348, cliError=@0x7ffc34b26338,
    lobStorageLocation=0x0, storage=Lob_HDFS_File, source=0x0, sourceLen=0,
    cursorBytes=0, cursorId=0x0, operation=Lob_Cleanup, subOperation=Lob_None,
    waited=1, globPtr=@0x7f1ea49a7de0, transId=0, blackBox=0x7f1ea4963ea0,
    blackBoxLen=0, lobMaxSize=0, lobMaxChunkMemSize=0, lobGCLimit=0,
    bufferSize=0, replication=0, blockSize=0, openType=0)
    at ../exp/ExpLOBaccess.cpp:2670
#3  0x00007f1e9e2cfdcf in ExpLOBinterfaceCleanup (exLobGlob=@0x7f1ea49a7de0,
    lobHeap=0x7f1ea4963ea0) at ../exp/ExpLOBinterface.cpp:201
#4  0x00007f1ea01480cf in ExHdfsScanTcb::freeResources (this=0x7f1ea49a7cd8)
    at ../executor/ExHdfsScan.cpp:283
#5  0x00007f1ea0147b21 in ExHdfsScanTcb::~ExHdfsScanTcb (this=0x7f1ea49a7cd8,
    __in_chrg=<value optimized out>) at ../executor/ExHdfsScan.cpp:224
#6  0x00007f1ea0147c50 in ExHdfsScanTcb::~ExHdfsScanTcb (this=0x7f1ea49a7cd8,
    __in_chrg=<value optimized out>) at ../executor/ExHdfsScan.cpp:225
#7  0x00007f1e9fefcce5 in ex_globals::cleanupTcbs (this=0x7f1ea4983b48)
---Type <return> to continue, or q <return> to quit---
    at ../executor/ex_globals.cpp:199
#8  0x00007f1e9fefcac1 in ex_globals::deleteMe (this=0x7f1ea4983b48,
    fatalError=0) at ../executor/ex_globals.cpp:143
#9  0x00007f1e9fed9600 in ExExeStmtGlobals::deleteMe (this=0x7f1ea4983b48,
    fatalError=0) at ../executor/ex_exe_stmt_globals.cpp:321
#10 0x00007f1e9feda06d in ExMasterStmtGlobals::deleteMe (this=0x7f1ea4983b48,
    fatalError=0) at ../executor/ex_exe_stmt_globals.cpp:672
#11 0x00007f1e9ff41f42 in ex_root_tcb::deallocAndDelete (this=0x7f1ea49a8660,
    glob=0x7f1ea4983b48, fragTable=0x7f1ea49a7980)
    at ../executor/ex_root.cpp:2470
#12 0x00007f1ea1922345 in CliStatement::releaseTcbs (this=0x7f1ea4963de0,
    closeAllOpens=0) at ../cli/Statement.cpp:4372
#13 0x00007f1ea1922506 in CliStatement::dealloc (this=0x7f1ea4963de0,
    closeAllOpens=0) at ../cli/Statement.cpp:4420
#14 0x00007f1ea1917caa in CliStatement::~CliStatement (this=0x7f1ea4963de0,
    __in_chrg=<value optimized out>) at ../cli/Statement.cpp:575
#15 0x00007f1ea19187bc in CliStatement::~CliStatement (this=0x7f1ea4963de0,
    __in_chrg=<value optimized out>) at ../cli/Statement.cpp:745
#16 0x00007f1ea18cad3e in ContextCli::deallocStmt (this=0x7f1ea49c2120,
    statement_id=0x3a14d70, deallocStaticStmt=0) at ../cli/Context.cpp:2348
#17 0x00007f1ea1890c89 in SQLCLI_DeallocStmt (cliGlobals=0x296bfe0,
    statement_id=0x3a14d70) at ../cli/Cli.cpp:1660
#18 0x00007f1ea1935309 in SQL_EXEC_DeallocStmt (statement_id=0x3a14d70)
---Type <return> to continue, or q <return> to quit---
    at ../cli/CliExtern.cpp:1832
#19 0x00007f1ea42622c9 in SqlCmd::deallocate (sqlci_env=0x2954a10, prep_stmt=
    0x3bd67b0) at ../sqlci/SqlCmd.cpp:2854
#20 0x00007f1ea426283c in DML::process (this=0x489e460, sqlci_env=0x2954a10)
    at ../sqlci/SqlCmd.cpp:2988
#21 0x00007f1ea424300c in Obey::process (this=0x623ca00, sqlci_env=0x2954a10)
    at ../sqlci/Obey.cpp:267
#22 0x00007f1ea424300c in Obey::process (this=0x2939090, sqlci_env=0x2954a10)
    at ../sqlci/Obey.cpp:267
#23 0x00007f1ea424bb43 in SqlciEnv::run (this=0x2954a10, in_filename=
    0x7ffc34b291f3 "TEST007.tmp", input_string=0x0)
    at ../sqlci/SqlciEnv.cpp:720
#24 0x000000000040272b in main (argc=2, argv=0x7ffc34b26f38)
    at ../bin/SqlciMain.cpp:329
(gdb)

So, we are waiting for some other thread to complete which never completes. Looking at the threads in the process I see:

(gdb) info thread
  37 Thread 0x7f1e93799700 (LWP 9140)  0x00007f1ea1c964fd in accept ()
   from /lib64/libc.so.6
  36 Thread 0x7f1e92d98700 (LWP 9141)  0x00007f1e9be5e63c in pthread_cond_wait@@GLIBC_2.3.2 () from /lib64/libpthread.so.0
  35 Thread 0x7f1e9046c700 (LWP 9142)  0x00007f1ea1be071d in sigtimedwait ()
   from /lib64/libc.so.6
  34 Thread 0x7f1e8f96a700 (LWP 9157)  0x00007f1ea1c95f33 in epoll_wait ()
   from /lib64/libc.so.6
  33 Thread 0x7f1e8b5e7700 (LWP 9168)  0x00007f1e9be5e63c in pthread_cond_wait@@GLIBC_2.3.2 () from /lib64/libpthread.so.0
  32 Thread 0x7f1e8b4e6700 (LWP 9169)  0x00007f1e9be5e63c in pthread_cond_wait@@GLIBC_2.3.2 () from /lib64/libpthread.so.0
  31 Thread 0x7f1e8b3e5700 (LWP 9170)  0x00007f1e9be5e63c in pthread_cond_wait@@GLIBC_2.3.2 () from /lib64/libpthread.so.0
  30 Thread 0x7f1e8b2e4700 (LWP 9171)  0x00007f1e9be5e63c in pthread_cond_wait@@GLIBC_2.3.2 () from /lib64/libpthread.so.0
  29 Thread 0x7f1e8b1e3700 (LWP 9172)  0x00007f1e9be5e63c in pthread_cond_wait@@GLIBC_2.3.2 () from /lib64/libpthread.so.0
  28 Thread 0x7f1e8b0e2700 (LWP 9173)  0x00007f1e9be5e63c in pthread_cond_wait@@GLIBC_2.3.2 () from /lib64/libpthread.so.0
  27 Thread 0x7f1e8afe1700 (LWP 9174)  0x00007f1e9be5e63c in pthread_cond_wait@@GLIBC_2.3.2 () from /lib64/libpthread.so.0
  26 Thread 0x7f1e8aee0700 (LWP 9175)  0x00007f1e9be5e63c in pthread_cond_wait@@---Type <return> to continue, or q <return> to quit---
GLIBC_2.3.2 () from /lib64/libpthread.so.0
  25 Thread 0x7f1e87824700 (LWP 9177)  0x00007f1e9be5ea0e in pthread_cond_timedwait@@GLIBC_2.3.2 () from /lib64/libpthread.so.0
  24 Thread 0x7f1e87723700 (LWP 9180)  0x00007f1e9be5e63c in pthread_cond_wait@@GLIBC_2.3.2 () from /lib64/libpthread.so.0
  23 Thread 0x7f1e87622700 (LWP 9182)  0x00007f1e9be5e63c in pthread_cond_wait@@GLIBC_2.3.2 () from /lib64/libpthread.so.0
  22 Thread 0x7f1e81690700 (LWP 9183)  0x00007f1e9be609b0 in sem_wait ()
   from /lib64/libpthread.so.0
  21 Thread 0x7f1e8158f700 (LWP 9184)  0x00007f1e9be5e63c in pthread_cond_wait@@GLIBC_2.3.2 () from /lib64/libpthread.so.0
  20 Thread 0x7f1e8148e700 (LWP 9193)  0x00007f1e9be5e63c in pthread_cond_wait@@GLIBC_2.3.2 () from /lib64/libpthread.so.0
  19 Thread 0x7f1e8138d700 (LWP 9195)  0x00007f1e9be5e63c in pthread_cond_wait@@GLIBC_2.3.2 () from /lib64/libpthread.so.0
  18 Thread 0x7f1e8128c700 (LWP 9196)  0x00007f1e9be5ea0e in pthread_cond_timedwait@@GLIBC_2.3.2 () from /lib64/libpthread.so.0
  17 Thread 0x7f1e7e3ba700 (LWP 9247)  0x00007f1ea1c95f33 in epoll_wait ()
   from /lib64/libc.so.6
  16 Thread 0x7f1e7e0b8700 (LWP 9248)  0x00007f1e9be5e63c in pthread_cond_wait@@GLIBC_2.3.2 () from /lib64/libpthread.so.0
  15 Thread 0x7f1e7dfb7700 (LWP 9301)  0x00007f1ea1c8c113 in poll ()
   from /lib64/libc.so.6
---Type <return> to continue, or q <return> to quit---
  14 Thread 0x7f1e7a712700 (LWP 10456)  0x00007f1e9be5ea0e in pthread_cond_timedwait@@GLIBC_2.3.2 () from /lib64/libpthread.so.0
  13 Thread 0x7f1e7a20f700 (LWP 10528)  0x00007f1e9be5e63c in pthread_cond_wait@@GLIBC_2.3.2 () from /lib64/libpthread.so.0
  12 Thread 0x7f1e7a10e700 (LWP 10531)  0x00007f1e9be5ea0e in pthread_cond_timedwait@@GLIBC_2.3.2 () from /lib64/libpthread.so.0
  11 Thread 0x7f1e7a00d700 (LWP 10532)  0x00007f1e9be5e63c in pthread_cond_wait@@GLIBC_2.3.2 () from /lib64/libpthread.so.0
  10 Thread 0x7f1e79f0c700 (LWP 10722)  0x00007f1e9be5e63c in pthread_cond_wait@@GLIBC_2.3.2 () from /lib64/libpthread.so.0
  9 Thread 0x7f1e79e0b700 (LWP 10723)  0x00007f1e9be5ea0e in pthread_cond_timedwait@@GLIBC_2.3.2 () from /lib64/libpthread.so.0
  8 Thread 0x7f1e79d0a700 (LWP 10724)  0x00007f1e9be5e63c in pthread_cond_wait@@GLIBC_2.3.2 () from /lib64/libpthread.so.0
  7 Thread 0x7f1e7ad18700 (LWP 14325)  0x00007f1ea1c95f33 in epoll_wait ()
   from /lib64/libc.so.6
  6 Thread 0x7f1e7b01b700 (LWP 14326)  0x00007f1e9be5e63c in pthread_cond_wait@@GLIBC_2.3.2 () from /lib64/libpthread.so.0
  5 Thread 0x7f1e7b21d700 (LWP 14328)  0x00007f1e9be5ea0e in pthread_cond_timedwait@@GLIBC_2.3.2 () from /lib64/libpthread.so.0
  4 Thread 0x7f1e7b31e700 (LWP 14329)  0x00007f1e9be5ea0e in pthread_cond_timedwait@@GLIBC_2.3.2 () from /lib64/libpthread.so.0
  3 Thread 0x7f1e7ba26700 (LWP 14336)  0x00007f1e9be5ea0e in pthread_cond_timedw---Type <return> to continue, or q <return> to quit---
ait@@GLIBC_2.3.2 () from /lib64/libpthread.so.0
  2 Thread 0x7f1e7cea0700 (LWP 16913)  0x00007f1e9be612e4 in __lll_lock_wait ()
   from /lib64/libpthread.so.0
* 1 Thread 0x7f1ea4aa42c0 (LWP 9113)  0x00007f1e9be5b2ad in pthread_join ()
   from /lib64/libpthread.so.0
(gdb)

Everything is waiting. I'm not sure what thread is interesting to look at but thread 2 is different.  Looking at that one:

(gdb) thread 2
[Switching to thread 2 (Thread 0x7f1e7cea0700 (LWP 16913))]#0  0x00007f1e9be612e4 in __lll_lock_wait () from /lib64/libpthread.so.0
(gdb) bt
#0  0x00007f1e9be612e4 in __lll_lock_wait () from /lib64/libpthread.so.0
#1  0x00007f1e9be5c588 in _L_lock_854 () from /lib64/libpthread.so.0
#2  0x00007f1e9be5c457 in pthread_mutex_lock () from /lib64/libpthread.so.0
#3  0x00007f1e9e2c913c in ExLobLock::lock (this=0x3c52480)
    at ../exp/ExpLOBaccess.cpp:3281
#4  0x00007f1e9e2c80e6 in ExLobGlobals::performRequest (this=0x3c4e640,
    request=0x3c53a90) at ../exp/ExpLOBaccess.cpp:2963
#5  0x00007f1e9e2c96fb in ExLobGlobals::doWorkInThread (this=0x3c4e640)
    at ../exp/ExpLOBaccess.cpp:3441
#6  0x00007f1e9e2c8fc9 in workerThreadMain (arg=0x3c4e640)
    at ../exp/ExpLOBaccess.cpp:3235
#7  0x00007f1e9be5aa51 in start_thread () from /lib64/libpthread.so.0
#8  0x00007f1ea1c9593d in clone () from /lib64/libc.so.6

At this point I killed the process and let the tests continue.

Is this a known bug?

Thanks,

Dave