Posted to issues@trafodion.apache.org by "Sandhya Sundaresan (JIRA)" <ji...@apache.org> on 2017/04/19 21:22:41 UTC

[jira] [Commented] (TRAFODION-2597) ESP cores seen during daily builds after hive tests run

    [ https://issues.apache.org/jira/browse/TRAFODION-2597?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15975543#comment-15975543 ] 

Sandhya Sundaresan commented on TRAFODION-2597:
-----------------------------------------------

There is a small window in which a reader thread could have emptied the cursor list and started reading that range from HDFS. In the meantime, the main thread (in ExHdfsScanTcb::work) may have called ExpLOBinterfaceCloseFile, which may deallocate the memory held by the lobPtr. The reader thread would then see deallocated/garbage memory while it is still processing. This scenario is especially likely when the HdfsScan operator calls cancel, or when it decides to close a file it is done with. So instead of allowing ExpLOBinterfaceCloseFile to deallocate resources, we can simply close the file, which is what is intended. The resources can then be deallocated when the HdfsScan operator is fully done and calls ExpLOBinterfaceCleanup (within the ExLobGlobals destructor).


Also, the preOpenList must be destroyed AFTER the threads have exited, to avoid accessing a data structure that could still be in use.
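
To illustrate the ordering described above, here is a minimal sketch in standard C++. It is not the Trafodion code: ScanGlobals, FileState, queuePreOpen, closeFile, cleanup and runDemo are hypothetical names that only stand in for the roles of ExLobGlobals, the state behind lobPtr, the pre-open list, ExpLOBinterfaceCloseFile and ExpLOBinterfaceCleanup. The assumption is that close only marks the file closed, and that memory is released only after the worker threads have been joined.

```cpp
#include <atomic>
#include <cassert>
#include <condition_variable>
#include <memory>
#include <mutex>
#include <queue>
#include <string>
#include <thread>
#include <utility>
#include <vector>

// Hypothetical stand-in for the per-file state behind lobPtr (not the real ExLob).
struct FileState {
    std::string name;
    bool open = true;
    explicit FileState(std::string n) : name(std::move(n)) {}
};

// Hypothetical stand-in for the scan globals (not the real ExLobGlobals).
class ScanGlobals {
public:
    explicit ScanGlobals(int numThreads) {
        for (int i = 0; i < numThreads; ++i)
            workers_.emplace_back([this] { workerLoop(); });
    }
    ~ScanGlobals() { cleanup(); }

    FileState* openFile(const std::string& name) {
        std::lock_guard<std::mutex> g(lock_);
        files_.push_back(std::make_unique<FileState>(name));
        return files_.back().get();
    }

    void queuePreOpen(FileState* f) {
        {
            std::lock_guard<std::mutex> g(lock_);
            preOpenList_.push(f);
        }
        cv_.notify_one();
    }

    // Close only: mark the file closed, but do NOT free its memory,
    // because a worker may already have taken it off the list.
    void closeFile(FileState* f) {
        std::lock_guard<std::mutex> g(lock_);
        f->open = false;
    }

    // Full cleanup: stop and join the workers first, then release memory.
    void cleanup() {
        {
            std::lock_guard<std::mutex> g(lock_);
            if (stopping_) return;  // already cleaned up
            stopping_ = true;
        }
        cv_.notify_all();
        for (auto& t : workers_) t.join();
        assert(preOpenList_.empty());  // workers drain the list before exiting
        files_.clear();                // safe: no thread can still hold a pointer
    }

    int processed() const { return processed_.load(); }

private:
    void workerLoop() {
        for (;;) {
            FileState* f = nullptr;
            {
                std::unique_lock<std::mutex> g(lock_);
                cv_.wait(g, [this] { return stopping_ || !preOpenList_.empty(); });
                if (preOpenList_.empty()) return;  // stopping and nothing left
                f = preOpenList_.front();
                preOpenList_.pop();
            }
            // This is the window from the comment: the list may now be empty,
            // yet this thread is still working with f.
            std::lock_guard<std::mutex> g(lock_);
            if (!f->name.empty()) ++processed_;  // touches f; valid even if closed
        }
    }

    std::mutex lock_;
    std::condition_variable cv_;
    std::queue<FileState*> preOpenList_;
    std::vector<std::unique_ptr<FileState>> files_;
    std::vector<std::thread> workers_;
    std::atomic<int> processed_{0};
    bool stopping_ = false;
};

// Closes each file right after queuing it, as a cancel might, then cleans up.
int runDemo() {
    ScanGlobals globals(2);
    for (int i = 0; i < 3; ++i) {
        FileState* f = globals.openFile("file-" + std::to_string(i));
        globals.queuePreOpen(f);
        globals.closeFile(f);  // may overlap with a worker; nothing is freed here
    }
    globals.cleanup();
    return globals.processed();
}
```

In this sketch closeFile can overlap with a worker that is still using the same FileState, and that is harmless because the memory lives until cleanup has joined every thread. If closeFile freed the object instead, the worker would be reading freed memory, which is the kind of access the core file's Thread 1 points to.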

> ESP cores seen during daily builds after hive tests run
> -------------------------------------------------------
>
>                 Key: TRAFODION-2597
>                 URL: https://issues.apache.org/jira/browse/TRAFODION-2597
>             Project: Apache Trafodion
>          Issue Type: Bug
>          Components: sql-exe
>            Reporter: Sandhya Sundaresan
>             Fix For: 2.2-incubating
>
>
> After hive tests run and pass successfully, sometimes we see core files of ESP with the following trace:
> Thread 6 (Thread 0x7fe4f36e7700 (LWP 46076)):
> #0 0x00007fe55ecb168c in pthread_cond_wait@@GLIBC_2.3.2 () from /lib64/libpthread.so.0
> #1 0x00007fe565841f1d in ExLobLock::wait (this=0x2dc0580) at ../exp/ExpLOBaccess.cpp:3367
> #2 0x00007fe565842f4a in ExLobGlobals::getHdfsRequest (this=0x2dc0550) at ../exp/ExpLOBaccess.cpp:3464
> #3 0x00007fe565846a31 in ExLobGlobals::doWorkInThread (this=0x2dc0550) at ../exp/ExpLOBaccess.cpp:3494
> #4 0x00007fe565846a69 in workerThreadMain (arg=<value optimized out>) at ../exp/ExpLOBaccess.cpp:3300
> #5 0x00007fe55ecadaa1 in start_thread () from /lib64/libpthread.so.0
> #6 0x00007fe561f1caad in clone () from /lib64/libc.so.6
> Thread 5 (Thread 0x7fe5532cd700 (LWP 45641)):
> #0 0x00007fe561f1d0a3 in epoll_wait () from /lib64/libc.so.6
> #1 0x00007fe561be08e1 in SB_Trans::Sock_Controller::epoll_wait (this=0x7fe561e32de0, pp_where=0x7fe561c043a8 "Sock_Comp_Thread::run", pv_timeout=-1) at sock.cpp:366
> #2 0x00007fe561bdfcf3 in SB_Trans::Sock_Comp_Thread::run (this=0x19190b0) at sock.cpp:108
> #3 0x00007fe561bdfb2d in sock_comp_thread_fun (pp_arg=0x19190b0) at sock.cpp:78
> #4 0x00007fe5605ce71f in SB_Thread::Thread::disp (this=0x19190b0, pp_arg=0x19190b0) at thread.cpp:214
> #5 0x00007fe5605ceb77 in thread_fun (pp_arg=0x19190b0) at thread.cpp:310
> #6 0x00007fe5605d1f3e in sb_thread_sthr_disp (pp_arg=0x1922240) at threadl.cpp:270
> #7 0x00007fe55ecadaa1 in start_thread () from /lib64/libpthread.so.0
> #8 0x00007fe561f1caad in clone () from /lib64/libc.so.6
> Thread 4 (Thread 0x7fe553ecf700 (LWP 45627)):
> #0 0x00007fe561e676dd in sigtimedwait () from /lib64/libc.so.6
> #1 0x00007fe561ba578f in local_monitor_reader (pp_arg=0x28fd) at ../../../monitor/linux/clio.cxx:291
> #2 0x00007fe55ecadaa1 in start_thread () from /lib64/libpthread.so.0
> #3 0x00007fe561f1caad in clone () from /lib64/libc.so.6
> Thread 3 (Thread 0x7fe4f5fdf700 (LWP 45725)):
> #0 0x00007fe55ecb3a00 in sem_wait () from /lib64/libpthread.so.0
> #1 0x00007fe563c78c41 in ?? () from /usr/lib/jvm/java-1.8.0-openjdk.x86_64/jre/lib/amd64/server/libjvm.so
> #2 0x00007fe563c6fa4a in ?? () from /usr/lib/jvm/java-1.8.0-openjdk.x86_64/jre/lib/amd64/server/libjvm.so
> #3 0x00007fe563db7335 in ?? () from /usr/lib/jvm/java-1.8.0-openjdk.x86_64/jre/lib/amd64/server/libjvm.so
> #4 0x00007fe563db7590 in ?? () from /usr/lib/jvm/java-1.8.0-openjdk.x86_64/jre/lib/amd64/server/libjvm.so
> #5 0x00007fe563c7a8b2 in ?? () from /usr/lib/jvm/java-1.8.0-openjdk.x86_64/jre/lib/amd64/server/libjvm.so
> #6 0x00007fe55ecadaa1 in start_thread () from /lib64/libpthread.so.0
> #7 0x00007fe561f1caad in clone () from /lib64/libc.so.6
> Thread 2 (Thread 0x7fe567e2e920 (LWP 45584)):
> #0 0x00007fe55ecb1a5e in pthread_cond_timedwait@@GLIBC_2.3.2 () from /lib64/libpthread.so.0
> #1 0x00007fe5605d136c in SB_Thread::CV::wait (this=0x1902b38, pv_sec=0, pv_us=399999) at /home/jenkins/workspace/build-rh6-AdvEnt2.3-release@2/trafodion/core/sqf/export/include/seabed/int/thread.inl:652
> #2 0x00007fe5605d1431 in SB_Thread::CV::wait (this=0x1902b38, pv_lock=true, pv_sec=0, pv_us=399999) at /home/jenkins/workspace/build-rh6-AdvEnt2.3-release@2/trafodion/core/sqf/export/include/seabed/int/thread.inl:704
> #3 0x00007fe561bb7c6b in SB_Ms_Event_Mgr::wait (this=0x1902a40, pv_us=399999) at mseventmgr.inl:354
> #4 0x00007fe561bd8c6e in XWAIT_com (pv_mask=1280, pv_time=40, pv_residual=false) at pctl.cpp:982
> #5 0x00007fe561bd8a6f in XWAITNO0 (pv_mask=1280, pv_time=40) at pctl.cpp:905
> #6 0x00007fe564e2b59a in IpcSetOfConnections::waitOnSet (this=0x7fe5532ce288, timeout=-1, calledByESP=1, timedout=0x7ffd57ec2c88) at ../common/Ipc.cpp:1607
> #7 0x000000000040718c in waitOnAll (argc=3, argv=0x7ffd57ec2de8, guaReceiveFastStart=0x0) at ../common/Ipc.h:3094
> #8 runESP (argc=3, argv=0x7ffd57ec2de8, guaReceiveFastStart=0x0) at ../bin/ex_esp_main.cpp:416
> #9 0x00000000004075d3 in main (argc=3, argv=0x7ffd57ec2de8) at ../bin/ex_esp_main.cpp:258
> Thread 1 (Thread 0x7fe4f2ce6700 (LWP 46075)):
> #0 0x00007fe561e665e5 in raise () from /lib64/libc.so.6
> #1 0x00007fe561e67dc5 in abort () from /lib64/libc.so.6
> #2 0x00007fe563c78495 in ?? () from /usr/lib/jvm/java-1.8.0-openjdk.x86_64/jre/lib/amd64/server/libjvm.so
> #3 0x00007fe563e06a93 in ?? () from /usr/lib/jvm/java-1.8.0-openjdk.x86_64/jre/lib/amd64/server/libjvm.so
> #4 0x00007fe563c7df12 in JVM_handle_linux_signal () from /usr/lib/jvm/java-1.8.0-openjdk.x86_64/jre/lib/amd64/server/libjvm.so
> #5 0x00007fe563c73de3 in ?? () from /usr/lib/jvm/java-1.8.0-openjdk.x86_64/jre/lib/amd64/server/libjvm.so
> #6 <signal handler called>
> #7 0x00007fe5658453b0 in c_str (this=0x7fe4f22652d0, file=0x7fe4f2265588 "hdfs://localhost:8020/user/hive/exttables/ins_store_sales/INS_STORE_SALES-1-20170410124533-723:7fe5532e6310:8", type=<value optimized out>, range=140621291868944, bufMaxSize=140621302869952, maxBytes=140621302869648, waited=0, lobGlobals=0x2dc0550, hdfsDetailError=0x0) at ../export/FBString.h:596
> #8 c_str (this=0x7fe4f22652d0, file=0x7fe4f2265588 "hdfs://localhost:8020/user/hive/exttables/ins_store_sales/INS_STORE_SALES-1-20170410124533-723:7fe5532e6310:8", type=<value optimized out>, range=140621291868944, bufMaxSize=140621302869952, maxBytes=140621302869648, waited=0, lobGlobals=0x2dc0550, hdfsDetailError=0x0) at ../export/FBString.h:1944
> #9 data (this=0x7fe4f22652d0, file=0x7fe4f2265588 "hdfs://localhost:8020/user/hive/exttables/ins_store_sales/INS_STORE_SALES-1-20170410124533-723:7fe5532e6310:8", type=<value optimized out>, range=140621291868944, bufMaxSize=140621302869952, maxBytes=140621302869648, waited=0, lobGlobals=0x2dc0550, hdfsDetailError=0x0) at ../export/FBString.h:1947
> #10 data (this=0x7fe4f22652d0, file=0x7fe4f2265588 "hdfs://localhost:8020/user/hive/exttables/ins_store_sales/INS_STORE_SALES-1-20170410124533-723:7fe5532e6310:8", type=<value optimized out>, range=140621291868944, bufMaxSize=140621302869952, maxBytes=140621302869648, waited=0, lobGlobals=0x2dc0550, hdfsDetailError=0x0) at ../export/NAStringDef.h:385
> #11 ExLob::openDataCursor (this=0x7fe4f22652d0, file=0x7fe4f2265588 "hdfs://localhost:8020/user/hive/exttables/ins_store_sales/INS_STORE_SALES-1-20170410124533-723:7fe5532e6310:8", type=<value optimized out>, range=140621291868944, bufMaxSize=140621302869952, maxBytes=140621302869648, waited=0, lobGlobals=0x2dc0550, hdfsDetailError=0x0) at ../exp/ExpLOBaccess.cpp:1427
> #12 0x00007fe5658458a9 in ExLobGlobals::processPreOpens (this=0x2dc0550) at ../exp/ExpLOBaccess.cpp:3543
> #13 0x00007fe5658463d2 in ExLobGlobals::performRequest (this=0x2dc0550, request=<value optimized out>) at ../exp/ExpLOBaccess.cpp:3072
> #14 0x00007fe565846a19 in ExLobGlobals::doWorkInThread (this=0x2dc0550) at ../exp/ExpLOBaccess.cpp:3506
> #15 0x00007fe565846a69 in workerThreadMain (arg=<value optimized out>) at ../exp/ExpLOBaccess.cpp:3300
> #16 0x00007fe55ecadaa1 in start_thread () from /lib64/libpthread.so.0
> #17 0x00007fe561f1caad in clone () from /lib64/libc.so.6



--
This message was sent by Atlassian JIRA
(v6.3.15#6346)