Posted to issues@trafodion.apache.org by "Selvaganesan Govindarajan (JIRA)" <ji...@apache.org> on 2019/02/12 02:47:00 UTC

[jira] [Commented] (TRAFODION-3274) At times sqlci or any other SQL process fails to come up and dumps core

    [ https://issues.apache.org/jira/browse/TRAFODION-3274?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16765611#comment-16765611 ] 

Selvaganesan Govindarajan commented on TRAFODION-3274:
------------------------------------------------------

The analysis of the core dump is as follows:

It looks like the epoll fd is invalid, so the epoll_ctl call returns EINVAL. The epoll fd is initialized in the global object gv_sock_ctlr. From the dump:

 

(gdb) p gv_sock_ctlr

$5 = {

  _vptr.Sock_Controller = 0x7f7d700287b0,

  ip_comp_thread = 0x0,

  ip_shutdown_eh = 0x0,

  iv_ctlr_mutex = {

    _vptr.Mutex = 0x7f7d69440630,

    iv_destroyed = false,

    iv_mutex = {

      __data = {

        __lock = 0,

        __count = 0,

        __owner = 0,

        __nusers = 0,

        __kind = 0,

        __spins = 0,

        __list = {

          __prev = 0x0,

          __next = 0x0

        }

      },

      __size = '\000' <repeats 39 times>,

      __align = 0

    },

    ip_mutex_name = 0x0,

    iv_errorcheck = false,

    iv_recursive = false

  },

  iv_efd = 2

}

 

iv_efd is 2, the fd normally used for stderr. Presumably fd 2 was subsequently reassigned to a different file, possibly during error redirection.

 

(gdb) p *stderr

$6 = {

  _flags = -72537977,

  _IO_read_ptr = 0x7f7d703ce1a3 "",

  _IO_read_end = 0x7f7d703ce1a3 "",

  _IO_read_base = 0x7f7d703ce1a3 "",

  _IO_write_base = 0x7f7d703ce1a3 "",

  _IO_write_ptr = 0x7f7d703ce1a3 "",

  _IO_write_end = 0x7f7d703ce1a3 "",

  _IO_buf_base = 0x7f7d703ce1a3 "",

  _IO_buf_end = 0x7f7d703ce1a4 "",

  _IO_save_base = 0x0,

  _IO_backup_base = 0x0,

  _IO_save_end = 0x0,

  _markers = 0x0,

  _chain = 0x7f7d703ce040,

  _fileno = 2,

  _flags2 = 0,

  _old_offset = -1,

  _cur_column = 0,

  _vtable_offset = 0 '\000',

  _shortbuf = "",

  _lock = 0x7f7d703cf6c0,

  _offset = -1,

  __pad1 = 0x0,

  __pad2 = 0x7f7d703ce4e0,

  __pad3 = 0x0,

  __pad4 = 0x0,

  __pad5 = 0,

  _mode = -1,

  _unused2 = '\000' <repeats 19 times>

}

(gdb)

 

Hence, sqlci dumps core when epoll_ctl returns EINVAL. The documented causes of an EINVAL return from epoll_ctl are:

 EINVAL: epfd is not an epoll file descriptor, or fd is the same as epfd, or the requested operation op is not supported by this interface.

I think there is a race condition in the C++ startup code that runs before main, between initializing the global objects and setting up the stdin, stdout and stderr file descriptors. I haven’t looked at how stderr gets redirected in our code, but most likely it assumes that stderr is fd 2. This would explain why epoll_ctl returned EINVAL.
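
The suspected mechanism can be reproduced in isolation. The following is only a minimal sketch, not code from seabed: the function name is hypothetical, a pipe end stands in for the redirected error file, and it clobbers whatever fd number epoll_create1 happens to return rather than fd 2 specifically. It assumes Linux.

```cpp
#include <cerrno>
#include <sys/epoll.h>
#include <unistd.h>

// Hypothetical helper: returns the errno set by epoll_ctl after the epoll
// descriptor's number has been reused for a non-epoll file.
// Returns 0 if epoll_ctl unexpectedly succeeds, -1 if the setup fails.
int errno_after_epfd_clobbered() {
    int efd = epoll_create1(0);            // valid epoll fd at this point
    if (efd == -1) return -1;

    int watched[2];                        // socket stand-in: a pollable fd
    int other[2];                          // stand-in for the redirect target
    if (pipe(watched) == -1) { close(efd); return -1; }
    if (pipe(other) == -1) {
        close(efd); close(watched[0]); close(watched[1]);
        return -1;
    }

    // Stand-in for the redirector: dup2 silently closes the epoll instance
    // and makes the same fd number refer to an ordinary pipe instead.
    int result = -1;
    if (dup2(other[1], efd) != -1) {
        struct epoll_event ev = {};
        ev.events = EPOLLIN;
        ev.data.fd = watched[0];
        // efd is still a valid descriptor, but no longer an epoll one.
        int rc = epoll_ctl(efd, EPOLL_CTL_ADD, watched[0], &ev);
        result = (rc == -1) ? errno : 0;
    }

    close(efd);
    close(watched[0]); close(watched[1]);
    close(other[0]); close(other[1]);
    return result;
}
```

Calling errno_after_epfd_clobbered() should yield EINVAL rather than EBADF, because the fd number is still open; it just no longer refers to an epoll instance. That matches the failing call in frame #4 of the stack trace.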

 

I don’t suspect that the global variable gv_sock_ctlr was corrupted so that iv_efd became 2, though I can’t explain why it isn’t.

Possible solutions are:
 # Make gv_sock_ctlr a pointer and construct it when it is accessed for the first time in the seabed layer.
 # Make our redirector code use the proper stderr fd for redirection.
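
A rough sketch of the first option, using the construct-on-first-use idiom. Sock_Controller_Sketch and sock_ctlr() are hypothetical stand-ins, not the real SB_Trans::Sock_Controller, and only the epoll fd member is modeled. It assumes Linux and a C++11 compiler, where initialization of a function-local static is thread-safe.

```cpp
#include <sys/epoll.h>
#include <unistd.h>

// Hypothetical stand-in for SB_Trans::Sock_Controller.
class Sock_Controller_Sketch {
public:
    Sock_Controller_Sketch() : iv_efd(epoll_create1(0)) { ++constructed; }
    ~Sock_Controller_Sketch() { if (iv_efd != -1) close(iv_efd); }
    Sock_Controller_Sketch(const Sock_Controller_Sketch &) = delete;
    Sock_Controller_Sketch &operator=(const Sock_Controller_Sketch &) = delete;

    int efd() const { return iv_efd; }

    static int constructed;   // number of constructor runs, for illustration

private:
    int iv_efd;               // epoll fd, or -1 if epoll_create1 failed
};

int Sock_Controller_Sketch::constructed = 0;

// Accessor replacing the global object. The controller, and therefore the
// epoll fd, is created on the first call instead of during static
// initialization before main. The object is intentionally never deleted,
// so it also cannot be torn down while other threads still use it at exit.
Sock_Controller_Sketch &sock_ctlr() {
    static Sock_Controller_Sketch *lp_ctlr = new Sock_Controller_Sketch();
    return *lp_ctlr;
}
```

With this shape, callers would use sock_ctlr().efd() where they currently reach into the global. This only moves the epoll_create call to a later point; if the redirector can run after the first access, the second option is still needed.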

> At times sqlci or any other SQL process fails to come up and dumps core
> -----------------------------------------------------------------------
>
>                 Key: TRAFODION-3274
>                 URL: https://issues.apache.org/jira/browse/TRAFODION-3274
>             Project: Apache Trafodion
>          Issue Type: Bug
>          Components: foundation
>            Reporter: Selvaganesan Govindarajan
>            Assignee: Selvaganesan Govindarajan
>            Priority: Major
>
> sqlci dumps core with the following stack trace
> Thread 1 (Thread 0x7f7d63dde700 (LWP 58936)):
> #0  0x00007f7d70071495 in raise () from /lib64/libc.so.6
> #1  0x00007f7d70072c75 in abort () from /lib64/libc.so.6
> #2  0x00007f7d69232b02 in sb_util_assert_fun_com (pv_assert=ASSERT_INTCMP, pp_exp=0x7f7d6fe0fcd2 "lv_err != -1", pv_lhs=-1, pp_op=0x7f7d69239b71 "!=", pv_rhs=-1, pp_file=0x7f7d6fe0fb7a "sock.cpp", pv_line=348, pp_fun=0x7f7d6fe109c0 "void SB_Trans::Sock_Controller::epoll_ctl(const char*, int, int, int, void*)") at util.cpp:274
> #3  0x00007f7d69232f23 in SB_util_assert_fun_ine (pp_exp=0x7f7d6fe0fcd2 "lv_err != -1", pv_lhs=-1, pv_rhs=-1, pp_file=0x7f7d6fe0fb7a "sock.cpp", pv_line=348, pp_fun=0x7f7d6fe109c0 "void SB_Trans::Sock_Controller::epoll_ctl(const char*, int, int, int, void*)") at util.cpp:480
> #4  0x00007f7d6fdebcbb in SB_Trans::Sock_Controller::epoll_ctl (this=0x7f7d7003e560, pp_where=0x7f7d63ddad30 "Sock_Client::event_init", pv_op=1, pv_fd=16, pv_event=1, pp_data=0x1ed3150) at sock.cpp:348
> #5  0x00007f7d6fdec585 in SB_Trans::Sock_Controller::sock_add (this=0x7f7d7003e560, pp_where=0x7f7d63ddad30 "Sock_Client::event_init", pv_sock=16, pp_eh=0x1ed3150) at sock.cpp:593
> #6  0x00007f7d6fdedf95 in SB_Trans::Sock_User_Common::event_init (this=0x1ed2180, pp_eh=0x1ed3150) at sock.cpp:1060
> #7  0x00007f7d6fdef3bc in SB_Trans::Sock_Stream::create (pp_name=0x7f7d63ddb0f0 "connect p-id=0/40881/17 ($TM0-tm)", pp_pname=0x7f7d63ddb5a0 "$TM0", pp_prog=0x7f7d63ddb420 "tm", pv_ic=false, pp_sock=0x1ed2180, pv_mon_callback=0, pv_mon_unsol_callback=0, pv_ms_comp_callback=0x7f7d6fdb8e74 <ms_ldone_cbt(MS_Md_Type*)>, pv_ms_abandon_callback=0x7f7d6fdb7a50 <ms_abandon_cbt(MS_Md_Type*, bool)>, pv_ms_oc_callback=0x7f7d6fdcfdc8 <msg_mon_oc_cbt(MS_Md_Type*, void*)>, pv_ms_lim_callback=0, pp_ms_recv_q=0x7f7d7002cfe0, pp_ms_lim_q=0x7f7d7002cee0, pp_event_mgr=0x7f7d7002cd58, pv_open_nid=-1, pv_open_pid=-1, pv_open_verif=-1, pv_opened_nid=0, pv_opened_pid=40881, pv_opened_verif=17, pv_opened_type=2) at sockstream.cpp:435
> #8  0x00007f7d6fdd16b5 in msg_mon_open_process_com_ph2_sock (pp_where=0x7f7d6fe09921 "msg_mon_open_process-ph2", pp_name=0x7f7d63ddb5a0 "$TM0", pp_phandle=0x7f7d63ddb6c0, pp_oid=0x7f7d63ddb72c, pv_reopen=false, pv_ic=false, pp_od=0x1ed4c20, pp_msg=0x7f7d6494268c, pv_nid=0, pv_pid=40881, pv_verif=17, pv_ptype=2) at msmon.cpp:5113
> #9  0x00007f7d6fdd124d in msg_mon_open_process_com_ph2 (pp_name=0x7f7d63ddb5a0 "$TM0", pp_phandle=0x7f7d63ddb6c0, pp_oid=0x7f7d63ddb72c, pv_reopen=false, pv_death_notif=true, pv_ic=false, pp_od=0x1ed4c20) at msmon.cpp:4989
> #10 0x00007f7d6fdd076c in msg_mon_open_process_com (pp_name=0x7f7d63ddb720 "$tm0", pp_phandle=0x7f7d63ddb6c0, pp_oid=0x7f7d63ddb72c, pv_reopen=false, pv_death_notif=true, pv_self=false, pv_backup=false, pv_ic=false, pv_fs_open=false) at msmon.cpp:4724
> #11 0x00007f7d6fdcfac8 in msg_mon_open_process (pp_name=0x7f7d63ddb720 "$tm0", pp_phandle=0x7f7d63ddb6c0, pp_oid=0x7f7d63ddb72c) at msmon.cpp:4358
> #12 0x00007f7d70da1af6 in TMLIB::open_tm (this=0x7f7d70fb8f80, pv_node=0, pv_startup=<value optimized out>) at tmlib.cpp:3070
> #13 0x00007f7d70da19cc in TMLIB::initialize (this=0x7f7d70fb8f80) at tmlib.cpp:2970
> #14 0x00007f7d70da7e8c in tmlib_check_active_tx () at tmlib.cpp:262
> #15 0x00007f7d70da8439 in GETTRANSID (pp_transid=0x7f7d63ddb828) at tmlib.cpp:1091
> #16 0x00007f7d7664744d in ExTransaction::getCurrentXnId (this=<value optimized out>, tcbref=<value optimized out>, transId=0x7f7d63ddb820, txHandle=0x7f7d63ddb800) at ../executor/ex_transaction.cpp:180
> #17 0x00007f7d766475cc in ExTransaction::inheritTransaction (this=0x7f7d632ded80) at ../executor/ex_transaction.cpp:770
> #18 0x00007f7d76d891cd in CliPrologue (cliGlobals=0x1eb77a0, module=<value optimized out>) at ../cli/Cli.cpp:431
> #19 0x00007f7d76d8cae4 in SQLCLI_GetAuthID (cliGlobals=0x1eb77a0, authName=0x1ea0350 "SQL_USER1", authID=@0x7f7d63ddba68) at ../cli/Cli.cpp:6195
> #20 0x00007f7d76e19b60 in SQL_EXEC_GetAuthID (authName=0x1ea0350 "SQL_USER1", authID=@0x7f7d63ddba68) at ../cli/CliExtern.cpp:4117
> #21 0x00007f7d78655a5b in SqlciEnv::setUserIdentityInCLI (this=0x1ea0240) at ../sqlci/SqlciEnv.cpp:1426
> #22 0x00007f7d78656239 in SqlciEnv_prologue_to_run (sqlciEnv=0x1ea0240) at ../sqlci/SqlciEnv.cpp:513
> #23 0x00007f7d7865648c in SqlciEnv::run (this=0x1ea0240, in_filename=0x7ffcaf9a4a1d "TEST142(user1_cmds)", input_string=<value optimized out>) at ../sqlci/SqlciEnv.cpp:654
> #24 0x0000000000401fde in thread_main (p_arg=<value optimized out>) at ../bin/SqlciMain.cpp:333
> #25 0x00007f7d6f937aa1 in start_thread () from /lib64/libpthread.so.0
> #26 0x00007f7d70127bcd in clone () from /lib64/libc.so.6



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)