Posted to user@trafodion.apache.org by "Huang, Jack" <Ja...@dell.com> on 2017/09/21 06:18:21 UTC

trafodion mxosrvr down

Hi Trafodion users,
I ran a Trafodion database workload with HammerDB for about 40 hours.
Initially the mxosrvr actual count was 254, but now they are all down. Could you help triage this? The database can no longer accept any connections.


[trafodion@trafodion ~]$ sqcheck

*** Checking Trafodion Environment ***

Checking if processes are up.
Checking attempt: 1; user specified max: 2. Execution time in seconds: 0.

The Trafodion environment is up!


Process         Configured      Actual      Down
-------         ----------      ------      ----
DTM             2               2
RMS             4               4
DcsMaster       1               1
DcsServer       1               0           1
mxosrvr         256             2           254
RestServer      1               1



Jack Huang
Dell EMC | CTD MRES Cyclone Group
mobile +86-13880577652
jack.huang@dell.com





RE: trafodion mxosrvr down

Posted by "Huang, Jack" <Ja...@dell.com>.
New core dumps were found; the system cannot stay up for more than about 10 hours.
Can anyone help with this? It is really blocking my testing. I attached the backtrace from core.38375 for debugging.

The core dump is too large to attach here.

[root@trafodion apache-trafodion-2.1.0]# ll -lt core.3*
-rw-------. 1 trafodion trafodion 904044544 Sep 27 07:52 core.32517
-rw-------. 1 trafodion trafodion 897847296 Sep 27 07:52 core.37759
-rw-------. 1 trafodion trafodion 935981056 Sep 27 07:29 core.38357

[root@trafodion apache-trafodion-2.1.0]# ll -lt hs_err_pid3*
-rw-r--r--. 1 trafodion trafodion 131963 Sep 27 07:52 hs_err_pid32517.log
-rw-r--r--. 1 trafodion trafodion 134229 Sep 27 07:52 hs_err_pid37759.log
-rw-r--r--. 1 trafodion trafodion 130115 Sep 27 07:29 hs_err_pid38357.log
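
The hs_err_pid*.log files listed above are HotSpot JVM fatal error logs, and they are small enough to share when the cores are not. A minimal sketch for pulling the signal line and the problematic frame out of each log follows; the helper name summarize_hs_err is hypothetical, and the patterns assume the usual HotSpot header layout (a "#  SIG..." line and a "# Problematic frame:" line followed by the frame).

```shell
#!/usr/bin/env bash
# Print the signal line and the problematic frame from HotSpot hs_err logs.
# Assumption: the log header follows the usual HotSpot layout.
summarize_hs_err() {
  local log
  for log in "$@"; do
    echo "== $log"
    # First header line that names the signal, e.g. "#  SIGSEGV (0xb) at pc=..."
    grep -E -m1 '^#[[:space:]]+SIG[A-Z]+' "$log"
    # The frame itself is on the line after the "Problematic frame:" label.
    grep -A1 '^# Problematic frame:' "$log" | tail -n 1
  done
}

# Usage:
#   summarize_hs_err hs_err_pid3*.log
```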






From: Selva Govindarajan [mailto:selva.govindarajan@esgyn.com]
Sent: Wednesday, September 27, 2017 10:53 PM
To: user@trafodion.incubator.apache.org
Subject: RE: trafodion mxosrvr down

The attached dump file was incomplete; it did not contain a stack trace. I thought I had sent this response earlier, but I forgot to hit the send button.

Selva

From: Huang, Jack [mailto:Jack.Huang@dell.com]
Sent: Wednesday, September 27, 2017 4:16 AM
To: user@trafodion.incubator.apache.org
Subject: RE: trafodion mxosrvr down

Any update on it?






From: Huang, Jack [mailto:Jack.Huang@dell.com]
Sent: Monday, September 25, 2017 9:39 AM
To: user@trafodion.incubator.apache.org
Subject: RE: trafodion mxosrvr down

Hi,
Please see the output and the attached dump file.

[root@trafodion apache-trafodion-2.1.0]# ls -lt core.*
-rw-------. 1 trafodion trafodion 2134945792 Sep 24 00:46 core.54809
-rw-------. 1 trafodion trafodion 2100211712 Sep 24 00:43 core.43521
-rw-------. 1 trafodion trafodion 2109403136 Sep 24 00:43 core.52966
-rw-------. 1 trafodion trafodion 2096242688 Sep 24 00:43 core.38905
-rw-------. 1 trafodion trafodion 2102181888 Sep 24 00:43 core.45648
-rw-r--r--. 1 trafodion trafodion  271522520 Jun 21 09:48 core.2017-06-21_09-48-29.ZSM000.16632.mxssmp
-rw-r--r--. 1 trafodion trafodion 4030914880 Jun 21 09:36 core.2017-06-21_09-36-27.Z000T2R.34396.tdm_udrserv

[trafodion@trafodion ~]$ sqvers -u
TRAF_HOME=/home/trafodion/apache-trafodion-2.1.0
who@host=trafodion@trafodion
JAVA_HOME=/usr/jdk64/jdk1.8.0_60
linux=2.6.32-642.el6.x86_64
redhat=6.8
NO patches
Most common Apache_Trafodion Release 2.1.0 (Build release [2.1.0-0-gdc3d97f], branch 20170406-no_branch, date 06Apr17)
UTT count is 2
[4]     Apache Trafodion Release 2.1.0 (Build release [2.1.0-0-gdc3d97f], branch release2.1, date 06Apr17)
          export/lib/jdbcT2.jar
          export/lib/jdbcT4-2.1.0.jar
          export/lib/jdbcT4.jar
          export/lib/lib_mgmt.jar
[15]    Apache_Trafodion Release 2.1.0 (Build release [2.1.0-0-gdc3d97f], branch release2.1, date 06Apr17)
          export/lib/hbase-trx-apache1_0-2.1.0.jar
          export/lib/hbase-trx-apache1_1-2.1.0.jar
          export/lib/hbase-trx-apache1_2-2.1.0.jar
          export/lib/hbase-trx-cdh5_4-2.1.0.jar
          export/lib/hbase-trx-cdh5_5-2.1.0.jar
          export/lib/hbase-trx-cdh5_7-2.1.0.jar
          export/lib/hbase-trx-hdp2_3-2.1.0.jar
          export/lib/sqmanvers.jar
          export/lib/trafodion-dtm-apache-2.1.0.jar
          export/lib/trafodion-dtm-cdh-2.1.0.jar
          export/lib/trafodion-dtm-hdp-2.1.0.jar
          export/lib/trafodion-sql-apache-2.1.0.jar
          export/lib/trafodion-sql-cdh-2.1.0.jar
          export/lib/trafodion-sql-hdp-2.1.0.jar
          export/lib/trafodion-utility-2.1.0.jar


[trafodion@trafodion ~]$ sqcheck

*** Checking Trafodion Environment ***

Checking if processes are up.
Checking attempt: 1; user specified max: 2. Execution time in seconds: 1.

The Trafodion environment is up!


Process         Configured      Actual      Down
-------         ----------      ------      ----
DTM             2               2
RMS             4               4
DcsMaster       1               1
DcsServer       1               0           1
mxosrvr         256             2           254
RestServer      1               1






From: Selva Govindarajan [mailto:selva.govindarajan@esgyn.com]
Sent: Friday, September 22, 2017 1:58 AM
To: user@trafodion.incubator.apache.org
Subject: RE: trafodion mxosrvr down

Can you please get the stack traces from some of the core files?

In the directory where the core files are found, issue

ls -lt core.*

The cores with the earliest timestamps will be displayed at the end.

gdb mxosrvr <core_file>
thread apply all bt

and send the stack traces from a few of these core files.
Please also issue the following at the shell prompt to get the version of Trafodion installed:
sqvers -u

and send the output of this command too.

Selva

From: Liu, Yuan (Yuan) [mailto:yuan.liu@esgyn.cn]
Sent: Thursday, September 21, 2017 12:05 AM
To: user@trafodion.incubator.apache.org
Subject: RE: trafodion mxosrvr down

The mxosrvr processes are managed by DcsServer. Run dcsstop followed by dcsstart to restart DCS, or just run dcsstart.

Best regards,
Yuan

From: Huang, Jack [mailto:Jack.Huang@dell.com]
Sent: Thursday, September 21, 2017 2:26 PM
To: user@trafodion.incubator.apache.org
Subject: RE: trafodion mxosrvr down

Several core dumps were found. Does anyone know how to restart mxosrvr?

[root@trafodion apache-trafodion-2.1.0]# file core.40973
core.40973: ELF 64-bit LSB core file x86-64, version 1 (SYSV), SVR4-style, from 'mxosrvr -ZKHOST trafodion:2181 -RZ trafodion:1:24 -ZKPNODE /trafodion -CNGTO 60', real uid: 1003, effective uid: 1003, real gid: 502, effective gid: 502, execfn: '/home/trafodion/apache-trafodion-2.1.0/export/bin64/mxosrvr', platform: 'x86_64'
[root@trafodion apache-trafodion-2.1.0]# gdb /home/trafodion/apache-trafodion-2.1.0/export/bin64/mxosrvr  core.40973

Core was generated by `mxosrvr -ZKHOST trafodion:2181 -RZ trafodion:1:24 -ZKPNODE /trafodion -CNGTO 60'.
Program terminated with signal 6, Aborted.
#0  0x0000003b30a325e5 in raise () from /lib64/libc.so.6







RE: trafodion mxosrvr down

Posted by Selva Govindarajan <se...@esgyn.com>.
The attached dump file was incomplete; it did not contain a stack trace. I thought I had sent this response earlier, but I forgot to hit the send button.

Selva






RE: trafodion mxosrvr down

Posted by "Huang, Jack" <Ja...@dell.com>.
Any update on it?







RE: trafodion mxosrvr down

Posted by "Huang, Jack" <Ja...@dell.com>.
Hi,
Please see the output and the attached dump file.

[root@trafodion apache-trafodion-2.1.0]# ls -lt core.*
-rw-------. 1 trafodion trafodion 2134945792 Sep 24 00:46 core.54809
-rw-------. 1 trafodion trafodion 2100211712 Sep 24 00:43 core.43521
-rw-------. 1 trafodion trafodion 2109403136 Sep 24 00:43 core.52966
-rw-------. 1 trafodion trafodion 2096242688 Sep 24 00:43 core.38905
-rw-------. 1 trafodion trafodion 2102181888 Sep 24 00:43 core.45648
-rw-r--r--. 1 trafodion trafodion  271522520 Jun 21 09:48 core.2017-06-21_09-48-29.ZSM000.16632.mxssmp
-rw-r--r--. 1 trafodion trafodion 4030914880 Jun 21 09:36 core.2017-06-21_09-36-27.Z000T2R.34396.tdm_udrserv

[trafodion@trafodion ~]$ sqvers -u
TRAF_HOME=/home/trafodion/apache-trafodion-2.1.0
who@host=trafodion@trafodion
JAVA_HOME=/usr/jdk64/jdk1.8.0_60
linux=2.6.32-642.el6.x86_64
redhat=6.8
NO patches
Most common Apache_Trafodion Release 2.1.0 (Build release [2.1.0-0-gdc3d97f], branch 20170406-no_branch, date 06Apr17)
UTT count is 2
[4]     Apache Trafodion Release 2.1.0 (Build release [2.1.0-0-gdc3d97f], branch release2.1, date 06Apr17)
          export/lib/jdbcT2.jar
          export/lib/jdbcT4-2.1.0.jar
          export/lib/jdbcT4.jar
          export/lib/lib_mgmt.jar
[15]    Apache_Trafodion Release 2.1.0 (Build release [2.1.0-0-gdc3d97f], branch release2.1, date 06Apr17)
          export/lib/hbase-trx-apache1_0-2.1.0.jar
          export/lib/hbase-trx-apache1_1-2.1.0.jar
          export/lib/hbase-trx-apache1_2-2.1.0.jar
          export/lib/hbase-trx-cdh5_4-2.1.0.jar
          export/lib/hbase-trx-cdh5_5-2.1.0.jar
          export/lib/hbase-trx-cdh5_7-2.1.0.jar
          export/lib/hbase-trx-hdp2_3-2.1.0.jar
          export/lib/sqmanvers.jar
          export/lib/trafodion-dtm-apache-2.1.0.jar
          export/lib/trafodion-dtm-cdh-2.1.0.jar
          export/lib/trafodion-dtm-hdp-2.1.0.jar
          export/lib/trafodion-sql-apache-2.1.0.jar
          export/lib/trafodion-sql-cdh-2.1.0.jar
          export/lib/trafodion-sql-hdp-2.1.0.jar
          export/lib/trafodion-utility-2.1.0.jar


[trafodion@trafodion ~]$ sqcheck

*** Checking Trafodion Environment ***

Checking if processes are up.
Checking attempt: 1; user specified max: 2. Execution time in seconds: 1.

The Trafodion environment is up!


Process         Configured      Actual      Down
-------         ----------      ------      ----
DTM             2               2
RMS             4               4
DcsMaster       1               1
DcsServer       1               0           1
mxosrvr         256             2           254
RestServer      1               1







RE: trafodion mxosrvr down

Posted by Selva Govindarajan <se...@esgyn.com>.
Can you please get the stack traces from some of the core files?

In the directory where the core files are found, issue

ls -lt core.*

The cores with the earliest timestamps will be displayed at the end.

gdb mxosrvr <core_file>
thread apply all bt

and send the stack traces from a few of these core files.
Please also issue the following at the shell prompt to get the version of Trafodion installed:
sqvers -u

and send the output of this command too.

Selva
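
The gdb steps above can also be run non-interactively, so that each backtrace lands in a text file that is small enough to attach. The following is a minimal sketch, not a Trafodion tool: the helper name collect_backtraces and the .bt.txt suffix are assumptions, and it relies only on gdb's -batch and -ex options. It applies the one binary you pass in to every core.* file in the directory, so cores from other programs should be moved aside first.

```shell
#!/usr/bin/env bash
# Save "thread apply all bt" output for every core file in a directory.
# Assumptions (not from the thread): the helper name, the .bt.txt suffix,
# and passing the path of the mxosrvr binary as the first argument.
collect_backtraces() {
  local binary=$1 core_dir=${2:-.}
  local core
  for core in "$core_dir"/core.*; do
    [ -f "$core" ] || continue                     # no cores matched the glob
    case $core in *.bt.txt) continue ;; esac       # skip output of earlier runs
    # -batch exits after running the -ex command instead of prompting.
    gdb -batch -ex "thread apply all bt" "$binary" "$core" > "$core.bt.txt" 2>&1
    echo "wrote $core.bt.txt"
  done
}

# Usage, with the binary path reported by `file <core_file>`:
#   collect_backtraces /path/to/mxosrvr /path/to/core/directory
```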






RE: trafodion mxosrvr down

Posted by "Liu, Yuan (Yuan)" <yu...@esgyn.cn>.
The mxosrvr processes are managed by DcsServer. Run dcsstop followed by dcsstart to restart DCS, or just run dcsstart.

Best regards,
Yuan
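
The restart described above can be sketched as a small shell helper that stops DCS, starts it again, and then re-runs sqcheck to confirm that the DcsServer and mxosrvr counts have recovered. dcsstop, dcsstart and sqcheck are the commands named in this thread; the function name restart_dcs and the settle delay are assumptions for illustration, and the commands are expected to be on the PATH of the trafodion user.

```shell
#!/usr/bin/env bash
# Restart DCS and then re-check process counts.
# The function name and the default 30-second settle delay are assumptions.
restart_dcs() {
  local settle_seconds=${1:-30}
  dcsstop || echo "dcsstop reported an error; continuing" >&2
  dcsstart || return 1
  # Give DcsServer time to start its mxosrvr processes before checking.
  sleep "$settle_seconds"
  sqcheck
}

# Usage (as the trafodion user):
#   restart_dcs        # wait 30 seconds before sqcheck
#   restart_dcs 60     # wait longer on a slow system
```

A restart only brings the servers back; it does not explain the aborts, so the core files are still worth analysing.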






RE: trafodion mxosrvr down

Posted by "Huang, Jack" <Ja...@dell.com>.
Several core dumps were found. Does anyone know how to restart mxosrvr?

[root@trafodion apache-trafodion-2.1.0]# file core.40973
core.40973: ELF 64-bit LSB core file x86-64, version 1 (SYSV), SVR4-style, from 'mxosrvr -ZKHOST trafodion:2181 -RZ trafodion:1:24 -ZKPNODE /trafodion -CNGTO 60', real uid: 1003, effective uid: 1003, real gid: 502, effective gid: 502, execfn: '/home/trafodion/apache-trafodion-2.1.0/export/bin64/mxosrvr', platform: 'x86_64'
[root@trafodion apache-trafodion-2.1.0]# gdb /home/trafodion/apache-trafodion-2.1.0/export/bin64/mxosrvr  core.40973

Core was generated by `mxosrvr -ZKHOST trafodion:2181 -RZ trafodion:1:24 -ZKPNODE /trafodion -CNGTO 60'.
Program terminated with signal 6, Aborted.
#0  0x0000003b30a325e5 in raise () from /lib64/libc.so.6

