Posted to dev@trafodion.apache.org by "Liu, Ming (Ming)" <mi...@esgyn.cn> on 2016/03/11 18:15:33 UTC

using gcc 4.8 build trafodion , a strange issue

Hi, all,

Our team has started investigating how to build Trafodion with gcc 4.8.
We are currently blocked by a strange issue.

After some modifications, the build succeeds and all core components of Trafodion start fine except DCS. We can use sqlci to run some simple tests, but DCS crashes, so the system is still not usable.

DCS master is all Java code; it crashes when it tries to load a native shared object. My question is about this.

Three shared objects are involved here:
libsbms.so, libsbutil.so, and libjdbcT2.so
libjdbcT2.so requires both libsbms.so and libsbutil.so

DCS master dumps core when it calls:
System.loadLibrary("jdbcT2");

I narrowed down the issue by trying this Java code:
System.loadLibrary("sbms");
It crashes too, so the root cause is not in jdbcT2 but in sbms; jdbcT2 just links with sbms.
After various attempts, I found that if I remove sbutil from the Makefile of sbms, that Java library load no longer crashes.
However, jdbcT2 needs both sbms and sbutil, so loading jdbcT2 still crashes. That means that if we link sbms and sbutil together, Java's System.loadLibrary() crashes:
--------------------------------------------------------------------------------
#
# A fatal error has been detected by the Java Runtime Environment:
#
#  SIGSEGV (0xb) at pc=0x00007fe0bac355f7, pid=3785, tid=140603191486208
#
# JRE version: Java(TM) SE Runtime Environment (7.0_79-b15) (build 1.7.0_79-b15)
# Java VM: Java HotSpot(TM) 64-Bit Server VM (24.79-b02 mixed mode linux-amd64 compressed oops)
# Problematic frame:
# C  [libdl.so.2+0x15f7]  _dlerror_run+0x37
#
# Core dump written. Default location: /home/liuliumi/core or core.3785
------------------------------------------------------------------------------------------------------

But if I use C code:
dlopen("libjdbcT2.so", RTLD_NOW);
it works fine. I have been googling and testing for a while, but have run out of ideas.

I even downloaded the OpenJDK source code and read the loadLibrary implementation; as far as I can tell, its core is simply a dlopen.
So I am out of ideas about what to test next. Does anyone have any suggestions?
What could be the reason for this? C dlopen works fine, but Java's System.loadLibrary() crashes?

Thanks,
Ming


RE: using gcc 4.8 build trafodion , a strange issue

Posted by Dave Birdsall <da...@esgyn.com>.
Hi,

My understanding is that the order in which global object constructors are
called is not specified, at least across separately compiled object files.
That is, one should not count on any specific order. The same is true for
global object destructors. I have debugged problems in the past where a new
link changed the destructor order, and because of a dependency between
them, core dumps resulted.

Global objects should be used sparingly if at all. Better to use primitive
data types or pointers as global variables, so one avoids these kinds of
issues.

Dave

-----Original Message-----
From: Liu, Ming (Ming) [mailto:ming.liu@esgyn.cn]
Sent: Tuesday, March 15, 2016 9:56 AM
To: dev@trafodion.incubator.apache.org
Subject: Re: using gcc 4.8 build trafodion , a strange issue


Re: using gcc 4.8 build trafodion , a strange issue

Posted by "Liu, Ming (Ming)" <mi...@esgyn.cn>.
Hi, all,

The problem is solved, though I do not have a convincing theory for why.

libsbms.so and libsbutil.so reference each other, and a few .o objects conflict for reasons I do not understand. By changing the order of the link objects and trying different combinations of the objects linked into each shared object, I found a combination and link order with which both C++ and Java work fine.

gcc 4.4 behaves differently from gcc 4.8 with respect to link object order.
A pure C program is fine, but in C++ the runtime has to invoke the constructors of global objects first, and the behavior changed here. With gcc 4.8, global objects are initialized in the same order as the objects appear in the ld command. gcc 4.4 seems to do a better job of finding a correct initialization sequence: if objects A and B have a dependency, a program built with gcc 4.4 finds the correct init sequence automatically, whereas gcc 4.8 relies entirely on the order in which you link them...

Trafodion has some very complex class hierarchies, so when object A is initialized, it needs its parent C++ object's constructor to be invoked first, since the parent constructor mallocs a buffer. If the sequence is wrong, objects access a null pointer.

But this still does not explain the whole story: why Java's System.loadLibrary() crashes while C dlopen() works fine. I believe the root cause is some change in gcc 4.8 regarding link object order. The good news is that it works, although I don't have a perfect explanation...

Maybe someone remembers something :-) ? I will keep researching this.

Thanks,
Ming
