Posted to dev@trafodion.apache.org by Nieyuanyuan <ni...@huawei.com> on 2015/09/08 03:39:54 UTC

[Urgent Help] Trafodion Build Environment Problem

Dear Guys,

I recently downloaded Trafodion 1.1 from https://github.com/apache/incubator-trafodion/tree/stable/1.1 and followed the build guide at https://wiki.trafodion.org/wiki/index.php/Building_the_Software. After solving a lot of problems (no need to list all the details), I am able to run Trafodion on a Hadoop sandbox environment.

But I have a serious problem: all Trafodion-related processes go down after several minutes (I am not sure exactly how long), and only a few of them are left:
[nieyy@redhat-72 ~]$ ps ux
USER        PID %CPU %MEM    VSZ   RSS TTY      STAT START   TIME COMMAND
nieyy     76554  0.1  0.1 590988 139768 pts/6   Sl   19:14   0:04 /usr/lib/jvm/java-1.7.0-openjdk-1.7.0.45.x86_64/bin/java -XX:OnOutOfMemoryError=kill -9 %p -Xmx128m
nieyy    118833  0.7  0.3 1535452 420996 ?      Sl   19:40   0:12 /usr/lib/jvm/java-1.7.0-openjdk-1.7.0.45.x86_64/bin/java -Dproc_namenode -Xmx1000m -Djava.net.prefe
nieyy    119085  0.6  0.2 1572688 367388 ?      Sl   19:40   0:10 /usr/lib/jvm/java-1.7.0-openjdk-1.7.0.45.x86_64/bin/java -Dproc_datanode -Xmx1000m -Djava.net.prefe
nieyy    119320  0.4  0.2 1512656 340636 ?      Sl   19:41   0:07 /usr/lib/jvm/java-1.7.0-openjdk-1.7.0.45.x86_64/bin/java -Dproc_secondarynamenode -Xmx1000m -Djava.
nieyy    119972  1.2  0.2 1708408 378536 pts/6  Sl   19:41   0:20 /usr/lib/jvm/java-1.7.0-openjdk-1.7.0.45.x86_64/bin/java -Dproc_resourcemanager -Xmx1000m -Dhadoop.
nieyy    120133  0.9  0.2 1616388 309976 ?      Sl   19:41   0:16 /usr/lib/jvm/java-1.7.0-openjdk-1.7.0.45.x86_64/bin/java -Dproc_nodemanager -Xmx1000m -Dhadoop.log.
nieyy    120371  0.0  0.0   9824  1772 pts/6    S    19:41   0:00 /bin/sh ./bin/mysqld_safe --defaults-file=/home/nieyy/trafodion_build/incubator-trafodion-stable-1.
nieyy    120594  0.0  0.0 452604 89908 pts/6    Sl   19:41   0:01 /home/nieyy/trafodion_build/incubator-trafodion-stable-1.1/core/sqf/sql/local_hadoop/mysql/bin/mysq
nieyy    120789  0.0  0.0   9692  1736 pts/6    S    19:41   0:00 bash /home/nieyy/trafodion_build/incubator-trafodion-stable-1.1/core/sqf/sql/local_hadoop/hbase/bin
nieyy    120806  2.0  0.3 1809048 509164 pts/6  Sl   19:41   0:34 /usr/lib/jvm/java-1.7.0-openjdk-1.7.0.45.x86_64/bin/java -Dproc_master -XX:OnOutOfMemoryError=kill
nieyy    122554  0.0  0.0  13624  1304 pts/6    S    19:41   0:00 mpirun -disable-auto-cleanup -demux select -env SQ_IC TCP -env MPI_ERROR_LEVEL 2 -env SQ_PIDMAP 1 -
nieyy    122555  0.0  0.0      0     0 ?        Zs   19:41   0:00 [hydra_pmi_proxy] <defunct>
nieyy    122556  1.0  0.0 335212 36748 ?        Ssl  19:41   0:17 /home/nieyy/trafodion_build/incubator-trafodion-stable-1.1/core/sqf/export/bin64d/monitor COLD
nieyy    122557  0.8  0.0 335212 36768 ?        Ssl  19:41   0:14 /home/nieyy/trafodion_build/incubator-trafodion-stable-1.1/core/sqf/export/bin64d/monitor COLD
nieyy    123946  0.9  0.1 828072 223088 pts/6   Sl   19:42   0:14 /usr/lib/jvm/java-1.7.0-openjdk-1.7.0.45.x86_64/bin/java -XX:OnOutOfMemoryError=kill -9 %p -Xmx128m
nieyy    124044  1.0  0.1 629200 187180 pts/6   Sl   19:42   0:16 /usr/lib/jvm/java-1.7.0-openjdk-1.7.0.45.x86_64/bin/java -XX:OnOutOfMemoryError=kill -9 %p -Xmx128m

I then need to kill all processes and use swstartall and sqstart to reset the environment. However, the environment still goes down after a while, and I need to restart again.

I found some core files under trafodion_build/incubator-trafodion-stable-1.1/core/sqf/sql/scripts; all of them were generated by mxssmp:
[nieyy@redhat-72 scripts]$ ll core*
...
-rw------- 1 nieyy nieyy 156008448 Sep  7 17:56 core.mxssmp.173357
-rw------- 1 nieyy nieyy 145518592 Sep  7 17:56 core.mxssmp.173372
-rw------- 1 nieyy nieyy 156008448 Sep  7 19:24 core.mxssmp.74146
-rw------- 1 nieyy nieyy 145518592 Sep  7 19:24 core.mxssmp.74197

I used gdb to inspect the stack:
[nieyy@redhat-72 scripts]$ gdb /home/nieyy/trafodion_build/incubator-trafodion-stable-1.1/core/sql/lib/linux/64bit/debug/mxssmp ./core.mxssmp.141469
...
(gdb) where
#0  0x000000000044166c in ProcessStats::getHeap (this=0x2000) at ../runtimestats/SqlStats.h:271
#1  0x000000000043990a in StatsGlobals::removeProcess (this=0x10000000, pid=65536, calledAtAdd=0) at ../runtimestats/SqlStats.cpp:276
#2  0x0000000000439e05 in StatsGlobals::checkForDeadProcesses (this=0x10000000, myPid=141469) at ../runtimestats/SqlStats.cpp:382
#3  0x00000000004440be in SsmpGlobals::work (this=0x7f062660c7e8) at ../runtimestats/ssmpipc.cpp:582
#4  0x000000000042f06a in runServer (argc=1, argv=0x7fff5b0e5a48) at ../bin/ex_ssmp_main.cpp:259
#5  0x000000000042eb12 in main (argc=1, argv=0x7fff5b0e5a48) at ../bin/ex_ssmp_main.cpp:127

Then I searched via Google and found https://bugs.launchpad.net/trafodion/+bug/1368891, which looks similar. It says the bug was fixed in v0.9, but my version is 1.1.

So, could you kindly help me solve this problem? I can't find any more useful information via Google.

Thanks a lot.
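
The backtrace above shows removeProcess being called with pid=65536, and the ps listing shows PIDs well above 65535. A reply quoted further down this thread recommends checking kernel.pid_max and capping it at 65535. A minimal shell sketch of that check follows. The names check_pid_max and LIMIT are illustrative and not part of Trafodion, and the 65535 threshold is taken from that reply rather than from the Trafodion source.

```shell
#!/usr/bin/env bash
# Sketch only: compare a pid_max value against the 64K cap suggested in this thread.
# LIMIT and check_pid_max are hypothetical helper names for illustration.
LIMIT=65535

check_pid_max() {
    # $1: a pid_max value, e.g. the contents of /proc/sys/kernel/pid_max
    if [ "$1" -gt "$LIMIT" ]; then
        echo "pid_max=$1 exceeds $LIMIT"
        return 1
    fi
    echo "pid_max=$1 is within $LIMIT"
    return 0
}

# Report on the live setting only when the file is readable (Linux).
if [ -r /proc/sys/kernel/pid_max ]; then
    check_pid_max "$(cat /proc/sys/kernel/pid_max)" || true
fi
```

If the check reports that the value exceeds the limit, the reply in this thread lowers it with "sudo sysctl -w kernel.pid_max=65535" and then restarts Trafodion and the Hadoop/HBase processes.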


Re: Re: Re: Re: [Urgent Help] Trafodion Build Environment Problem

Posted by Hans Zeller <ha...@esgyn.com>.
Hi, thank you for the kind words! Those of us who have worked on this code
base for a while were hoping the many years of work that went into it would
show.

Adding another storage engine should be doable; we can talk about how much
effort it took to replace the previous engine with HBase. One option to
prototype another engine might be to use TMUDFs (Table Mapping UDFs). These
can be used to feed outside data into an SQL query, and they also have an
optimizer interface that allows parallelization, costing, statistics, and
predicate evaluation on TMUDFs. Here is a link to TMUDFs:
https://cwiki.apache.org/confluence/display/TRAFODION/Tutorial%3A+The+object-oriented+UDF+interface
Java TMUDFs are not yet officially supported or documented, but they are
available in the current build; I can send you the Javadoc.

The old Trafodion wiki ( https://wiki.trafodion.org/ ) has some more info
and you can find some info on training (geared towards users, not
developers) on the Esgyn web site.

If there are specific areas where you are looking for more info, some of the
people on this dev list might be able to provide it, or to set up a call to
talk about it.

Thanks,

Hans

On Wed, Sep 9, 2015 at 6:27 PM, Nieyuanyuan <ni...@huawei.com> wrote:

> Dear Hans,
>
> Thanks for your attention to our work. I got another mail from Liu Ming;
> it looks like I can contact him about further issues later.
>
> Briefly, after evaluation we think Trafodion is the most powerful SQL
> engine in the open source world, not only in the Hadoop ecosystem. We are
> now investigating whether we can plug in other kinds of storage services
> (not only HBase).
>
> We also hope that we can later become contributors or even committers in
> the Trafodion community, so if you can provide more developer docs, that
> would be great.
>
> -----Original Message-----
> From: Hans Zeller [mailto:hans.zeller@esgyn.com]
> Sent: September 9, 2015 22:53
> To: dev; Amanda Moran; Liu, Ming (Ming); Zhang, Yi (Eason)
> Cc: Lijian (Q)
> Subject: Re: Re: Re: [Urgent Help] Trafodion Build Environment Problem
>
> Hi, you could also send your notes in Chinese to Liu, Ming and Zhang, Yi
> (copied on this email).
>
> The Trafodion wiki is
> https://cwiki.apache.org/confluence/display/TRAFODION/Apache+Trafodion+Home
> and is currently being converted from an earlier form. To edit this wiki,
> you can follow these steps:
> https://cwiki.apache.org/confluence/display/TRAFODION/Contribute+to+the+Wiki
>
> The old wiki was translated into Chinese (
> https://wiki.trafodion.org/wiki/index.php/Main_Page?setlang=zh-hant ); we
> may plan something similar for the new Apache wiki.
>
> Thank you,
>
> Hans
>
> On Tue, Sep 8, 2015 at 8:18 PM, Nieyuanyuan <ni...@huawei.com>
> wrote:
>
> > Hi, Hans,
> >
> > OK, I can do that. So far my steps are all written in Chinese, so I need
> > to translate them into English. Is there an Apache wiki link where I can
> > post them?
> >
> > -----Original Message-----
> > From: Hans Zeller [mailto:hans.zeller@esgyn.com]
> > Sent: September 9, 2015 9:43
> > To: dev
> > Cc: Lijian (Q)
> > Subject: Re: Re: [Urgent Help] Trafodion Build Environment Problem
> >
> > Hi Nieyuanyuan,
> >
> > If you have a list of things you did, like the examples with ANT,
> > MAVEN you already mentioned, that would be great. I know that this can
> > be hard to do, since often the installation is trial and error.
> >
> > Thank you,
> >
> > Hans
> >
> > On Tue, Sep 8, 2015 at 6:30 PM, Nieyuanyuan <ni...@huawei.com>
> > wrote:
> >
> > > Hi, Hans,
> > >
> > > I am willing to do that. My environment is RHEL 6.5, and my server
> > > is behind a firewall (I have to use a proxy to download a lot of
> > > things), so it requires some special configuration, such as for ANT,
> > > MAVEN, and curl.
> > >
> > > Also, I found some missing dependencies while following the original
> > > build guide through the whole build process, and some small
> > > mistakes that should be corrected.
> > >
> > > Could you please show me a way or a link to share my installation steps?
> > >
> > > Thanks.
> > >
> > > -----Original Message-----
> > > From: Hans Zeller [mailto:hans.zeller@esgyn.com]
> > > Sent: September 9, 2015 0:27
> > > To: dev
> > > Cc: Lijian (Q)
> > > Subject: Re: [Urgent Help] Trafodion Build Environment Problem
> > >
> > > Hi Nieyuanyuan,
> > >
> > > Some of us are also working on running Trafodion in a sandbox or on
> > > Apache objects. We hope to have documented steps on how to do that
> > > eventually. You mention you had to fix several things. If you have
> > > notes on what those are, would you share them?
> > >
> > > Thank you,
> > >
> > > Hans
> > >
> > > On Tue, Sep 8, 2015 at 9:19 AM, Amanda Moran
> > > <am...@esgyn.com>
> > > wrote:
> > >
> > > > Hi there-
> > > >
> > > > > This is fixed in the latest version of the installer.
> > > >
> > > > Thanks.
> > > >
> > > > Sent from my iPhone
> > > >
> > > > > > On Sep 8, 2015, at 9:07 AM, Dave Birdsall
> > > > > > <da...@esgyn.com> wrote:
> > > > > >
> > > > > > Hi,
> > > > > >
> > > > > > I'm wondering if this should be reported as a problem? Perhaps
> > > > > > Nieyuanyuan would like to open a JIRA about supporting higher PID
> > > > > > numbers in Trafodion?
> > > > >
> > > > > Dave
> > > > >
> > > > > -----Original Message-----
> > > > > From: Narendra Goyal [mailto:narendra.goyal@esgyn.com]
> > > > > Sent: Monday, September 7, 2015 7:04 PM
> > > > > To: dev@trafodion.incubator.apache.org
> > > > > Cc: Lijian (Q) <ji...@huawei.com>
> > > > > Subject: RE: [Urgent Help] Trafodion Build Environment Problem
> > > > >
> > > > > Hi Nieyuanyuan,
> > > > >
> > > > > Could you please check the 'pid_max' settings:
> > > > > sysctl -q kernel.pid_max
> > > > > (or cat /proc/sys/kernel/pid_max)
> > > > >
> > > > > > If the value is > 64K, I would recommend you set it to 64K, like so:
> > > > > > sudo sysctl -w kernel.pid_max=65535
> > > > > >
> > > > > > You will have to restart Trafodion and the other Hadoop/HBase processes:
> > > > > > swstopall
> > > > > > ckillall
> > > > > > swstartall
> > > > > > sqstart
> > > > > >
> > > > > > Just FYI, to check the list of Trafodion processes only, please run
> > > > > > 'cstat' in your bash shell.
> > > > >
> > > > > Thanks,
> > > > > -Narendra
> > > > >
> > > > >
> > > > > -----Original Message-----
> > > > > From: Nieyuanyuan [mailto:nieyuanyuan@huawei.com]
> > > > > Sent: Monday, September 7, 2015 6:40 PM
> > > > > To: dev@trafodion.incubator.apache.org
> > > > > Cc: Lijian (Q) <ji...@huawei.com>
> > > > > Subject: [Urgent Help] Trafodion Build Environment Problem
> > > > >
> > > > > Dear Guys,
> > > > >
> > > > > I recently downloaded trafodion 1.1 from
> > > > > https://github.com/apache/incubator-trafodion/tree/stable/1.1,
> > > > > and
> > > > followed
> > > > > the build guide from
> > > > > https://wiki.trafodion.org/wiki/index.php/Building_the_Software,
> > > > > and
> > > > solved
> > > > > a lot of problems (no need to list all details), I am able to
> > > > > run
> > > > trafodion
> > > > > over a hadoop sandbox environment.
> > > > >
> > > > > But I got a serious problem, that is, all Trafodion related
> > > > > process will
> > > > go
> > > > > down after several minutes (not sure how long), only few of them
> > > > > will
> > > > > left:
> > > > > [nieyy@redhat-72 ~]$ ps ux
> > > > > USER        PID %CPU %MEM    VSZ   RSS TTY      STAT START   TIME
> > > COMMAND
> > > > > nieyy     76554  0.1  0.1 590988 139768 pts/6   Sl   19:14   0:04
> > > > > /usr/lib/jvm/java-1.7.0-openjdk-1.7.0.45.x86_64/bin/java
> > > > > -XX:OnOutOfMemoryError=kill -9 %p -Xmx128m
> > > > > nieyy    118833  0.7  0.3 1535452 420996 ?      Sl   19:40   0:12
> > > > > /usr/lib/jvm/java-1.7.0-openjdk-1.7.0.45.x86_64/bin/java
> > > > > -Dproc_namenode -Xmx1000m
> > > > > -Djava.net.prefe
> > > > > nieyy    119085  0.6  0.2 1572688 367388 ?      Sl   19:40   0:10
> > > > > /usr/lib/jvm/java-1.7.0-openjdk-1.7.0.45.x86_64/bin/java
> > > > > -Dproc_datanode -Xmx1000m
> > > > > -Djava.net.prefe
> > > > > nieyy    119320  0.4  0.2 1512656 340636 ?      Sl   19:41   0:07
> > > > > /usr/lib/jvm/java-1.7.0-openjdk-1.7.0.45.x86_64/bin/java
> > > > > -Dproc_secondarynamenode -Xmx1000m -Djava.
> > > > > nieyy    119972  1.2  0.2 1708408 378536 pts/6  Sl   19:41   0:20
> > > > > /usr/lib/jvm/java-1.7.0-openjdk-1.7.0.45.x86_64/bin/java
> > > > > -Dproc_resourcemanager -Xmx1000m -Dhadoop.
> > > > > nieyy    120133  0.9  0.2 1616388 309976 ?      Sl   19:41   0:16
> > > > > /usr/lib/jvm/java-1.7.0-openjdk-1.7.0.45.x86_64/bin/java
> > > > > -Dproc_nodemanager -Xmx1000m -Dhadoop.log.
> > > > > nieyy    120371  0.0  0.0   9824  1772 pts/6    S    19:41   0:00
> > > /bin/sh
> > > > > ./bin/mysqld_safe
> > > > >
> > >
> --defaults-file=/home/nieyy/trafodion_build/incubator-trafodion-stable-1.
> > > > > nieyy    120594  0.0  0.0 452604 89908 pts/6    Sl   19:41   0:01
> > > > >
> > > > /home/nieyy/trafodion_build/incubator-trafodion-stable-1.1/core/sq
> > > > f/
> > > > sq
> > > > l/lo
> > > > > cal_hadoop/mysql/bin/mysq
> > > > > nieyy    120789  0.0  0.0   9692  1736 pts/6    S    19:41   0:00
> > bash
> > > > >
> > > > /home/nieyy/trafodion_build/incubator-trafodion-stable-1.1/core/sq
> > > > f/
> > > > sq
> > > > l/lo
> > > > > cal_hadoop/hbase/bin
> > > > > nieyy    120806  2.0  0.3 1809048 509164 pts/6  Sl   19:41   0:34
> > > > > /usr/lib/jvm/java-1.7.0-openjdk-1.7.0.45.x86_64/bin/java
> > > > > -Dproc_master -XX:OnOutOfMemoryError=kill
> > > > > nieyy    122554  0.0  0.0  13624  1304 pts/6    S    19:41   0:00
> > > mpirun
> > > > > -disable-auto-cleanup -demux select -env SQ_IC TCP -env
> > > > > MPI_ERROR_LEVEL
> > > > > 2 -env SQ_PIDMAP 1 -
> > > > > nieyy    122555  0.0  0.0      0     0 ?        Zs   19:41   0:00
> > > > > [hydra_pmi_proxy] <defunct>
> > > > > nieyy    122556  1.0  0.0 335212 36748 ?        Ssl  19:41   0:17
> > > > >
> > > > /home/nieyy/trafodion_build/incubator-trafodion-stable-1.1/core/sq
> > > > f/
> > > > ex
> > > > port
> > > > > /bin64d/monitor COLD
> > > > > nieyy    122557  0.8  0.0 335212 36768 ?        Ssl  19:41   0:14
> > > > >
> > > > /home/nieyy/trafodion_build/incubator-trafodion-stable-1.1/core/sq
> > > > f/
> > > > ex
> > > > port
> > > > > /bin64d/monitor COLD
> > > > > nieyy    123946  0.9  0.1 828072 223088 pts/6   Sl   19:42   0:14
> > > > > /usr/lib/jvm/java-1.7.0-openjdk-1.7.0.45.x86_64/bin/java
> > > > > -XX:OnOutOfMemoryError=kill -9 %p -Xmx128m
> > > > > nieyy    124044  1.0  0.1 629200 187180 pts/6   Sl   19:42   0:16
> > > > > /usr/lib/jvm/java-1.7.0-openjdk-1.7.0.45.x86_64/bin/java
> > > > > -XX:OnOutOfMemoryError=kill -9 %p -Xmx128m
> > > > >
> > > > > And then I need to kill all processes and use swstartall and
> > > > > sqstart to reset the environment, however, the environment will
> > > > > still go down after
> > > > a
> > > > > while, and I need to restart again.
> > > > >
> > > > > I found some cores under
> > > > > trafodion_build/incubator-trafodion-stable-1.1/core/sqf/sql/scri
> > > > > pt s, all cored were generated by mxssmp:
> > > > > [nieyy@redhat-72 scripts]$ ll core* ...
> > > > > -rw------- 1 nieyy nieyy 156008448 Sep  7 17:56
> > > > > core.mxssmp.173357
> > > > > -rw------- 1 nieyy nieyy 145518592 Sep  7 17:56
> > > > > core.mxssmp.173372
> > > > > -rw------- 1 nieyy nieyy 156008448 Sep  7 19:24
> > > > > core.mxssmp.74146
> > > > > -rw------- 1 nieyy nieyy 145518592 Sep  7 19:24
> > > > > core.mxssmp.74197
> > > > >
> > > > > I used gdb to track the stack:
> > > > > [nieyy@redhat-72 scripts]$ gdb
> > > > >
> > > > /home/nieyy/trafodion_build/incubator-trafodion-stable-1.1/core/sq
> > > > l/
> > > > li
> > > > b/li
> > > > > nux/64bit/debug/mxssmp ./core.mxssmp.141469 ...
> > > > > (gdb) where
> > > > > #0  0x000000000044166c in ProcessStats::getHeap (this=0x2000) at
> > > > > ../runtimestats/SqlStats.h:271
> > > > > #1  0x000000000043990a in StatsGlobals::removeProcess
> > > > > (this=0x10000000, pid=65536, calledAtAdd=0) at
> > > > > ../runtimestats/SqlStats.cpp:276
> > > > > #2  0x0000000000439e05 in StatsGlobals::checkForDeadProcesses
> > > > > (this=0x10000000, myPid=141469) at
> > > > > ../runtimestats/SqlStats.cpp:382
> > > > > #3  0x00000000004440be in SsmpGlobals::work
> > > > > (this=0x7f062660c7e8) at
> > > > > ../runtimestats/ssmpipc.cpp:582
> > > > > #4  0x000000000042f06a in runServer (argc=1,
> > > > > argv=0x7fff5b0e5a48) at
> > > > > ../bin/ex_ssmp_main.cpp:259
> > > > > #5  0x000000000042eb12 in main (argc=1, argv=0x7fff5b0e5a48) at
> > > > > ../bin/ex_ssmp_main.cpp:127
> > > > >
> > > > > Then I searched via Google, and found a link
> > > > > https://bugs.launchpad.net/trafodion/+bug/1368891 which looks
> > > > > similar,
> > > > but
> > > > > it claimed the bug has been fixed at v0.9, but my version is 1.1.
> > > > >
> > > > > So, could you kindly help me to solve this problem cause I can't
> > > > > find
> > > > more
> > > > > useful information via Google.
> > > > >
> > > > > Thanks a lot.
> > > >
> > >
> >
>

Re: Re: Re: [Urgent Help] Trafodion Build Environment Problem

Posted by Nieyuanyuan <ni...@huawei.com>.
Dear Hans,

Thanks for your attention to our work. I got another mail from Liu Ming; it looks like I can contact him about further issues later.

Briefly, after evaluation we think Trafodion is the most powerful SQL engine in the open source world, not only in the Hadoop ecosystem. We are now investigating whether we can plug in other kinds of storage services (not only HBase).

We also hope that we can later become contributors or even committers in the Trafodion community, so if you can provide more developer docs, that would be great.


Re: Re: Re: [Urgent Help] Trafodion Build Environment Problem

Posted by Hans Zeller <ha...@esgyn.com>.
Hi, you could also send your notes in Chinese to Liu, Ming and Zhang, Yi
(copied on this email).

The Trafodion wiki is
https://cwiki.apache.org/confluence/display/TRAFODION/Apache+Trafodion+Home
and is currently being converted from an earlier form. To edit this wiki,
you can follow these steps:
https://cwiki.apache.org/confluence/display/TRAFODION/Contribute+to+the+Wiki

The old wiki was translated into Chinese (
https://wiki.trafodion.org/wiki/index.php/Main_Page?setlang=zh-hant ); we
may plan something similar for the new Apache wiki.

Thank you,

Hans

On Tue, Sep 8, 2015 at 8:18 PM, Nieyuanyuan <ni...@huawei.com> wrote:

> Hi, Hans,
>
> Ok, I can do that, so far my steps were all written in Chinese, I need to
> translate into English, any apache wiki link I can post?
>
> -----邮件原件-----
> 发件人: Hans Zeller [mailto:hans.zeller@esgyn.com]
> 发送时间: 2015年9月9日 9:43
> 收件人: dev
> 抄送: Lijian (Q)
> 主题: Re: 答复: [Urgent Help] Trafodion Build Environment Problem
>
> Hi Nieyuanyuan,
>
> If you have a list of things you did, like the examples with ANT, MAVEN
> you already mentioned, that would be great. I know that this can be hard to
> do, since often the installation is trial and error.
>
> Thank you,
>
> Hans
>
> On Tue, Sep 8, 2015 at 6:30 PM, Nieyuanyuan <ni...@huawei.com>
> wrote:
>
> > Hi, Hans,
> >
> > I am willing to do that, my environment is RHEL 6.5, and my server is
> > behind a firewall (have to use proxy to download a lot of stuffs), so
> > require some special configurations such as for ANT, MAVEN, curl.
> >
> > Also, I found some missing dependencies while following the original
> > build guide to finish the whole build process, and some small mistakes
> > which should be revised.
> >
> > Could you plz show me a way or a link to share my installation steps?
> >
> > Thanks.
> >
> > -----邮件原件-----
> > 发件人: Hans Zeller [mailto:hans.zeller@esgyn.com]
> > 发送时间: 2015年9月9日 0:27
> > 收件人: dev
> > 抄送: Lijian (Q)
> > 主题: Re: [Urgent Help] Trafodion Build Environment Problem
> >
> > Hi Nieyuanyuan,
> >
> > Some of us are also working on running Trafodion in a sandbox or on
> > Apache objects. We hope to have documented steps on how to do that
> > eventually. You mention you had to fix several things. If you have
> > notes on what those are, would you share them?
> >
> > Thank you,
> >
> > Hans
> >
> > On Tue, Sep 8, 2015 at 9:19 AM, Amanda Moran <am...@esgyn.com>
> > wrote:
> >
> > > Hi there-
> > >
> > > This is fixed in latest version of installer.
> > >
> > > Thanks.
> > >
> > > Sent from my iPhone
> > >
> > > > On Sep 8, 2015, at 9:07 AM, Dave Birdsall
> > > > <da...@esgyn.com>
> > > wrote:
> > > >
> > > > Hi,
> > > >
> > > > I'm wondering if this should be reported as a problem? Perhaps
> > > Nieyuanyuan
> > > > would like to open a JIRA about supporting higher PID numbers in
> > > Trafodion?
> > > >
> > > > Dave
> > > >
> > > > -----Original Message-----
> > > > From: Narendra Goyal [mailto:narendra.goyal@esgyn.com]
> > > > Sent: Monday, September 7, 2015 7:04 PM
> > > > To: dev@trafodion.incubator.apache.org
> > > > Cc: Lijian (Q) <ji...@huawei.com>
> > > > Subject: RE: [Urgent Help] Trafodion Build Environment Problem
> > > >
> > > > Hi Nieyuanyuan,
> > > >
> > > > Could you please check the 'pid_max' settings:
> > > > sysctl -q kernel.pid_max
> > > > (or cat /proc/sys/kernel/pid_max)
> > > >
> > > > If the value is > 64K, I would recommend you set it to 64K, like so:
> > > > sudo sysctl -w kernel.pid_max=65535
> > > >
> > > > You will have to restart Trafodion and other Hadoop/HBase processes:
> > > > swstopall
> > > > ckillall
> > > > swstartall
> > > > sqstart
> > > >
> > > > Just fyi, to check the list of Trafodion processes only, please
> > > > run
> > > 'cstat'
> > > > on your bash.
> > > >
> > > > Thanks,
> > > > -Narendra
> > > >
> > > >
> > > > -----Original Message-----
> > > > From: Nieyuanyuan [mailto:nieyuanyuan@huawei.com]
> > > > Sent: Monday, September 7, 2015 6:40 PM
> > > > To: dev@trafodion.incubator.apache.org
> > > > Cc: Lijian (Q) <ji...@huawei.com>
> > > > Subject: [Urgent Help] Trafodion Build Environment Problem
> > > >
> > > > Dear Guys,
> > > >
> > > > I recently downloaded trafodion 1.1 from
> > > > https://github.com/apache/incubator-trafodion/tree/stable/1.1, and
> > > followed
> > > > the build guide from
> > > > https://wiki.trafodion.org/wiki/index.php/Building_the_Software,
> > > > and
> > > solved
> > > > a lot of problems (no need to list all details), I am able to run
> > > trafodion
> > > > over a hadoop sandbox environment.
> > > >
> > > > But I got a serious problem, that is, all Trafodion related
> > > > process will
> > > go
> > > > down after several minutes (not sure how long), only few of them
> > > > will
> > > > left:
> > > > [nieyy@redhat-72 ~]$ ps ux
> > > > USER        PID %CPU %MEM    VSZ   RSS TTY      STAT START   TIME
> > COMMAND
> > > > nieyy     76554  0.1  0.1 590988 139768 pts/6   Sl   19:14   0:04
> > > > /usr/lib/jvm/java-1.7.0-openjdk-1.7.0.45.x86_64/bin/java
> > > > -XX:OnOutOfMemoryError=kill -9 %p -Xmx128m
> > > > nieyy    118833  0.7  0.3 1535452 420996 ?      Sl   19:40   0:12
> > > > /usr/lib/jvm/java-1.7.0-openjdk-1.7.0.45.x86_64/bin/java
> > > > -Dproc_namenode -Xmx1000m
> > > > -Djava.net.prefe
> > > > nieyy    119085  0.6  0.2 1572688 367388 ?      Sl   19:40   0:10
> > > > /usr/lib/jvm/java-1.7.0-openjdk-1.7.0.45.x86_64/bin/java
> > > > -Dproc_datanode -Xmx1000m
> > > > -Djava.net.prefe
> > > > nieyy    119320  0.4  0.2 1512656 340636 ?      Sl   19:41   0:07
> > > > /usr/lib/jvm/java-1.7.0-openjdk-1.7.0.45.x86_64/bin/java
> > > > -Dproc_secondarynamenode -Xmx1000m -Djava.
> > > > nieyy    119972  1.2  0.2 1708408 378536 pts/6  Sl   19:41   0:20
> > > > /usr/lib/jvm/java-1.7.0-openjdk-1.7.0.45.x86_64/bin/java
> > > > -Dproc_resourcemanager -Xmx1000m -Dhadoop.
> > > > nieyy    120133  0.9  0.2 1616388 309976 ?      Sl   19:41   0:16
> > > > /usr/lib/jvm/java-1.7.0-openjdk-1.7.0.45.x86_64/bin/java
> > > > -Dproc_nodemanager -Xmx1000m -Dhadoop.log.
> > > > nieyy    120371  0.0  0.0   9824  1772 pts/6    S    19:41   0:00 /bin/sh
> > > > ./bin/mysqld_safe
> > > > --defaults-file=/home/nieyy/trafodion_build/incubator-trafodion-stable-1.
> > > > nieyy    120594  0.0  0.0 452604 89908 pts/6    Sl   19:41   0:01
> > > > /home/nieyy/trafodion_build/incubator-trafodion-stable-1.1/core/sqf/sql/local_hadoop/mysql/bin/mysq
> > > > nieyy    120789  0.0  0.0   9692  1736 pts/6    S    19:41   0:00 bash
> > > > /home/nieyy/trafodion_build/incubator-trafodion-stable-1.1/core/sqf/sql/local_hadoop/hbase/bin
> > > > nieyy    120806  2.0  0.3 1809048 509164 pts/6  Sl   19:41   0:34
> > > > /usr/lib/jvm/java-1.7.0-openjdk-1.7.0.45.x86_64/bin/java
> > > > -Dproc_master -XX:OnOutOfMemoryError=kill
> > > > nieyy    122554  0.0  0.0  13624  1304 pts/6    S    19:41   0:00
> > mpirun
> > > > -disable-auto-cleanup -demux select -env SQ_IC TCP -env
> > > > MPI_ERROR_LEVEL
> > > > 2 -env SQ_PIDMAP 1 -
> > > > nieyy    122555  0.0  0.0      0     0 ?        Zs   19:41   0:00
> > > > [hydra_pmi_proxy] <defunct>
> > > > nieyy    122556  1.0  0.0 335212 36748 ?        Ssl  19:41   0:17
> > > > /home/nieyy/trafodion_build/incubator-trafodion-stable-1.1/core/sqf/export/bin64d/monitor COLD
> > > > nieyy    122557  0.8  0.0 335212 36768 ?        Ssl  19:41   0:14
> > > > /home/nieyy/trafodion_build/incubator-trafodion-stable-1.1/core/sqf/export/bin64d/monitor COLD
> > > > nieyy    123946  0.9  0.1 828072 223088 pts/6   Sl   19:42   0:14
> > > > /usr/lib/jvm/java-1.7.0-openjdk-1.7.0.45.x86_64/bin/java
> > > > -XX:OnOutOfMemoryError=kill -9 %p -Xmx128m
> > > > nieyy    124044  1.0  0.1 629200 187180 pts/6   Sl   19:42   0:16
> > > > /usr/lib/jvm/java-1.7.0-openjdk-1.7.0.45.x86_64/bin/java
> > > > -XX:OnOutOfMemoryError=kill -9 %p -Xmx128m
> > > >
> > > > And then I need to kill all processes and use swstartall and
> > > > sqstart to reset the environment, however, the environment will
> > > > still go down after
> > > a
> > > > while, and I need to restart again.
> > > >
> > > > I found some cores under
> > > > trafodion_build/incubator-trafodion-stable-1.1/core/sqf/sql/script
> > > > s, all cored were generated by mxssmp:
> > > > [nieyy@redhat-72 scripts]$ ll core* ...
> > > > -rw------- 1 nieyy nieyy 156008448 Sep  7 17:56 core.mxssmp.173357
> > > > -rw------- 1 nieyy nieyy 145518592 Sep  7 17:56 core.mxssmp.173372
> > > > -rw------- 1 nieyy nieyy 156008448 Sep  7 19:24 core.mxssmp.74146
> > > > -rw------- 1 nieyy nieyy 145518592 Sep  7 19:24 core.mxssmp.74197
> > > >
> > > > I used gdb to track the stack:
> > > > [nieyy@redhat-72 scripts]$ gdb
> > > > /home/nieyy/trafodion_build/incubator-trafodion-stable-1.1/core/sql/lib/linux/64bit/debug/mxssmp ./core.mxssmp.141469 ...
> > > > (gdb) where
> > > > #0  0x000000000044166c in ProcessStats::getHeap (this=0x2000) at
> > > > ../runtimestats/SqlStats.h:271
> > > > #1  0x000000000043990a in StatsGlobals::removeProcess
> > > > (this=0x10000000, pid=65536, calledAtAdd=0) at
> > > > ../runtimestats/SqlStats.cpp:276
> > > > #2  0x0000000000439e05 in StatsGlobals::checkForDeadProcesses
> > > > (this=0x10000000, myPid=141469) at
> > > > ../runtimestats/SqlStats.cpp:382
> > > > #3  0x00000000004440be in SsmpGlobals::work (this=0x7f062660c7e8)
> > > > at
> > > > ../runtimestats/ssmpipc.cpp:582
> > > > #4  0x000000000042f06a in runServer (argc=1, argv=0x7fff5b0e5a48)
> > > > at
> > > > ../bin/ex_ssmp_main.cpp:259
> > > > #5  0x000000000042eb12 in main (argc=1, argv=0x7fff5b0e5a48) at
> > > > ../bin/ex_ssmp_main.cpp:127
> > > >
> > > > Then I searched via Google, and found a link
> > > > https://bugs.launchpad.net/trafodion/+bug/1368891 which looks
> > > > similar,
> > > but
> > > > it claimed the bug has been fixed at v0.9, but my version is 1.1.
> > > >
> > > > So, could you kindly help me to solve this problem cause I can't
> > > > find
> > > more
> > > > useful information via Google.
> > > >
> > > > Thanks a lot.
> > >
> >
>

Re: Re: [Urgent Help] Trafodion Build Environment Problem

Posted by Nieyuanyuan <ni...@huawei.com>.
Hi, Hans,

OK, I can do that. So far my steps are all written in Chinese, so I need to translate them into English. Is there an Apache wiki link where I can post them?

-----Original Message-----
From: Hans Zeller [mailto:hans.zeller@esgyn.com]
Sent: September 9, 2015 9:43
To: dev
Cc: Lijian (Q)
Subject: Re: Re: [Urgent Help] Trafodion Build Environment Problem

Hi Nieyuanyuan,

If you have a list of things you did, like the examples with ANT, MAVEN you already mentioned, that would be great. I know that this can be hard to do, since often the installation is trial and error.

Thank you,

Hans

On Tue, Sep 8, 2015 at 6:30 PM, Nieyuanyuan <ni...@huawei.com> wrote:

> Hi, Hans,
>
> I am willing to do that, my environment is RHEL 6.5, and my server is 
> behind a firewall (have to use proxy to download a lot of stuffs), so 
> require some special configurations such as for ANT, MAVEN, curl.
>
> Also, I found some missing dependencies while following the original 
> build guide to finish the whole build process, and some small mistakes 
> which should be revised.
>
> Could you plz show me a way or a link to share my installation steps?
>
> Thanks.
>
> -----Original Message-----
> From: Hans Zeller [mailto:hans.zeller@esgyn.com]
> Sent: September 9, 2015 0:27
> To: dev
> Cc: Lijian (Q)
> Subject: Re: [Urgent Help] Trafodion Build Environment Problem
>
> Hi Nieyuanyuan,
>
> Some of us are also working on running Trafodion in a sandbox or on 
> Apache objects. We hope to have documented steps on how to do that 
> eventually. You mention you had to fix several things. If you have 
> notes on what those are, would you share them?
>
> Thank you,
>
> Hans
>
> On Tue, Sep 8, 2015 at 9:19 AM, Amanda Moran <am...@esgyn.com>
> wrote:
>
> > Hi there-
> >
> > This is fixed in latest version of installer.
> >
> > Thanks.
> >
> > Sent from my iPhone
> >
> > > On Sep 8, 2015, at 9:07 AM, Dave Birdsall 
> > > <da...@esgyn.com>
> > wrote:
> > >
> > > Hi,
> > >
> > > I'm wondering if this should be reported as a problem? Perhaps
> > Nieyuanyuan
> > > would like to open a JIRA about supporting higher PID numbers in
> > Trafodion?
> > >
> > > Dave
> > >
> > > -----Original Message-----
> > > From: Narendra Goyal [mailto:narendra.goyal@esgyn.com]
> > > Sent: Monday, September 7, 2015 7:04 PM
> > > To: dev@trafodion.incubator.apache.org
> > > Cc: Lijian (Q) <ji...@huawei.com>
> > > Subject: RE: [Urgent Help] Trafodion Build Environment Problem
> > >
> > > Hi Nieyuanyuan,
> > >
> > > Could you please check the 'pid_max' settings:
> > > sysctl -q kernel.pid_max
> > > (or cat /proc/sys/kernel/pid_max)
> > >
> > > If the value is > 64K, I would recommend you set it to 64K, like so:
> > > sudo sysctl -w kernel.pid_max=65535
> > >
> > > You will have to restart Trafodion and other Hadoop/HBase processes:
> > > swstopall
> > > ckillall
> > > swstartall
> > > sqstart
> > >
> > > Just fyi, to check the list of Trafodion processes only, please 
> > > run
> > 'cstat'
> > > on your bash.
> > >
> > > Thanks,
> > > -Narendra
> > >
> > >
> > > -----Original Message-----
> > > From: Nieyuanyuan [mailto:nieyuanyuan@huawei.com]
> > > Sent: Monday, September 7, 2015 6:40 PM
> > > To: dev@trafodion.incubator.apache.org
> > > Cc: Lijian (Q) <ji...@huawei.com>
> > > Subject: [Urgent Help] Trafodion Build Environment Problem
> > >
> > > Dear Guys,
> > >
> > > I recently downloaded trafodion 1.1 from 
> > > https://github.com/apache/incubator-trafodion/tree/stable/1.1, and
> > followed
> > > the build guide from
> > > https://wiki.trafodion.org/wiki/index.php/Building_the_Software, 
> > > and
> > solved
> > > a lot of problems (no need to list all details), I am able to run
> > trafodion
> > > over a hadoop sandbox environment.
> > >
> > > But I got a serious problem, that is, all Trafodion related 
> > > process will
> > go
> > > down after several minutes (not sure how long), only few of them 
> > > will
> > > left:
> > > [nieyy@redhat-72 ~]$ ps ux
> > > USER        PID %CPU %MEM    VSZ   RSS TTY      STAT START   TIME
> COMMAND
> > > nieyy     76554  0.1  0.1 590988 139768 pts/6   Sl   19:14   0:04
> > > /usr/lib/jvm/java-1.7.0-openjdk-1.7.0.45.x86_64/bin/java
> > > -XX:OnOutOfMemoryError=kill -9 %p -Xmx128m
> > > nieyy    118833  0.7  0.3 1535452 420996 ?      Sl   19:40   0:12
> > > /usr/lib/jvm/java-1.7.0-openjdk-1.7.0.45.x86_64/bin/java
> > > -Dproc_namenode -Xmx1000m
> > > -Djava.net.prefe
> > > nieyy    119085  0.6  0.2 1572688 367388 ?      Sl   19:40   0:10
> > > /usr/lib/jvm/java-1.7.0-openjdk-1.7.0.45.x86_64/bin/java
> > > -Dproc_datanode -Xmx1000m
> > > -Djava.net.prefe
> > > nieyy    119320  0.4  0.2 1512656 340636 ?      Sl   19:41   0:07
> > > /usr/lib/jvm/java-1.7.0-openjdk-1.7.0.45.x86_64/bin/java
> > > -Dproc_secondarynamenode -Xmx1000m -Djava.
> > > nieyy    119972  1.2  0.2 1708408 378536 pts/6  Sl   19:41   0:20
> > > /usr/lib/jvm/java-1.7.0-openjdk-1.7.0.45.x86_64/bin/java
> > > -Dproc_resourcemanager -Xmx1000m -Dhadoop.
> > > nieyy    120133  0.9  0.2 1616388 309976 ?      Sl   19:41   0:16
> > > /usr/lib/jvm/java-1.7.0-openjdk-1.7.0.45.x86_64/bin/java
> > > -Dproc_nodemanager -Xmx1000m -Dhadoop.log.
> > > nieyy    120371  0.0  0.0   9824  1772 pts/6    S    19:41   0:00 /bin/sh
> > > ./bin/mysqld_safe
> > > --defaults-file=/home/nieyy/trafodion_build/incubator-trafodion-stable-1.
> > > nieyy    120594  0.0  0.0 452604 89908 pts/6    Sl   19:41   0:01
> > > /home/nieyy/trafodion_build/incubator-trafodion-stable-1.1/core/sqf/sql/local_hadoop/mysql/bin/mysq
> > > nieyy    120789  0.0  0.0   9692  1736 pts/6    S    19:41   0:00 bash
> > > /home/nieyy/trafodion_build/incubator-trafodion-stable-1.1/core/sqf/sql/local_hadoop/hbase/bin
> > > nieyy    120806  2.0  0.3 1809048 509164 pts/6  Sl   19:41   0:34
> > > /usr/lib/jvm/java-1.7.0-openjdk-1.7.0.45.x86_64/bin/java
> > > -Dproc_master -XX:OnOutOfMemoryError=kill
> > > nieyy    122554  0.0  0.0  13624  1304 pts/6    S    19:41   0:00
> mpirun
> > > -disable-auto-cleanup -demux select -env SQ_IC TCP -env 
> > > MPI_ERROR_LEVEL
> > > 2 -env SQ_PIDMAP 1 -
> > > nieyy    122555  0.0  0.0      0     0 ?        Zs   19:41   0:00
> > > [hydra_pmi_proxy] <defunct>
> > > nieyy    122556  1.0  0.0 335212 36748 ?        Ssl  19:41   0:17
> > > /home/nieyy/trafodion_build/incubator-trafodion-stable-1.1/core/sqf/export/bin64d/monitor COLD
> > > nieyy    122557  0.8  0.0 335212 36768 ?        Ssl  19:41   0:14
> > > /home/nieyy/trafodion_build/incubator-trafodion-stable-1.1/core/sqf/export/bin64d/monitor COLD
> > > nieyy    123946  0.9  0.1 828072 223088 pts/6   Sl   19:42   0:14
> > > /usr/lib/jvm/java-1.7.0-openjdk-1.7.0.45.x86_64/bin/java
> > > -XX:OnOutOfMemoryError=kill -9 %p -Xmx128m
> > > nieyy    124044  1.0  0.1 629200 187180 pts/6   Sl   19:42   0:16
> > > /usr/lib/jvm/java-1.7.0-openjdk-1.7.0.45.x86_64/bin/java
> > > -XX:OnOutOfMemoryError=kill -9 %p -Xmx128m
> > >
> > > And then I need to kill all processes and use swstartall and 
> > > sqstart to reset the environment, however, the environment will 
> > > still go down after
> > a
> > > while, and I need to restart again.
> > >
> > > I found some cores under
> > > trafodion_build/incubator-trafodion-stable-1.1/core/sqf/sql/script
> > > s, all cored were generated by mxssmp:
> > > [nieyy@redhat-72 scripts]$ ll core* ...
> > > -rw------- 1 nieyy nieyy 156008448 Sep  7 17:56 core.mxssmp.173357
> > > -rw------- 1 nieyy nieyy 145518592 Sep  7 17:56 core.mxssmp.173372
> > > -rw------- 1 nieyy nieyy 156008448 Sep  7 19:24 core.mxssmp.74146
> > > -rw------- 1 nieyy nieyy 145518592 Sep  7 19:24 core.mxssmp.74197
> > >
> > > I used gdb to track the stack:
> > > [nieyy@redhat-72 scripts]$ gdb
> > > /home/nieyy/trafodion_build/incubator-trafodion-stable-1.1/core/sql/lib/linux/64bit/debug/mxssmp ./core.mxssmp.141469 ...
> > > (gdb) where
> > > #0  0x000000000044166c in ProcessStats::getHeap (this=0x2000) at
> > > ../runtimestats/SqlStats.h:271
> > > #1  0x000000000043990a in StatsGlobals::removeProcess 
> > > (this=0x10000000, pid=65536, calledAtAdd=0) at
> > > ../runtimestats/SqlStats.cpp:276
> > > #2  0x0000000000439e05 in StatsGlobals::checkForDeadProcesses
> > > (this=0x10000000, myPid=141469) at 
> > > ../runtimestats/SqlStats.cpp:382
> > > #3  0x00000000004440be in SsmpGlobals::work (this=0x7f062660c7e8) 
> > > at
> > > ../runtimestats/ssmpipc.cpp:582
> > > #4  0x000000000042f06a in runServer (argc=1, argv=0x7fff5b0e5a48) 
> > > at
> > > ../bin/ex_ssmp_main.cpp:259
> > > #5  0x000000000042eb12 in main (argc=1, argv=0x7fff5b0e5a48) at
> > > ../bin/ex_ssmp_main.cpp:127
> > >
> > > Then I searched via Google, and found a link
> > > https://bugs.launchpad.net/trafodion/+bug/1368891 which looks 
> > > similar,
> > but
> > > it claimed the bug has been fixed at v0.9, but my version is 1.1.
> > >
> > > So, could you kindly help me to solve this problem cause I can't 
> > > find
> > more
> > > useful information via Google.
> > >
> > > Thanks a lot.
> >
>

Re: Re: [Urgent Help] Trafodion Build Environment Problem

Posted by Hans Zeller <ha...@esgyn.com>.
Hi Nieyuanyuan,

If you have a list of things you did, like the examples with ANT, MAVEN you
already mentioned, that would be great. I know that this can be hard to do,
since often the installation is trial and error.

Thank you,

Hans

On Tue, Sep 8, 2015 at 6:30 PM, Nieyuanyuan <ni...@huawei.com> wrote:

> Hi, Hans,
>
> I am willing to do that, my environment is RHEL 6.5, and my server is
> behind a firewall (have to use proxy to download a lot of stuffs), so
> require some special configurations such as for ANT, MAVEN, curl.
>
> Also, I found some missing dependencies while following the original build
> guide to finish the whole build process, and some small mistakes which
> should be revised.
>
> Could you plz show me a way or a link to share my installation steps?
>
> Thanks.
>
> -----Original Message-----
> From: Hans Zeller [mailto:hans.zeller@esgyn.com]
> Sent: September 9, 2015 0:27
> To: dev
> Cc: Lijian (Q)
> Subject: Re: [Urgent Help] Trafodion Build Environment Problem
>
> Hi Nieyuanyuan,
>
> Some of us are also working on running Trafodion in a sandbox or on Apache
> objects. We hope to have documented steps on how to do that eventually. You
> mention you had to fix several things. If you have notes on what those are,
> would you share them?
>
> Thank you,
>
> Hans
>
> On Tue, Sep 8, 2015 at 9:19 AM, Amanda Moran <am...@esgyn.com>
> wrote:
>
> > Hi there-
> >
> > This is fixed in latest version of installer.
> >
> > Thanks.
> >
> > Sent from my iPhone
> >
> > > On Sep 8, 2015, at 9:07 AM, Dave Birdsall <da...@esgyn.com>
> > wrote:
> > >
> > > Hi,
> > >
> > > I'm wondering if this should be reported as a problem? Perhaps
> > Nieyuanyuan
> > > would like to open a JIRA about supporting higher PID numbers in
> > Trafodion?
> > >
> > > Dave
> > >
> > > -----Original Message-----
> > > From: Narendra Goyal [mailto:narendra.goyal@esgyn.com]
> > > Sent: Monday, September 7, 2015 7:04 PM
> > > To: dev@trafodion.incubator.apache.org
> > > Cc: Lijian (Q) <ji...@huawei.com>
> > > Subject: RE: [Urgent Help] Trafodion Build Environment Problem
> > >
> > > Hi Nieyuanyuan,
> > >
> > > Could you please check the 'pid_max' settings:
> > > sysctl -q kernel.pid_max
> > > (or cat /proc/sys/kernel/pid_max)
> > >
> > > If the value is > 64K, I would recommend you set it to 64K, like so:
> > > sudo sysctl -w kernel.pid_max=65535
> > >
> > > You will have to restart Trafodion and other Hadoop/HBase processes:
> > > swstopall
> > > ckillall
> > > swstartall
> > > sqstart
> > >
> > > Just fyi, to check the list of Trafodion processes only, please run
> > 'cstat'
> > > on your bash.
> > >
> > > Thanks,
> > > -Narendra
> > >
> > >
> > > -----Original Message-----
> > > From: Nieyuanyuan [mailto:nieyuanyuan@huawei.com]
> > > Sent: Monday, September 7, 2015 6:40 PM
> > > To: dev@trafodion.incubator.apache.org
> > > Cc: Lijian (Q) <ji...@huawei.com>
> > > Subject: [Urgent Help] Trafodion Build Environment Problem
> > >
> > > Dear Guys,
> > >
> > > I recently downloaded trafodion 1.1 from
> > > https://github.com/apache/incubator-trafodion/tree/stable/1.1, and
> > followed
> > > the build guide from
> > > https://wiki.trafodion.org/wiki/index.php/Building_the_Software, and
> > solved
> > > a lot of problems (no need to list all details), I am able to run
> > trafodion
> > > over a hadoop sandbox environment.
> > >
> > > But I got a serious problem, that is, all Trafodion related process
> > > will
> > go
> > > down after several minutes (not sure how long), only few of them
> > > will
> > > left:
> > > [nieyy@redhat-72 ~]$ ps ux
> > > USER        PID %CPU %MEM    VSZ   RSS TTY      STAT START   TIME
> COMMAND
> > > nieyy     76554  0.1  0.1 590988 139768 pts/6   Sl   19:14   0:04
> > > /usr/lib/jvm/java-1.7.0-openjdk-1.7.0.45.x86_64/bin/java
> > > -XX:OnOutOfMemoryError=kill -9 %p -Xmx128m
> > > nieyy    118833  0.7  0.3 1535452 420996 ?      Sl   19:40   0:12
> > > /usr/lib/jvm/java-1.7.0-openjdk-1.7.0.45.x86_64/bin/java
> > > -Dproc_namenode -Xmx1000m
> > > -Djava.net.prefe
> > > nieyy    119085  0.6  0.2 1572688 367388 ?      Sl   19:40   0:10
> > > /usr/lib/jvm/java-1.7.0-openjdk-1.7.0.45.x86_64/bin/java
> > > -Dproc_datanode -Xmx1000m
> > > -Djava.net.prefe
> > > nieyy    119320  0.4  0.2 1512656 340636 ?      Sl   19:41   0:07
> > > /usr/lib/jvm/java-1.7.0-openjdk-1.7.0.45.x86_64/bin/java
> > > -Dproc_secondarynamenode -Xmx1000m -Djava.
> > > nieyy    119972  1.2  0.2 1708408 378536 pts/6  Sl   19:41   0:20
> > > /usr/lib/jvm/java-1.7.0-openjdk-1.7.0.45.x86_64/bin/java
> > > -Dproc_resourcemanager -Xmx1000m -Dhadoop.
> > > nieyy    120133  0.9  0.2 1616388 309976 ?      Sl   19:41   0:16
> > > /usr/lib/jvm/java-1.7.0-openjdk-1.7.0.45.x86_64/bin/java
> > > -Dproc_nodemanager -Xmx1000m -Dhadoop.log.
> > > nieyy    120371  0.0  0.0   9824  1772 pts/6    S    19:41   0:00 /bin/sh
> > > ./bin/mysqld_safe
> > > --defaults-file=/home/nieyy/trafodion_build/incubator-trafodion-stable-1.
> > > nieyy    120594  0.0  0.0 452604 89908 pts/6    Sl   19:41   0:01
> > > /home/nieyy/trafodion_build/incubator-trafodion-stable-1.1/core/sqf/sql/local_hadoop/mysql/bin/mysq
> > > nieyy    120789  0.0  0.0   9692  1736 pts/6    S    19:41   0:00 bash
> > > /home/nieyy/trafodion_build/incubator-trafodion-stable-1.1/core/sqf/sql/local_hadoop/hbase/bin
> > > nieyy    120806  2.0  0.3 1809048 509164 pts/6  Sl   19:41   0:34
> > > /usr/lib/jvm/java-1.7.0-openjdk-1.7.0.45.x86_64/bin/java
> > > -Dproc_master -XX:OnOutOfMemoryError=kill
> > > nieyy    122554  0.0  0.0  13624  1304 pts/6    S    19:41   0:00
> mpirun
> > > -disable-auto-cleanup -demux select -env SQ_IC TCP -env
> > > MPI_ERROR_LEVEL
> > > 2 -env SQ_PIDMAP 1 -
> > > nieyy    122555  0.0  0.0      0     0 ?        Zs   19:41   0:00
> > > [hydra_pmi_proxy] <defunct>
> > > nieyy    122556  1.0  0.0 335212 36748 ?        Ssl  19:41   0:17
> > > /home/nieyy/trafodion_build/incubator-trafodion-stable-1.1/core/sqf/export/bin64d/monitor COLD
> > > nieyy    122557  0.8  0.0 335212 36768 ?        Ssl  19:41   0:14
> > > /home/nieyy/trafodion_build/incubator-trafodion-stable-1.1/core/sqf/export/bin64d/monitor COLD
> > > nieyy    123946  0.9  0.1 828072 223088 pts/6   Sl   19:42   0:14
> > > /usr/lib/jvm/java-1.7.0-openjdk-1.7.0.45.x86_64/bin/java
> > > -XX:OnOutOfMemoryError=kill -9 %p -Xmx128m
> > > nieyy    124044  1.0  0.1 629200 187180 pts/6   Sl   19:42   0:16
> > > /usr/lib/jvm/java-1.7.0-openjdk-1.7.0.45.x86_64/bin/java
> > > -XX:OnOutOfMemoryError=kill -9 %p -Xmx128m
> > >
> > > And then I need to kill all processes and use swstartall and sqstart
> > > to reset the environment, however, the environment will still go
> > > down after
> > a
> > > while, and I need to restart again.
> > >
> > > I found some cores under
> > > trafodion_build/incubator-trafodion-stable-1.1/core/sqf/sql/scripts,
> > > all cored were generated by mxssmp:
> > > [nieyy@redhat-72 scripts]$ ll core*
> > > ...
> > > -rw------- 1 nieyy nieyy 156008448 Sep  7 17:56 core.mxssmp.173357
> > > -rw------- 1 nieyy nieyy 145518592 Sep  7 17:56 core.mxssmp.173372
> > > -rw------- 1 nieyy nieyy 156008448 Sep  7 19:24 core.mxssmp.74146
> > > -rw------- 1 nieyy nieyy 145518592 Sep  7 19:24 core.mxssmp.74197
> > >
> > > I used gdb to track the stack:
> > > [nieyy@redhat-72 scripts]$ gdb
> > > /home/nieyy/trafodion_build/incubator-trafodion-stable-1.1/core/sql/lib/linux/64bit/debug/mxssmp ./core.mxssmp.141469 ...
> > > (gdb) where
> > > #0  0x000000000044166c in ProcessStats::getHeap (this=0x2000) at
> > > ../runtimestats/SqlStats.h:271
> > > #1  0x000000000043990a in StatsGlobals::removeProcess
> > > (this=0x10000000, pid=65536, calledAtAdd=0) at
> > > ../runtimestats/SqlStats.cpp:276
> > > #2  0x0000000000439e05 in StatsGlobals::checkForDeadProcesses
> > > (this=0x10000000, myPid=141469) at ../runtimestats/SqlStats.cpp:382
> > > #3  0x00000000004440be in SsmpGlobals::work (this=0x7f062660c7e8) at
> > > ../runtimestats/ssmpipc.cpp:582
> > > #4  0x000000000042f06a in runServer (argc=1, argv=0x7fff5b0e5a48) at
> > > ../bin/ex_ssmp_main.cpp:259
> > > #5  0x000000000042eb12 in main (argc=1, argv=0x7fff5b0e5a48) at
> > > ../bin/ex_ssmp_main.cpp:127
> > >
> > > Then I searched via Google, and found a link
> > > https://bugs.launchpad.net/trafodion/+bug/1368891 which looks
> > > similar,
> > but
> > > it claimed the bug has been fixed at v0.9, but my version is 1.1.
> > >
> > > So, could you kindly help me to solve this problem cause I can't
> > > find
> > more
> > > useful information via Google.
> > >
> > > Thanks a lot.
> >
>

Re: [Urgent Help] Trafodion Build Environment Problem

Posted by Nieyuanyuan <ni...@huawei.com>.
Hi, Hans,

I am willing to do that. My environment is RHEL 6.5, and my server is behind a firewall (I have to use a proxy to download many things), so it requires some special configuration, for example for ANT, MAVEN, and curl.
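
A minimal sketch of the kind of proxy configuration mentioned here, not the actual steps used on this server. The host `proxy.example.com`, the port `8080`, and the function name `configure_proxy` are hypothetical placeholders. It relies on curl reading the `http_proxy`/`https_proxy` environment variables and on Ant passing `ANT_OPTS` to its JVM.

```bash
#!/usr/bin/env bash
# Placeholder values: replace with the real proxy host and port.
PROXY_HOST=proxy.example.com
PROXY_PORT=8080

configure_proxy() {
    # curl (and many other command-line tools) read these variables.
    export http_proxy="http://$PROXY_HOST:$PROXY_PORT"
    export https_proxy="$http_proxy"
    # Ant runs on the JVM; ANT_OPTS hands it the standard Java proxy properties.
    export ANT_OPTS="-Dhttp.proxyHost=$PROXY_HOST -Dhttp.proxyPort=$PROXY_PORT -Dhttps.proxyHost=$PROXY_HOST -Dhttps.proxyPort=$PROXY_PORT"
}

configure_proxy
echo "http_proxy=$http_proxy"
echo "ANT_OPTS=$ANT_OPTS"
```

Maven is usually pointed at a proxy through a `<proxy>` entry in `~/.m2/settings.xml` rather than through environment variables, so it is left out of the script.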

Also, while following the original build guide through the whole build process, I found some missing dependencies and some small mistakes that should be corrected.

Could you please show me a way or a link to share my installation steps?

Thanks.

-----Original Message-----
From: Hans Zeller [mailto:hans.zeller@esgyn.com]
Sent: September 9, 2015 0:27
To: dev
Cc: Lijian (Q)
Subject: Re: [Urgent Help] Trafodion Build Environment Problem

Hi Nieyuanyuan,

Some of us are also working on running Trafodion in a sandbox or on Apache objects. We hope to have documented steps on how to do that eventually. You mention you had to fix several things. If you have notes on what those are, would you share them?

Thank you,

Hans

On Tue, Sep 8, 2015 at 9:19 AM, Amanda Moran <am...@esgyn.com> wrote:

> Hi there-
>
> This is fixed in latest version of installer.
>
> Thanks.
>
> Sent from my iPhone
>
> > On Sep 8, 2015, at 9:07 AM, Dave Birdsall <da...@esgyn.com>
> wrote:
> >
> > Hi,
> >
> > I'm wondering if this should be reported as a problem? Perhaps
> Nieyuanyuan
> > would like to open a JIRA about supporting higher PID numbers in
> Trafodion?
> >
> > Dave
> >
> > -----Original Message-----
> > From: Narendra Goyal [mailto:narendra.goyal@esgyn.com]
> > Sent: Monday, September 7, 2015 7:04 PM
> > To: dev@trafodion.incubator.apache.org
> > Cc: Lijian (Q) <ji...@huawei.com>
> > Subject: RE: [Urgent Help] Trafodion Build Environment Problem
> >
> > Hi Nieyuanyuan,
> >
> > Could you please check the 'pid_max' settings:
> > sysctl -q kernel.pid_max
> > (or cat /proc/sys/kernel/pid_max)
> >
> > If the value is > 64K, I would recommend you set it to 64K, like so:
> > sudo sysctl -w kernel.pid_max=65535
> >
> > You will have to restart Trafodion and other Hadoop/HBase processes:
> > swstopall
> > ckillall
> > swstartall
> > sqstart
> >
> > Just fyi, to check the list of Trafodion processes only, please run
> 'cstat'
> > on your bash.
> >
> > Thanks,
> > -Narendra
> >
> >
> > -----Original Message-----
> > From: Nieyuanyuan [mailto:nieyuanyuan@huawei.com]
> > Sent: Monday, September 7, 2015 6:40 PM
> > To: dev@trafodion.incubator.apache.org
> > Cc: Lijian (Q) <ji...@huawei.com>
> > Subject: [Urgent Help] Trafodion Build Environment Problem
> >
> > Dear Guys,
> >
> > I recently downloaded trafodion 1.1 from 
> > https://github.com/apache/incubator-trafodion/tree/stable/1.1, and
> followed
> > the build guide from
> > https://wiki.trafodion.org/wiki/index.php/Building_the_Software, and
> solved
> > a lot of problems (no need to list all details), I am able to run
> trafodion
> > over a hadoop sandbox environment.
> >
> > But I got a serious problem, that is, all Trafodion related process 
> > will
> go
> > down after several minutes (not sure how long), only few of them 
> > will
> > left:
> > [nieyy@redhat-72 ~]$ ps ux
> > USER        PID %CPU %MEM    VSZ   RSS TTY      STAT START   TIME COMMAND
> > nieyy     76554  0.1  0.1 590988 139768 pts/6   Sl   19:14   0:04
> > /usr/lib/jvm/java-1.7.0-openjdk-1.7.0.45.x86_64/bin/java
> > -XX:OnOutOfMemoryError=kill -9 %p -Xmx128m
> > nieyy    118833  0.7  0.3 1535452 420996 ?      Sl   19:40   0:12
> > /usr/lib/jvm/java-1.7.0-openjdk-1.7.0.45.x86_64/bin/java
> > -Dproc_namenode -Xmx1000m
> > -Djava.net.prefe
> > nieyy    119085  0.6  0.2 1572688 367388 ?      Sl   19:40   0:10
> > /usr/lib/jvm/java-1.7.0-openjdk-1.7.0.45.x86_64/bin/java
> > -Dproc_datanode -Xmx1000m
> > -Djava.net.prefe
> > nieyy    119320  0.4  0.2 1512656 340636 ?      Sl   19:41   0:07
> > /usr/lib/jvm/java-1.7.0-openjdk-1.7.0.45.x86_64/bin/java
> > -Dproc_secondarynamenode -Xmx1000m -Djava.
> > nieyy    119972  1.2  0.2 1708408 378536 pts/6  Sl   19:41   0:20
> > /usr/lib/jvm/java-1.7.0-openjdk-1.7.0.45.x86_64/bin/java
> > -Dproc_resourcemanager -Xmx1000m -Dhadoop.
> > nieyy    120133  0.9  0.2 1616388 309976 ?      Sl   19:41   0:16
> > /usr/lib/jvm/java-1.7.0-openjdk-1.7.0.45.x86_64/bin/java
> > -Dproc_nodemanager -Xmx1000m -Dhadoop.log.
> > nieyy    120371  0.0  0.0   9824  1772 pts/6    S    19:41   0:00 /bin/sh
> > ./bin/mysqld_safe
> > --defaults-file=/home/nieyy/trafodion_build/incubator-trafodion-stable-1.
> > nieyy    120594  0.0  0.0 452604 89908 pts/6    Sl   19:41   0:01
> > /home/nieyy/trafodion_build/incubator-trafodion-stable-1.1/core/sqf/sql/local_hadoop/mysql/bin/mysq
> > nieyy    120789  0.0  0.0   9692  1736 pts/6    S    19:41   0:00 bash
> > /home/nieyy/trafodion_build/incubator-trafodion-stable-1.1/core/sqf/sql/local_hadoop/hbase/bin
> > nieyy    120806  2.0  0.3 1809048 509164 pts/6  Sl   19:41   0:34
> > /usr/lib/jvm/java-1.7.0-openjdk-1.7.0.45.x86_64/bin/java 
> > -Dproc_master -XX:OnOutOfMemoryError=kill
> > nieyy    122554  0.0  0.0  13624  1304 pts/6    S    19:41   0:00 mpirun
> > -disable-auto-cleanup -demux select -env SQ_IC TCP -env 
> > MPI_ERROR_LEVEL
> > 2 -env SQ_PIDMAP 1 -
> > nieyy    122555  0.0  0.0      0     0 ?        Zs   19:41   0:00
> > [hydra_pmi_proxy] <defunct>
> > nieyy    122556  1.0  0.0 335212 36748 ?        Ssl  19:41   0:17
> > /home/nieyy/trafodion_build/incubator-trafodion-stable-1.1/core/sqf/export/bin64d/monitor COLD
> > nieyy    122557  0.8  0.0 335212 36768 ?        Ssl  19:41   0:14
> > /home/nieyy/trafodion_build/incubator-trafodion-stable-1.1/core/sqf/export/bin64d/monitor COLD
> > nieyy    123946  0.9  0.1 828072 223088 pts/6   Sl   19:42   0:14
> > /usr/lib/jvm/java-1.7.0-openjdk-1.7.0.45.x86_64/bin/java
> > -XX:OnOutOfMemoryError=kill -9 %p -Xmx128m
> > nieyy    124044  1.0  0.1 629200 187180 pts/6   Sl   19:42   0:16
> > /usr/lib/jvm/java-1.7.0-openjdk-1.7.0.45.x86_64/bin/java
> > -XX:OnOutOfMemoryError=kill -9 %p -Xmx128m
> >
> > And then I need to kill all processes and use swstartall and sqstart 
> > to reset the environment, however, the environment will still go 
> > down after
> a
> > while, and I need to restart again.
> >
> > I found some cores under
> > trafodion_build/incubator-trafodion-stable-1.1/core/sqf/sql/scripts, 
> > all cored were generated by mxssmp:
> > [nieyy@redhat-72 scripts]$ ll core*
> > ...
> > -rw------- 1 nieyy nieyy 156008448 Sep  7 17:56 core.mxssmp.173357
> > -rw------- 1 nieyy nieyy 145518592 Sep  7 17:56 core.mxssmp.173372
> > -rw------- 1 nieyy nieyy 156008448 Sep  7 19:24 core.mxssmp.74146
> > -rw------- 1 nieyy nieyy 145518592 Sep  7 19:24 core.mxssmp.74197
> >
> > I used gdb to track the stack:
> > [nieyy@redhat-72 scripts]$ gdb
> >
> /home/nieyy/trafodion_build/incubator-trafodion-stable-1.1/core/sql/li
> b/li
> > nux/64bit/debug/mxssmp ./core.mxssmp.141469 ...
> > (gdb) where
> > #0  0x000000000044166c in ProcessStats::getHeap (this=0x2000) at
> > ../runtimestats/SqlStats.h:271
> > #1  0x000000000043990a in StatsGlobals::removeProcess 
> > (this=0x10000000, pid=65536, calledAtAdd=0) at 
> > ../runtimestats/SqlStats.cpp:276
> > #2  0x0000000000439e05 in StatsGlobals::checkForDeadProcesses
> > (this=0x10000000, myPid=141469) at ../runtimestats/SqlStats.cpp:382
> > #3  0x00000000004440be in SsmpGlobals::work (this=0x7f062660c7e8) at
> > ../runtimestats/ssmpipc.cpp:582
> > #4  0x000000000042f06a in runServer (argc=1, argv=0x7fff5b0e5a48) at
> > ../bin/ex_ssmp_main.cpp:259
> > #5  0x000000000042eb12 in main (argc=1, argv=0x7fff5b0e5a48) at
> > ../bin/ex_ssmp_main.cpp:127
> >
> > Then I searched via Google, and found a link
> > https://bugs.launchpad.net/trafodion/+bug/1368891 which looks 
> > similar,
> but
> > it claimed the bug has been fixed at v0.9, but my version is 1.1.
> >
> > So, could you kindly help me to solve this problem cause I can't 
> > find
> more
> > useful information via Google.
> >
> > Thanks a lot.
>

Re: [Urgent Help] Trafodion Build Environment Problem

Posted by Qifan Chen <qi...@esgyn.com>.
If there is a collision, the run-time stats data from two or more processes
will be mixed.
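
To make the concern concrete, here is a minimal Python sketch of how two processes could end up sharing one slot. It is not Trafodion's C++ code: `SLOTS`, `slot_for`, `record_rows`, and the `rows_read` counter are hypothetical names, and the second pid (10399) is made up so that it lands on the same slot as the mxssmp pid from the gdb session.

```python
from collections import defaultdict

SLOTS = 65535  # table size taken from the modulus suggested in this thread

def slot_for(pid: int) -> int:
    """Map a pid onto a fixed-size table by taking it modulo the table size."""
    return pid % SLOTS

# Hypothetical per-slot counter standing in for the real run-time stats.
rows_read = defaultdict(int)

def record_rows(pid: int, rows: int) -> None:
    """Add a row count to whichever slot the pid maps to."""
    rows_read[slot_for(pid)] += rows

# 141469 is the mxssmp pid from the gdb session; 10399 is a made-up second
# pid chosen because 141469 % 65535 == 10399.
record_rows(141469, 500)
record_rows(10399, 7)

# Both processes now see the combined total rather than their own.
print(rows_read[slot_for(141469)])  # 507
print(rows_read[slot_for(10399)])   # 507
```

Neither process can recover its own count from the shared slot, which is the mixing described above.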

On Tue, Sep 8, 2015 at 12:40 PM, Eric Owhadi <er...@esgyn.com> wrote:

> Would it be a big problem to apply a modulus of 65535 to the pid to avoid
> this, rather than moving to a hash table and taking a performance hit?
> Eric
>
> -----Original Message-----
> From: Selva Govindarajan [mailto:selva.govindarajan@esgyn.com]
> Sent: Tuesday, September 8, 2015 12:27 PM
> To: dev@trafodion.incubator.apache.org
> Cc: Lijian (Q) <ji...@huawei.com>
> Subject: RE: [Urgent Help] Trafodion Build Environment Problem
>
> The whole Trafodion stack may not have been tested with pids greater than
> 65K. However, problems with pids greater than 65K will first show up in the
> mxssmp or mxsscp processes, which dump core. These processes make it
> possible to troubleshoot query execution problems in the Trafodion
> infrastructure by providing real-time execution statistics. Every Trafodion
> SQL process is registered when it makes Trafodion SQL CLI calls and
> unregisters itself when it goes away. Internally, we use an array for this
> purpose for performance reasons.
>
> Selva
>
> -----Original Message-----
> From: Qifan Chen [mailto:qifan.chen@esgyn.com]
> Sent: Tuesday, September 8, 2015 10:03 AM
> To: dev <de...@trafodion.incubator.apache.org>
> Cc: Lijian (Q) <ji...@huawei.com>
> Subject: Re: [Urgent Help] Trafodion Build Environment Problem
>
> For pids larger than 65K, we probably can use a hash table.  Thanks --Qifan
>
> On Tue, Sep 8, 2015 at 11:27 AM, Hans Zeller <ha...@esgyn.com>
> wrote:
>
> > Hi Nieyuanyuan,
> >
> > Some of us are also working on running Trafodion in a sandbox or on
> > Apache objects. We hope to have documented steps on how to do that
> > eventually. You mention you had to fix several things. If you have
> > notes on what those are, would you share them?
> >
> > Thank you,
> >
> > Hans
> >
> > On Tue, Sep 8, 2015 at 9:19 AM, Amanda Moran <am...@esgyn.com>
> > wrote:
> >
> > > Hi there-
> > >
> > > This is fixed in latest version of installer.
> > >
> > > Thanks.
> > >
> > > Sent from my iPhone
> > >
> > > > On Sep 8, 2015, at 9:07 AM, Dave Birdsall
> > > > <da...@esgyn.com>
> > > wrote:
> > > >
> > > > Hi,
> > > >
> > > > I'm wondering if this should be reported as a problem? Perhaps
> > > Nieyuanyuan
> > > > would like to open a JIRA about supporting higher PID numbers in
> > > Trafodion?
> > > >
> > > > Dave
> > > >
> > > > -----Original Message-----
> > > > From: Narendra Goyal [mailto:narendra.goyal@esgyn.com]
> > > > Sent: Monday, September 7, 2015 7:04 PM
> > > > To: dev@trafodion.incubator.apache.org
> > > > Cc: Lijian (Q) <ji...@huawei.com>
> > > > Subject: RE: [Urgent Help] Trafodion Build Environment Problem
> > > >
> > > > Hi Nieyuanyuan,
> > > >
> > > > Could you please check the 'pid_max' settings:
> > > > sysctl -q kernel.pid_max
> > > > (or cat /proc/sys/kernel/pid_max)
> > > >
> > > > If the value is > 64K, I would recommend you set it to 64K, like so:
> > > > sudo sysctl -w kernel.pid_max=65535
> > > >
> > > > You will have to restart Trafodion and other Hadoop/HBase
> processes:
> > > > swstopall
> > > > ckillall
> > > > swstartall
> > > > sqstart
> > > >
> > > > Just fyi, to check the list of Trafodion processes only, please
> > > > run
> > > 'cstat'
> > > > on your bash.
> > > >
> > > > Thanks,
> > > > -Narendra
> > > >
> > > >
> > > > -----Original Message-----
> > > > From: Nieyuanyuan [mailto:nieyuanyuan@huawei.com]
> > > > Sent: Monday, September 7, 2015 6:40 PM
> > > > To: dev@trafodion.incubator.apache.org
> > > > Cc: Lijian (Q) <ji...@huawei.com>
> > > > Subject: [Urgent Help] Trafodion Build Environment Problem
> > > >
> > > > Dear Guys,
> > > >
> > > > I recently downloaded trafodion 1.1 from
> > > > https://github.com/apache/incubator-trafodion/tree/stable/1.1, and
> > > followed
> > > > the build guide from
> > > > https://wiki.trafodion.org/wiki/index.php/Building_the_Software,
> > > > and
> > > solved
> > > > a lot of problems (no need to list all details), I am able to run
> > > trafodion
> > > > over a hadoop sandbox environment.
> > > >
> > > > But I got a serious problem, that is, all Trafodion related
> > > > process
> > will
> > > go
> > > > down after several minutes (not sure how long), only few of them
> > > > will
> > > > left:
> > > > [nieyy@redhat-72 ~]$ ps ux
> > > > USER        PID %CPU %MEM    VSZ   RSS TTY      STAT START   TIME
> > COMMAND
> > > > nieyy     76554  0.1  0.1 590988 139768 pts/6   Sl   19:14   0:04
> > > > /usr/lib/jvm/java-1.7.0-openjdk-1.7.0.45.x86_64/bin/java
> > > > -XX:OnOutOfMemoryError=kill -9 %p -Xmx128m
> > > > nieyy    118833  0.7  0.3 1535452 420996 ?      Sl   19:40   0:12
> > > > /usr/lib/jvm/java-1.7.0-openjdk-1.7.0.45.x86_64/bin/java
> > > > -Dproc_namenode -Xmx1000m
> > > > -Djava.net.prefe
> > > > nieyy    119085  0.6  0.2 1572688 367388 ?      Sl   19:40   0:10
> > > > /usr/lib/jvm/java-1.7.0-openjdk-1.7.0.45.x86_64/bin/java
> > > > -Dproc_datanode -Xmx1000m
> > > > -Djava.net.prefe
> > > > nieyy    119320  0.4  0.2 1512656 340636 ?      Sl   19:41   0:07
> > > > /usr/lib/jvm/java-1.7.0-openjdk-1.7.0.45.x86_64/bin/java
> > > > -Dproc_secondarynamenode -Xmx1000m -Djava.
> > > > nieyy    119972  1.2  0.2 1708408 378536 pts/6  Sl   19:41   0:20
> > > > /usr/lib/jvm/java-1.7.0-openjdk-1.7.0.45.x86_64/bin/java
> > > > -Dproc_resourcemanager -Xmx1000m -Dhadoop.
> > > > nieyy    120133  0.9  0.2 1616388 309976 ?      Sl   19:41   0:16
> > > > /usr/lib/jvm/java-1.7.0-openjdk-1.7.0.45.x86_64/bin/java
> > > > -Dproc_nodemanager -Xmx1000m -Dhadoop.log.
> > > > nieyy    120371  0.0  0.0   9824  1772 pts/6    S    19:41   0:00
> > /bin/sh
> > > > ./bin/mysqld_safe
> > > >
> > --defaults-file=/home/nieyy/trafodion_build/incubator-trafodion-stable-1.
> > > > nieyy    120594  0.0  0.0 452604 89908 pts/6    Sl   19:41   0:01
> > > >
> > >
> > /home/nieyy/trafodion_build/incubator-trafodion-stable-1.1/core/sqf/sq
> > l/lo
> > > > cal_hadoop/mysql/bin/mysq
> > > > nieyy    120789  0.0  0.0   9692  1736 pts/6    S    19:41   0:00
> bash
> > > >
> > >
> > /home/nieyy/trafodion_build/incubator-trafodion-stable-1.1/core/sqf/sq
> > l/lo
> > > > cal_hadoop/hbase/bin
> > > > nieyy    120806  2.0  0.3 1809048 509164 pts/6  Sl   19:41   0:34
> > > > /usr/lib/jvm/java-1.7.0-openjdk-1.7.0.45.x86_64/bin/java
> > > > -Dproc_master -XX:OnOutOfMemoryError=kill
> > > > nieyy    122554  0.0  0.0  13624  1304 pts/6    S    19:41   0:00
> > mpirun
> > > > -disable-auto-cleanup -demux select -env SQ_IC TCP -env
> > > > MPI_ERROR_LEVEL
> > > > 2 -env SQ_PIDMAP 1 -
> > > > nieyy    122555  0.0  0.0      0     0 ?        Zs   19:41   0:00
> > > > [hydra_pmi_proxy] <defunct>
> > > > nieyy    122556  1.0  0.0 335212 36748 ?        Ssl  19:41   0:17
> > > >
> > >
> > /home/nieyy/trafodion_build/incubator-trafodion-stable-1.1/core/sqf/ex
> > port
> > > > /bin64d/monitor COLD
> > > > nieyy    122557  0.8  0.0 335212 36768 ?        Ssl  19:41   0:14
> > > >
> > >
> > /home/nieyy/trafodion_build/incubator-trafodion-stable-1.1/core/sqf/ex
> > port
> > > > /bin64d/monitor COLD
> > > > nieyy    123946  0.9  0.1 828072 223088 pts/6   Sl   19:42   0:14
> > > > /usr/lib/jvm/java-1.7.0-openjdk-1.7.0.45.x86_64/bin/java
> > > > -XX:OnOutOfMemoryError=kill -9 %p -Xmx128m
> > > > nieyy    124044  1.0  0.1 629200 187180 pts/6   Sl   19:42   0:16
> > > > /usr/lib/jvm/java-1.7.0-openjdk-1.7.0.45.x86_64/bin/java
> > > > -XX:OnOutOfMemoryError=kill -9 %p -Xmx128m
> > > >
> > > > And then I need to kill all processes and use swstartall and
> > > > sqstart to reset the environment, however, the environment will
> > > > still go down
> > after
> > > a
> > > > while, and I need to restart again.
> > > >
> > > > I found some cores under
> > > > trafodion_build/incubator-trafodion-stable-1.1/core/sqf/sql/script
> > > > s,
> > all
> > > > cored were generated by mxssmp:
> > > > [nieyy@redhat-72 scripts]$ ll core* ...
> > > > -rw------- 1 nieyy nieyy 156008448 Sep  7 17:56 core.mxssmp.173357
> > > > -rw------- 1 nieyy nieyy 145518592 Sep  7 17:56 core.mxssmp.173372
> > > > -rw------- 1 nieyy nieyy 156008448 Sep  7 19:24 core.mxssmp.74146
> > > > -rw------- 1 nieyy nieyy 145518592 Sep  7 19:24 core.mxssmp.74197
> > > >
> > > > I used gdb to track the stack:
> > > > [nieyy@redhat-72 scripts]$ gdb
> > > >
> > >
> > /home/nieyy/trafodion_build/incubator-trafodion-stable-1.1/core/sql/li
> > b/li
> > > > nux/64bit/debug/mxssmp ./core.mxssmp.141469 ...
> > > > (gdb) where
> > > > #0  0x000000000044166c in ProcessStats::getHeap (this=0x2000) at
> > > > ../runtimestats/SqlStats.h:271
> > > > #1  0x000000000043990a in StatsGlobals::removeProcess
> > > > (this=0x10000000, pid=65536, calledAtAdd=0) at
> > > > ../runtimestats/SqlStats.cpp:276
> > > > #2  0x0000000000439e05 in StatsGlobals::checkForDeadProcesses
> > > > (this=0x10000000, myPid=141469) at
> > > > ../runtimestats/SqlStats.cpp:382
> > > > #3  0x00000000004440be in SsmpGlobals::work (this=0x7f062660c7e8)
> > > > at
> > > > ../runtimestats/ssmpipc.cpp:582
> > > > #4  0x000000000042f06a in runServer (argc=1, argv=0x7fff5b0e5a48)
> > > > at
> > > > ../bin/ex_ssmp_main.cpp:259
> > > > #5  0x000000000042eb12 in main (argc=1, argv=0x7fff5b0e5a48) at
> > > > ../bin/ex_ssmp_main.cpp:127
> > > >
> > > > Then I searched via Google, and found a link
> > > > https://bugs.launchpad.net/trafodion/+bug/1368891 which looks
> > > > similar,
> > > but
> > > > it claimed the bug has been fixed at v0.9, but my version is 1.1.
> > > >
> > > > So, could you kindly help me to solve this problem cause I can't
> > > > find
> > > more
> > > > useful information via Google.
> > > >
> > > > Thanks a lot.
> > >
> >
>
>
>
> --
> Regards, --Qifan
>



-- 
Regards, --Qifan

RE: [Urgent Help] Trafodion Build Environment Problem

Posted by Eric Owhadi <er...@esgyn.com>.
Would it be a big problem to apply a modulus of 65535 to the pid to avoid
this, rather than moving to a hash table and taking a performance hit?
Eric
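
As a rough illustration of the modulus idea, the Python sketch below maps pids to slots with `pid % 65535` and reports any slot shared by more than one pid. This is only a sketch under assumptions: `find_collisions` is a hypothetical helper, not part of Trafodion, and the extra pid 8611 is invented to force a collision. The other pids come from the core file names in the original report.

```python
from collections import defaultdict

def find_collisions(pids, slots=65535):
    """Group pids by pid % slots and return only the slots shared by 2+ pids."""
    by_slot = defaultdict(list)
    for pid in pids:
        by_slot[pid % slots].append(pid)
    return {slot: group for slot, group in by_slot.items() if len(group) > 1}

# pids taken from the core file names earlier in the thread
core_pids = [173357, 173372, 74146, 74197]
print(find_collisions(core_pids))           # {}

# 8611 is a hypothetical extra pid; 74146 % 65535 is also 8611
print(find_collisions(core_pids + [8611]))  # {8611: [74146, 8611]}
```

The mapping stays a constant-time index, but nothing stops two live pids from landing on the same slot.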

-----Original Message-----
From: Selva Govindarajan [mailto:selva.govindarajan@esgyn.com]
Sent: Tuesday, September 8, 2015 12:27 PM
To: dev@trafodion.incubator.apache.org
Cc: Lijian (Q) <ji...@huawei.com>
Subject: RE: [Urgent Help] Trafodion Build Environment Problem

The whole Trafodion stack may not have been tested with pids greater than 65K.
However, problems with pids greater than 65K will first show up in the mxssmp
or mxsscp processes, which dump core. These processes make it possible to
troubleshoot query execution problems in the Trafodion infrastructure by
providing real-time execution statistics. Every Trafodion SQL process is
registered when it makes Trafodion SQL CLI calls and unregisters itself when
it goes away. Internally, we use an array for this purpose for performance
reasons.

Selva

-----Original Message-----
From: Qifan Chen [mailto:qifan.chen@esgyn.com]
Sent: Tuesday, September 8, 2015 10:03 AM
To: dev <de...@trafodion.incubator.apache.org>
Cc: Lijian (Q) <ji...@huawei.com>
Subject: Re: [Urgent Help] Trafodion Build Environment Problem

For pids larger than 65K, we probably can use a hash table.  Thanks --Qifan

On Tue, Sep 8, 2015 at 11:27 AM, Hans Zeller <ha...@esgyn.com> wrote:

> Hi Nieyuanyuan,
>
> Some of us are also working on running Trafodion in a sandbox or on
> Apache objects. We hope to have documented steps on how to do that
> eventually. You mention you had to fix several things. If you have
> notes on what those are, would you share them?
>
> Thank you,
>
> Hans
>
> On Tue, Sep 8, 2015 at 9:19 AM, Amanda Moran <am...@esgyn.com>
> wrote:
>
> > Hi there-
> >
> > This is fixed in latest version of installer.
> >
> > Thanks.
> >
> > Sent from my iPhone
> >
> > > On Sep 8, 2015, at 9:07 AM, Dave Birdsall
> > > <da...@esgyn.com>
> > wrote:
> > >
> > > Hi,
> > >
> > > I'm wondering if this should be reported as a problem? Perhaps
> > Nieyuanyuan
> > > would like to open a JIRA about supporting higher PID numbers in
> > Trafodion?
> > >
> > > Dave
> > >
> > > -----Original Message-----
> > > From: Narendra Goyal [mailto:narendra.goyal@esgyn.com]
> > > Sent: Monday, September 7, 2015 7:04 PM
> > > To: dev@trafodion.incubator.apache.org
> > > Cc: Lijian (Q) <ji...@huawei.com>
> > > Subject: RE: [Urgent Help] Trafodion Build Environment Problem
> > >
> > > Hi Nieyuanyuan,
> > >
> > > Could you please check the 'pid_max' settings:
> > > sysctl -q kernel.pid_max
> > > (or cat /proc/sys/kernel/pid_max)
> > >
> > > If the value is > 64K, I would recommend you set it to 64K, like so:
> > > sudo sysctl -w kernel.pid_max=65535
> > >
> > > You will have to restart Trafodion and other Hadoop/HBase processes:
> > > swstopall
> > > ckillall
> > > swstartall
> > > sqstart
> > >
> > > Just fyi, to check the list of Trafodion processes only, please
> > > run
> > 'cstat'
> > > on your bash.
> > >
> > > Thanks,
> > > -Narendra
> > >
> > >
> > > -----Original Message-----
> > > From: Nieyuanyuan [mailto:nieyuanyuan@huawei.com]
> > > Sent: Monday, September 7, 2015 6:40 PM
> > > To: dev@trafodion.incubator.apache.org
> > > Cc: Lijian (Q) <ji...@huawei.com>
> > > Subject: [Urgent Help] Trafodion Build Environment Problem
> > >
> > > Dear Guys,
> > >
> > > I recently downloaded trafodion 1.1 from
> > > https://github.com/apache/incubator-trafodion/tree/stable/1.1, and
> > followed
> > > the build guide from
> > > https://wiki.trafodion.org/wiki/index.php/Building_the_Software,
> > > and
> > solved
> > > a lot of problems (no need to list all details), I am able to run
> > trafodion
> > > over a hadoop sandbox environment.
> > >
> > > But I got a serious problem, that is, all Trafodion related
> > > process
> will
> > go
> > > down after several minutes (not sure how long), only few of them
> > > will
> > > left:
> > > [nieyy@redhat-72 ~]$ ps ux
> > > USER        PID %CPU %MEM    VSZ   RSS TTY      STAT START   TIME
> COMMAND
> > > nieyy     76554  0.1  0.1 590988 139768 pts/6   Sl   19:14   0:04
> > > /usr/lib/jvm/java-1.7.0-openjdk-1.7.0.45.x86_64/bin/java
> > > -XX:OnOutOfMemoryError=kill -9 %p -Xmx128m
> > > nieyy    118833  0.7  0.3 1535452 420996 ?      Sl   19:40   0:12
> > > /usr/lib/jvm/java-1.7.0-openjdk-1.7.0.45.x86_64/bin/java
> > > -Dproc_namenode -Xmx1000m
> > > -Djava.net.prefe
> > > nieyy    119085  0.6  0.2 1572688 367388 ?      Sl   19:40   0:10
> > > /usr/lib/jvm/java-1.7.0-openjdk-1.7.0.45.x86_64/bin/java
> > > -Dproc_datanode -Xmx1000m
> > > -Djava.net.prefe
> > > nieyy    119320  0.4  0.2 1512656 340636 ?      Sl   19:41   0:07
> > > /usr/lib/jvm/java-1.7.0-openjdk-1.7.0.45.x86_64/bin/java
> > > -Dproc_secondarynamenode -Xmx1000m -Djava.
> > > nieyy    119972  1.2  0.2 1708408 378536 pts/6  Sl   19:41   0:20
> > > /usr/lib/jvm/java-1.7.0-openjdk-1.7.0.45.x86_64/bin/java
> > > -Dproc_resourcemanager -Xmx1000m -Dhadoop.
> > > nieyy    120133  0.9  0.2 1616388 309976 ?      Sl   19:41   0:16
> > > /usr/lib/jvm/java-1.7.0-openjdk-1.7.0.45.x86_64/bin/java
> > > -Dproc_nodemanager -Xmx1000m -Dhadoop.log.
> > > nieyy    120371  0.0  0.0   9824  1772 pts/6    S    19:41   0:00
> /bin/sh
> > > ./bin/mysqld_safe
> > >
> --defaults-file=/home/nieyy/trafodion_build/incubator-trafodion-stable-1.
> > > nieyy    120594  0.0  0.0 452604 89908 pts/6    Sl   19:41   0:01
> > >
> >
> /home/nieyy/trafodion_build/incubator-trafodion-stable-1.1/core/sqf/sq
> l/lo
> > > cal_hadoop/mysql/bin/mysq
> > > nieyy    120789  0.0  0.0   9692  1736 pts/6    S    19:41   0:00 bash
> > >
> >
> /home/nieyy/trafodion_build/incubator-trafodion-stable-1.1/core/sqf/sq
> l/lo
> > > cal_hadoop/hbase/bin
> > > nieyy    120806  2.0  0.3 1809048 509164 pts/6  Sl   19:41   0:34
> > > /usr/lib/jvm/java-1.7.0-openjdk-1.7.0.45.x86_64/bin/java
> > > -Dproc_master -XX:OnOutOfMemoryError=kill
> > > nieyy    122554  0.0  0.0  13624  1304 pts/6    S    19:41   0:00
> mpirun
> > > -disable-auto-cleanup -demux select -env SQ_IC TCP -env
> > > MPI_ERROR_LEVEL
> > > 2 -env SQ_PIDMAP 1 -
> > > nieyy    122555  0.0  0.0      0     0 ?        Zs   19:41   0:00
> > > [hydra_pmi_proxy] <defunct>
> > > nieyy    122556  1.0  0.0 335212 36748 ?        Ssl  19:41   0:17
> > >
> >
> /home/nieyy/trafodion_build/incubator-trafodion-stable-1.1/core/sqf/ex
> port
> > > /bin64d/monitor COLD
> > > nieyy    122557  0.8  0.0 335212 36768 ?        Ssl  19:41   0:14
> > >
> >
> /home/nieyy/trafodion_build/incubator-trafodion-stable-1.1/core/sqf/ex
> port
> > > /bin64d/monitor COLD
> > > nieyy    123946  0.9  0.1 828072 223088 pts/6   Sl   19:42   0:14
> > > /usr/lib/jvm/java-1.7.0-openjdk-1.7.0.45.x86_64/bin/java
> > > -XX:OnOutOfMemoryError=kill -9 %p -Xmx128m
> > > nieyy    124044  1.0  0.1 629200 187180 pts/6   Sl   19:42   0:16
> > > /usr/lib/jvm/java-1.7.0-openjdk-1.7.0.45.x86_64/bin/java
> > > -XX:OnOutOfMemoryError=kill -9 %p -Xmx128m
> > >
> > > And then I need to kill all processes and use swstartall and
> > > sqstart to reset the environment, however, the environment will
> > > still go down
> after
> > a
> > > while, and I need to restart again.
> > >
> > > I found some cores under
> > > trafodion_build/incubator-trafodion-stable-1.1/core/sqf/sql/script
> > > s,
> all
> > > cored were generated by mxssmp:
> > > [nieyy@redhat-72 scripts]$ ll core* ...
> > > -rw------- 1 nieyy nieyy 156008448 Sep  7 17:56 core.mxssmp.173357
> > > -rw------- 1 nieyy nieyy 145518592 Sep  7 17:56 core.mxssmp.173372
> > > -rw------- 1 nieyy nieyy 156008448 Sep  7 19:24 core.mxssmp.74146
> > > -rw------- 1 nieyy nieyy 145518592 Sep  7 19:24 core.mxssmp.74197
> > >
> > > I used gdb to track the stack:
> > > [nieyy@redhat-72 scripts]$ gdb
> > >
> >
> /home/nieyy/trafodion_build/incubator-trafodion-stable-1.1/core/sql/li
> b/li
> > > nux/64bit/debug/mxssmp ./core.mxssmp.141469 ...
> > > (gdb) where
> > > #0  0x000000000044166c in ProcessStats::getHeap (this=0x2000) at
> > > ../runtimestats/SqlStats.h:271
> > > #1  0x000000000043990a in StatsGlobals::removeProcess
> > > (this=0x10000000, pid=65536, calledAtAdd=0) at
> > > ../runtimestats/SqlStats.cpp:276
> > > #2  0x0000000000439e05 in StatsGlobals::checkForDeadProcesses
> > > (this=0x10000000, myPid=141469) at
> > > ../runtimestats/SqlStats.cpp:382
> > > #3  0x00000000004440be in SsmpGlobals::work (this=0x7f062660c7e8)
> > > at
> > > ../runtimestats/ssmpipc.cpp:582
> > > #4  0x000000000042f06a in runServer (argc=1, argv=0x7fff5b0e5a48)
> > > at
> > > ../bin/ex_ssmp_main.cpp:259
> > > #5  0x000000000042eb12 in main (argc=1, argv=0x7fff5b0e5a48) at
> > > ../bin/ex_ssmp_main.cpp:127
> > >
> > > Then I searched via Google, and found a link
> > > https://bugs.launchpad.net/trafodion/+bug/1368891 which looks
> > > similar,
> > but
> > > it claimed the bug has been fixed at v0.9, but my version is 1.1.
> > >
> > > So, could you kindly help me to solve this problem cause I can't
> > > find
> > more
> > > useful information via Google.
> > >
> > > Thanks a lot.
> >
>



--
Regards, --Qifan

RE: [Urgent Help] Trafodion Build Environment Problem

Posted by Selva Govindarajan <se...@esgyn.com>.
The whole Trafodion stack may not have been tested with pids greater than 65K.
However, problems with pids greater than 65K will first show up in the mxssmp
or mxsscp processes, which dump core. These processes make it possible to
troubleshoot query execution problems in the Trafodion infrastructure by
providing real-time execution statistics. Every Trafodion SQL process is
registered when it makes Trafodion SQL CLI calls and unregisters itself when
it goes away. Internally, we use an array for this purpose for performance
reasons.

Selva
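
The register/unregister pattern with a pid-indexed array can be sketched in Python as follows. This is an illustration only, not the actual implementation: `ProcessTable`, its methods, and the table size of 65536 are assumptions based on the "65K" limit discussed in this thread.

```python
MAX_PID = 65536  # assumed table size, based on the "65K" limit in this thread

class ProcessTable:
    """Toy registry that stores one entry per pid in a fixed-size list."""

    def __init__(self, size=MAX_PID):
        self._slots = [None] * size

    def register(self, pid, name):
        # The explicit range check is the point of the sketch: a pid past
        # the end of the table has no valid slot.
        if not 0 <= pid < len(self._slots):
            raise ValueError(
                f"pid {pid} does not fit in a table of {len(self._slots)} slots"
            )
        self._slots[pid] = name

    def unregister(self, pid):
        if 0 <= pid < len(self._slots):
            self._slots[pid] = None

    def lookup(self, pid):
        return self._slots[pid] if 0 <= pid < len(self._slots) else None

table = ProcessTable()
table.register(8610, "small-pid process")  # fits: constant-time index
try:
    table.register(141469, "mxssmp")       # pid from the gdb session
except ValueError as err:
    print(err)
```

Indexing by pid is a single array access, which is the performance argument. The cost is that any pid at or above the table size needs separate handling.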

-----Original Message-----
From: Qifan Chen [mailto:qifan.chen@esgyn.com]
Sent: Tuesday, September 8, 2015 10:03 AM
To: dev <de...@trafodion.incubator.apache.org>
Cc: Lijian (Q) <ji...@huawei.com>
Subject: Re: [Urgent Help] Trafodion Build Environment Problem

For pids larger than 65K, we probably can use a hash table.  Thanks --Qifan

On Tue, Sep 8, 2015 at 11:27 AM, Hans Zeller <ha...@esgyn.com> wrote:

> Hi Nieyuanyuan,
>
> Some of us are also working on running Trafodion in a sandbox or on
> Apache objects. We hope to have documented steps on how to do that
> eventually. You mention you had to fix several things. If you have
> notes on what those are, would you share them?
>
> Thank you,
>
> Hans
>
> On Tue, Sep 8, 2015 at 9:19 AM, Amanda Moran <am...@esgyn.com>
> wrote:
>
> > Hi there-
> >
> > This is fixed in latest version of installer.
> >
> > Thanks.
> >
> > Sent from my iPhone
> >
> > > On Sep 8, 2015, at 9:07 AM, Dave Birdsall
> > > <da...@esgyn.com>
> > wrote:
> > >
> > > Hi,
> > >
> > > I'm wondering if this should be reported as a problem? Perhaps
> > Nieyuanyuan
> > > would like to open a JIRA about supporting higher PID numbers in
> > Trafodion?
> > >
> > > Dave
> > >
> > > -----Original Message-----
> > > From: Narendra Goyal [mailto:narendra.goyal@esgyn.com]
> > > Sent: Monday, September 7, 2015 7:04 PM
> > > To: dev@trafodion.incubator.apache.org
> > > Cc: Lijian (Q) <ji...@huawei.com>
> > > Subject: RE: [Urgent Help] Trafodion Build Environment Problem
> > >
> > > Hi Nieyuanyuan,
> > >
> > > Could you please check the 'pid_max' settings:
> > > sysctl -q kernel.pid_max
> > > (or cat /proc/sys/kernel/pid_max)
> > >
> > > If the value is > 64K, I would recommend you set it to 64K, like so:
> > > sudo sysctl -w kernel.pid_max=65535
> > >
> > > You will have to restart Trafodion and other Hadoop/HBase processes:
> > > swstopall
> > > ckillall
> > > swstartall
> > > sqstart
> > >
> > > Just fyi, to check the list of Trafodion processes only, please
> > > run
> > 'cstat'
> > > on your bash.
> > >
> > > Thanks,
> > > -Narendra
> > >
> > >
> > > -----Original Message-----
> > > From: Nieyuanyuan [mailto:nieyuanyuan@huawei.com]
> > > Sent: Monday, September 7, 2015 6:40 PM
> > > To: dev@trafodion.incubator.apache.org
> > > Cc: Lijian (Q) <ji...@huawei.com>
> > > Subject: [Urgent Help] Trafodion Build Environment Problem
> > >
> > > Dear Guys,
> > >
> > > I recently downloaded trafodion 1.1 from
> > > https://github.com/apache/incubator-trafodion/tree/stable/1.1, and
> > followed
> > > the build guide from
> > > https://wiki.trafodion.org/wiki/index.php/Building_the_Software,
> > > and
> > solved
> > > a lot of problems (no need to list all details), I am able to run
> > trafodion
> > > over a hadoop sandbox environment.
> > >
> > > But I got a serious problem, that is, all Trafodion related
> > > process
> will
> > go
> > > down after several minutes (not sure how long), only few of them
> > > will
> > > left:
> > > [nieyy@redhat-72 ~]$ ps ux
> > > USER        PID %CPU %MEM    VSZ   RSS TTY      STAT START   TIME
> COMMAND
> > > nieyy     76554  0.1  0.1 590988 139768 pts/6   Sl   19:14   0:04
> > > /usr/lib/jvm/java-1.7.0-openjdk-1.7.0.45.x86_64/bin/java
> > > -XX:OnOutOfMemoryError=kill -9 %p -Xmx128m
> > > nieyy    118833  0.7  0.3 1535452 420996 ?      Sl   19:40   0:12
> > > /usr/lib/jvm/java-1.7.0-openjdk-1.7.0.45.x86_64/bin/java
> > > -Dproc_namenode -Xmx1000m
> > > -Djava.net.prefe
> > > nieyy    119085  0.6  0.2 1572688 367388 ?      Sl   19:40   0:10
> > > /usr/lib/jvm/java-1.7.0-openjdk-1.7.0.45.x86_64/bin/java
> > > -Dproc_datanode -Xmx1000m
> > > -Djava.net.prefe
> > > nieyy    119320  0.4  0.2 1512656 340636 ?      Sl   19:41   0:07
> > > /usr/lib/jvm/java-1.7.0-openjdk-1.7.0.45.x86_64/bin/java
> > > -Dproc_secondarynamenode -Xmx1000m -Djava.
> > > nieyy    119972  1.2  0.2 1708408 378536 pts/6  Sl   19:41   0:20
> > > /usr/lib/jvm/java-1.7.0-openjdk-1.7.0.45.x86_64/bin/java
> > > -Dproc_resourcemanager -Xmx1000m -Dhadoop.
> > > nieyy    120133  0.9  0.2 1616388 309976 ?      Sl   19:41   0:16
> > > /usr/lib/jvm/java-1.7.0-openjdk-1.7.0.45.x86_64/bin/java
> > > -Dproc_nodemanager -Xmx1000m -Dhadoop.log.
> > > nieyy    120371  0.0  0.0   9824  1772 pts/6    S    19:41   0:00
> /bin/sh
> > > ./bin/mysqld_safe
> > >
> --defaults-file=/home/nieyy/trafodion_build/incubator-trafodion-stable-1.
> > > nieyy    120594  0.0  0.0 452604 89908 pts/6    Sl   19:41   0:01
> > >
> >
> /home/nieyy/trafodion_build/incubator-trafodion-stable-1.1/core/sqf/sq
> l/lo
> > > cal_hadoop/mysql/bin/mysq
> > > nieyy    120789  0.0  0.0   9692  1736 pts/6    S    19:41   0:00 bash
> > >
> >
> /home/nieyy/trafodion_build/incubator-trafodion-stable-1.1/core/sqf/sq
> l/lo
> > > cal_hadoop/hbase/bin
> > > nieyy    120806  2.0  0.3 1809048 509164 pts/6  Sl   19:41   0:34
> > > /usr/lib/jvm/java-1.7.0-openjdk-1.7.0.45.x86_64/bin/java
> > > -Dproc_master -XX:OnOutOfMemoryError=kill
> > > nieyy    122554  0.0  0.0  13624  1304 pts/6    S    19:41   0:00
> mpirun
> > > -disable-auto-cleanup -demux select -env SQ_IC TCP -env
> > > MPI_ERROR_LEVEL
> > > 2 -env SQ_PIDMAP 1 -
> > > nieyy    122555  0.0  0.0      0     0 ?        Zs   19:41   0:00
> > > [hydra_pmi_proxy] <defunct>
> > > nieyy    122556  1.0  0.0 335212 36748 ?        Ssl  19:41   0:17
> > >
> >
> /home/nieyy/trafodion_build/incubator-trafodion-stable-1.1/core/sqf/ex
> port
> > > /bin64d/monitor COLD
> > > nieyy    122557  0.8  0.0 335212 36768 ?        Ssl  19:41   0:14
> > >
> >
> /home/nieyy/trafodion_build/incubator-trafodion-stable-1.1/core/sqf/ex
> port
> > > /bin64d/monitor COLD
> > > nieyy    123946  0.9  0.1 828072 223088 pts/6   Sl   19:42   0:14
> > > /usr/lib/jvm/java-1.7.0-openjdk-1.7.0.45.x86_64/bin/java
> > > -XX:OnOutOfMemoryError=kill -9 %p -Xmx128m
> > > nieyy    124044  1.0  0.1 629200 187180 pts/6   Sl   19:42   0:16
> > > /usr/lib/jvm/java-1.7.0-openjdk-1.7.0.45.x86_64/bin/java
> > > -XX:OnOutOfMemoryError=kill -9 %p -Xmx128m
> > >
> > > And then I need to kill all processes and use swstartall and
> > > sqstart to reset the environment, however, the environment will
> > > still go down
> after
> > a
> > > while, and I need to restart again.
> > >
> > > I found some cores under
> > > trafodion_build/incubator-trafodion-stable-1.1/core/sqf/sql/script
> > > s,
> all
> > > cored were generated by mxssmp:
> > > [nieyy@redhat-72 scripts]$ ll core* ...
> > > -rw------- 1 nieyy nieyy 156008448 Sep  7 17:56 core.mxssmp.173357
> > > -rw------- 1 nieyy nieyy 145518592 Sep  7 17:56 core.mxssmp.173372
> > > -rw------- 1 nieyy nieyy 156008448 Sep  7 19:24 core.mxssmp.74146
> > > -rw------- 1 nieyy nieyy 145518592 Sep  7 19:24 core.mxssmp.74197
> > >
> > > I used gdb to track the stack:
> > > [nieyy@redhat-72 scripts]$ gdb
> > >
> >
> /home/nieyy/trafodion_build/incubator-trafodion-stable-1.1/core/sql/li
> b/li
> > > nux/64bit/debug/mxssmp ./core.mxssmp.141469 ...
> > > (gdb) where
> > > #0  0x000000000044166c in ProcessStats::getHeap (this=0x2000) at
> > > ../runtimestats/SqlStats.h:271
> > > #1  0x000000000043990a in StatsGlobals::removeProcess
> > > (this=0x10000000, pid=65536, calledAtAdd=0) at
> > > ../runtimestats/SqlStats.cpp:276
> > > #2  0x0000000000439e05 in StatsGlobals::checkForDeadProcesses
> > > (this=0x10000000, myPid=141469) at
> > > ../runtimestats/SqlStats.cpp:382
> > > #3  0x00000000004440be in SsmpGlobals::work (this=0x7f062660c7e8)
> > > at
> > > ../runtimestats/ssmpipc.cpp:582
> > > #4  0x000000000042f06a in runServer (argc=1, argv=0x7fff5b0e5a48)
> > > at
> > > ../bin/ex_ssmp_main.cpp:259
> > > #5  0x000000000042eb12 in main (argc=1, argv=0x7fff5b0e5a48) at
> > > ../bin/ex_ssmp_main.cpp:127
> > >
> > > Then I searched via Google, and found a link
> > > https://bugs.launchpad.net/trafodion/+bug/1368891 which looks
> > > similar,
> > but
> > > it claimed the bug has been fixed at v0.9, but my version is 1.1.
> > >
> > > So, could you kindly help me to solve this problem cause I can't
> > > find
> > more
> > > useful information via Google.
> > >
> > > Thanks a lot.
> >
>



--
Regards, --Qifan

Re: [Urgent Help] Trafodion Build Environment Problem

Posted by Qifan Chen <qi...@esgyn.com>.
For pids larger than 65K, we probably can use a hash table.  Thanks --Qifan
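
One way to read this suggestion is a hybrid: keep the array for small pids and fall back to a hash table only for large ones. The Python sketch below shows that shape under assumptions. `HybridProcessTable` and its methods are hypothetical names, the 65536 threshold is taken from the "65K" figure above, and a Python `dict` stands in for the hash table.

```python
class HybridProcessTable:
    """Array for pids below the limit, dict (a hash table) for the rest."""

    def __init__(self, array_size=65536):
        self._array = [None] * array_size
        self._overflow = {}

    def _in_array(self, pid):
        return 0 <= pid < len(self._array)

    def register(self, pid, stats):
        if self._in_array(pid):
            self._array[pid] = stats
        else:
            self._overflow[pid] = stats

    def unregister(self, pid):
        if self._in_array(pid):
            self._array[pid] = None
        else:
            self._overflow.pop(pid, None)

    def lookup(self, pid):
        if self._in_array(pid):
            return self._array[pid]
        return self._overflow.get(pid)

table = HybridProcessTable()
table.register(8611, "stats for pid 8611")
table.register(74146, "stats for pid 74146")  # pid from a core file name
print(table.lookup(8611))
print(table.lookup(74146))
```

Small pids keep the fast array path, and large pids get their own entry instead of sharing a slot.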

On Tue, Sep 8, 2015 at 11:27 AM, Hans Zeller <ha...@esgyn.com> wrote:

> Hi Nieyuanyuan,
>
> Some of us are also working on running Trafodion in a sandbox or on Apache
> objects. We hope to have documented steps on how to do that eventually. You
> mention you had to fix several things. If you have notes on what those are,
> would you share them?
>
> Thank you,
>
> Hans
>
> On Tue, Sep 8, 2015 at 9:19 AM, Amanda Moran <am...@esgyn.com>
> wrote:
>
> > Hi there-
> >
> > This is fixed in latest version of installer.
> >
> > Thanks.
> >
> > Sent from my iPhone
> >
> > > On Sep 8, 2015, at 9:07 AM, Dave Birdsall <da...@esgyn.com>
> > wrote:
> > >
> > > Hi,
> > >
> > > I'm wondering if this should be reported as a problem? Perhaps
> > Nieyuanyuan
> > > would like to open a JIRA about supporting higher PID numbers in
> > Trafodion?
> > >
> > > Dave
> > >
> > > -----Original Message-----
> > > From: Narendra Goyal [mailto:narendra.goyal@esgyn.com]
> > > Sent: Monday, September 7, 2015 7:04 PM
> > > To: dev@trafodion.incubator.apache.org
> > > Cc: Lijian (Q) <ji...@huawei.com>
> > > Subject: RE: [Urgent Help] Trafodion Build Environment Problem
> > >
> > > Hi Nieyuanyuan,
> > >
> > > Could you please check the 'pid_max' settings:
> > > sysctl -q kernel.pid_max
> > > (or cat /proc/sys/kernel/pid_max)
> > >
> > > If the value is > 64K, I would recommend you set it to 64K, like so:
> > > sudo sysctl -w kernel.pid_max=65535
> > >
> > > You will have to restart Trafodion and other Hadoop/HBase processes:
> > > swstopall
> > > ckillall
> > > swstartall
> > > sqstart
> > >
> > > Just fyi, to check the list of Trafodion processes only, please run
> > 'cstat'
> > > on your bash.
> > >
> > > Thanks,
> > > -Narendra
> > >
> > >
> > > -----Original Message-----
> > > From: Nieyuanyuan [mailto:nieyuanyuan@huawei.com]
> > > Sent: Monday, September 7, 2015 6:40 PM
> > > To: dev@trafodion.incubator.apache.org
> > > Cc: Lijian (Q) <ji...@huawei.com>
> > > Subject: [Urgent Help] Trafodion Build Environment Problem
> > >
> > > Dear Guys,
> > >
> > > I recently downloaded trafodion 1.1 from
> > > https://github.com/apache/incubator-trafodion/tree/stable/1.1, and
> > followed
> > > the build guide from
> > > https://wiki.trafodion.org/wiki/index.php/Building_the_Software, and
> > solved
> > > a lot of problems (no need to list all details), I am able to run
> > trafodion
> > > over a hadoop sandbox environment.
> > >
> > > But I have a serious problem: all Trafodion-related processes go
> > > down after several minutes (not sure how long), and only a few of
> > > them are left:
> > > [nieyy@redhat-72 ~]$ ps ux
> > > USER        PID %CPU %MEM    VSZ   RSS TTY      STAT START   TIME COMMAND
> > > nieyy     76554  0.1  0.1 590988 139768 pts/6   Sl   19:14   0:04 /usr/lib/jvm/java-1.7.0-openjdk-1.7.0.45.x86_64/bin/java -XX:OnOutOfMemoryError=kill -9 %p -Xmx128m
> > > nieyy    118833  0.7  0.3 1535452 420996 ?      Sl   19:40   0:12 /usr/lib/jvm/java-1.7.0-openjdk-1.7.0.45.x86_64/bin/java -Dproc_namenode -Xmx1000m -Djava.net.prefe
> > > nieyy    119085  0.6  0.2 1572688 367388 ?      Sl   19:40   0:10 /usr/lib/jvm/java-1.7.0-openjdk-1.7.0.45.x86_64/bin/java -Dproc_datanode -Xmx1000m -Djava.net.prefe
> > > nieyy    119320  0.4  0.2 1512656 340636 ?      Sl   19:41   0:07 /usr/lib/jvm/java-1.7.0-openjdk-1.7.0.45.x86_64/bin/java -Dproc_secondarynamenode -Xmx1000m -Djava.
> > > nieyy    119972  1.2  0.2 1708408 378536 pts/6  Sl   19:41   0:20 /usr/lib/jvm/java-1.7.0-openjdk-1.7.0.45.x86_64/bin/java -Dproc_resourcemanager -Xmx1000m -Dhadoop.
> > > nieyy    120133  0.9  0.2 1616388 309976 ?      Sl   19:41   0:16 /usr/lib/jvm/java-1.7.0-openjdk-1.7.0.45.x86_64/bin/java -Dproc_nodemanager -Xmx1000m -Dhadoop.log.
> > > nieyy    120371  0.0  0.0   9824  1772 pts/6    S    19:41   0:00 /bin/sh ./bin/mysqld_safe --defaults-file=/home/nieyy/trafodion_build/incubator-trafodion-stable-1.
> > > nieyy    120594  0.0  0.0 452604 89908 pts/6    Sl   19:41   0:01 /home/nieyy/trafodion_build/incubator-trafodion-stable-1.1/core/sqf/sql/local_hadoop/mysql/bin/mysq
> > > nieyy    120789  0.0  0.0   9692  1736 pts/6    S    19:41   0:00 bash /home/nieyy/trafodion_build/incubator-trafodion-stable-1.1/core/sqf/sql/local_hadoop/hbase/bin
> > > nieyy    120806  2.0  0.3 1809048 509164 pts/6  Sl   19:41   0:34 /usr/lib/jvm/java-1.7.0-openjdk-1.7.0.45.x86_64/bin/java -Dproc_master -XX:OnOutOfMemoryError=kill
> > > nieyy    122554  0.0  0.0  13624  1304 pts/6    S    19:41   0:00 mpirun -disable-auto-cleanup -demux select -env SQ_IC TCP -env MPI_ERROR_LEVEL 2 -env SQ_PIDMAP 1 -
> > > nieyy    122555  0.0  0.0      0     0 ?        Zs   19:41   0:00 [hydra_pmi_proxy] <defunct>
> > > nieyy    122556  1.0  0.0 335212 36748 ?        Ssl  19:41   0:17 /home/nieyy/trafodion_build/incubator-trafodion-stable-1.1/core/sqf/export/bin64d/monitor COLD
> > > nieyy    122557  0.8  0.0 335212 36768 ?        Ssl  19:41   0:14 /home/nieyy/trafodion_build/incubator-trafodion-stable-1.1/core/sqf/export/bin64d/monitor COLD
> > > nieyy    123946  0.9  0.1 828072 223088 pts/6   Sl   19:42   0:14 /usr/lib/jvm/java-1.7.0-openjdk-1.7.0.45.x86_64/bin/java -XX:OnOutOfMemoryError=kill -9 %p -Xmx128m
> > > nieyy    124044  1.0  0.1 629200 187180 pts/6   Sl   19:42   0:16 /usr/lib/jvm/java-1.7.0-openjdk-1.7.0.45.x86_64/bin/java -XX:OnOutOfMemoryError=kill -9 %p -Xmx128m
> > >
> > > And then I need to kill all processes and use swstartall and sqstart
> > > to reset the environment, however, the environment will still go down
> > > after a while, and I need to restart again.
> > >
> > > I found some cores under
> > > trafodion_build/incubator-trafodion-stable-1.1/core/sqf/sql/scripts,
> > > all cores were generated by mxssmp:
> > > [nieyy@redhat-72 scripts]$ ll core*
> > > ...
> > > -rw------- 1 nieyy nieyy 156008448 Sep  7 17:56 core.mxssmp.173357
> > > -rw------- 1 nieyy nieyy 145518592 Sep  7 17:56 core.mxssmp.173372
> > > -rw------- 1 nieyy nieyy 156008448 Sep  7 19:24 core.mxssmp.74146
> > > -rw------- 1 nieyy nieyy 145518592 Sep  7 19:24 core.mxssmp.74197
> > >
> > > I used gdb to track the stack:
> > > [nieyy@redhat-72 scripts]$ gdb /home/nieyy/trafodion_build/incubator-trafodion-stable-1.1/core/sql/lib/linux/64bit/debug/mxssmp ./core.mxssmp.141469
> > > ...
> > > (gdb) where
> > > #0  0x000000000044166c in ProcessStats::getHeap (this=0x2000) at ../runtimestats/SqlStats.h:271
> > > #1  0x000000000043990a in StatsGlobals::removeProcess (this=0x10000000, pid=65536, calledAtAdd=0) at ../runtimestats/SqlStats.cpp:276
> > > #2  0x0000000000439e05 in StatsGlobals::checkForDeadProcesses (this=0x10000000, myPid=141469) at ../runtimestats/SqlStats.cpp:382
> > > #3  0x00000000004440be in SsmpGlobals::work (this=0x7f062660c7e8) at ../runtimestats/ssmpipc.cpp:582
> > > #4  0x000000000042f06a in runServer (argc=1, argv=0x7fff5b0e5a48) at ../bin/ex_ssmp_main.cpp:259
> > > #5  0x000000000042eb12 in main (argc=1, argv=0x7fff5b0e5a48) at ../bin/ex_ssmp_main.cpp:127
> > >
> > > Then I searched via Google and found
> > > https://bugs.launchpad.net/trafodion/+bug/1368891, which looks similar.
> > > It claims the bug was fixed in v0.9, but my version is 1.1.
> > >
> > > So, could you kindly help me solve this problem? I can't find more
> > > useful information via Google.
> > >
> > > Thanks a lot.
> >
>



-- 
Regards, --Qifan

Re: [Urgent Help] Trafodion Build Environment Problem

Posted by Hans Zeller <ha...@esgyn.com>.
Hi Nieyuanyuan,

Some of us are also working on running Trafodion in a sandbox or on Apache
objects. We hope to have documented steps on how to do that eventually. You
mention you had to fix several things. If you have notes on what those are,
would you share them?

Thank you,

Hans

On Tue, Sep 8, 2015 at 9:19 AM, Amanda Moran <am...@esgyn.com> wrote:

> Hi there-
>
> This is fixed in latest version of installer.
>
> Thanks.
>
> Sent from my iPhone

RE: [Urgent Help] Trafodion Build Environment Problem

Posted by Dave Birdsall <da...@esgyn.com>.
Yes.

-----Original Message-----
From: Atanu Mishra [mailto:atanu.mishra@esgyn.com]
Sent: Tuesday, September 8, 2015 12:01 PM
To: dev@trafodion.incubator.apache.org
Subject: RE: [Urgent Help] Trafodion Build Environment Problem

This was the Exceed project, if I recall right?

Regards,
Atanu

-----Original Message-----
From: Dave Birdsall [mailto:dave.birdsall@esgyn.com]
Sent: Tuesday, September 8, 2015 11:06 AM
To: dev@trafodion.incubator.apache.org
Cc: Lijian (Q) <ji...@huawei.com>
Subject: RE: [Urgent Help] Trafodion Build Environment Problem

Hi,

Going forward, though, we are likely to see requests to support higher Pid
#s. As the number of cores per node ramps up so will the number of Pids.
This is not unlike the problem many of us dealt with 20 years ago at Tandem
when the number of Pids exceeded 255. So this is something we should
consider addressing at some point.

Dave

-----Original Message-----
From: Selva Govindarajan [mailto:selva.govindarajan@esgyn.com]
Sent: Tuesday, September 8, 2015 11:02 AM
To: dev@trafodion.incubator.apache.org
Cc: Lijian (Q) <ji...@huawei.com>
Subject: RE: [Urgent Help] Trafodion Build Environment Problem

Hi Amanda,

I presume that the installer will flag this as a requirement for Trafodion
to be installed. Will it abort the installation or will the installer fix
the pid_max settings automatically?

Selva
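
A pre-flight check of the kind asked about here could look like the sketch
below. It is not taken from the Trafodion installer: the function names
pid_max_ok and check_pid_max are hypothetical, and 65535 is simply the value
recommended earlier in this thread. The sketch only reports the problem and
suggests the fix; whether an installer should abort or change the setting
itself is the open question above.

```shell
#!/bin/sh
# Hypothetical pre-flight check; NOT the Trafodion installer's code.
# 65535 is the pid_max value recommended earlier in this thread.
PID_MAX_LIMIT=65535

# Succeeds (exit status 0) when the given pid_max value is within the limit.
pid_max_ok() {
    [ "$1" -le "$PID_MAX_LIMIT" ]
}

# Prints a verdict for the given value; returns non-zero when it is too high.
check_pid_max() {
    if pid_max_ok "$1"; then
        echo "OK: kernel.pid_max=$1"
    else
        echo "ERROR: kernel.pid_max=$1 exceeds $PID_MAX_LIMIT"
        echo "Fix with: sudo sysctl -w kernel.pid_max=$PID_MAX_LIMIT"
        return 1
    fi
}
```

On a Linux host the live setting can be checked with
check_pid_max "$(cat /proc/sys/kernel/pid_max)".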

-----Original Message-----
From: Amanda Moran [mailto:amanda.moran@esgyn.com]
Sent: Tuesday, September 8, 2015 9:20 AM
To: dev@trafodion.incubator.apache.org
Cc: Lijian (Q) <ji...@huawei.com>
Subject: Re: [Urgent Help] Trafodion Build Environment Problem

Hi there-

This is fixed in the latest version of the installer.

Thanks.

Sent from my iPhone

RE: [Urgent Help] Trafodion Build Environment Problem

Posted by Atanu Mishra <at...@esgyn.com>.
This was the Exceed project, if I recall right?

Regards,
Atanu

-----Original Message-----
From: Dave Birdsall [mailto:dave.birdsall@esgyn.com]
Sent: Tuesday, September 8, 2015 11:06 AM
To: dev@trafodion.incubator.apache.org
Cc: Lijian (Q) <ji...@huawei.com>
Subject: RE: [Urgent Help] Trafodion Build Environment Problem

Hi,

Going forward, though, we are likely to see requests to support higher Pid
#s. As the number of cores per node ramps up so will the number of Pids.
This is not unlike the problem many of us dealt with 20 years ago at Tandem
when the number of Pids exceeded 255. So this is something we should
consider addressing at some point.

Dave

-----Original Message-----
From: Selva Govindarajan [mailto:selva.govindarajan@esgyn.com]
Sent: Tuesday, September 8, 2015 11:02 AM
To: dev@trafodion.incubator.apache.org
Cc: Lijian (Q) <ji...@huawei.com>
Subject: RE: [Urgent Help] Trafodion Build Environment Problem

Hi Amanda,

I presume that the installer will flag this as a requirement for Trafodion
to be installed. Will it abort the installation or will the installer fix
the pid_max settings automatically?

Selva

-----Original Message-----
From: Amanda Moran [mailto:amanda.moran@esgyn.com]
Sent: Tuesday, September 8, 2015 9:20 AM
To: dev@trafodion.incubator.apache.org
Cc: Lijian (Q) <ji...@huawei.com>
Subject: Re: [Urgent Help] Trafodion Build Environment Problem

Hi there-

This is fixed in the latest version of the installer.

Thanks.

Sent from my iPhone

RE: [Urgent Help] Trafodion Build Environment Problem

Posted by Dave Birdsall <da...@esgyn.com>.
Hi,

Going forward, though, we are likely to see requests to support higher Pid
#s. As the number of cores per node ramps up so will the number of Pids.
This is not unlike the problem many of us dealt with 20 years ago at Tandem
when the number of Pids exceeded 255. So this is something we should
consider addressing at some point.

Dave
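
Until higher PIDs are supported, it can help to see whether any running
process already has a PID above the value recommended earlier in this thread
(65535). Processes keep the PID they were started with, so the sketch below
filters a list of PIDs against that limit. The function name pids_above_limit
is hypothetical and not part of Trafodion.

```shell
#!/bin/sh
# Hypothetical helper; not part of Trafodion.
# Reads PIDs (one per line) on stdin and prints those above the limit.
# 65535 is the pid_max value recommended earlier in this thread.
pids_above_limit() {
    awk -v limit=65535 '$1 > limit { print $1 }'
}
```

For example, ps -eo pid= | pids_above_limit lists the offending PIDs on the
current host; empty output means none are above the limit.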

-----Original Message-----
From: Selva Govindarajan [mailto:selva.govindarajan@esgyn.com]
Sent: Tuesday, September 8, 2015 11:02 AM
To: dev@trafodion.incubator.apache.org
Cc: Lijian (Q) <ji...@huawei.com>
Subject: RE: [Urgent Help] Trafodion Build Environment Problem

Hi Amanda,

I presume that the installer will flag this as a requirement for Trafodion
to be installed. Will it abort the installation or will the installer fix
the pid_max settings automatically?

Selva

-----Original Message-----
From: Amanda Moran [mailto:amanda.moran@esgyn.com]
Sent: Tuesday, September 8, 2015 9:20 AM
To: dev@trafodion.incubator.apache.org
Cc: Lijian (Q) <ji...@huawei.com>
Subject: Re: [Urgent Help] Trafodion Build Environment Problem

Hi there-

This is fixed in the latest version of the installer.

Thanks.

Sent from my iPhone

> On Sep 8, 2015, at 9:07 AM, Dave Birdsall <da...@esgyn.com> wrote:
>
> Hi,
>
> I'm wondering if this should be reported as a problem? Perhaps
> Nieyuanyuan would like to open a JIRA about supporting higher PID
> numbers in Trafodion?
>
> Dave
>
> -----Original Message-----
> From: Narendra Goyal [mailto:narendra.goyal@esgyn.com]
> Sent: Monday, September 7, 2015 7:04 PM
> To: dev@trafodion.incubator.apache.org
> Cc: Lijian (Q) <ji...@huawei.com>
> Subject: RE: [Urgent Help] Trafodion Build Environment Problem
>
> Hi Nieyuanyuan,
>
> Could you please check the 'pid_max' settings:
> sysctl -q kernel.pid_max
> (or cat /proc/sys/kernel/pid_max)
>
> If the value is > 64K, I would recommend you set it to 64K, like so:
> sudo sysctl -w kernel.pid_max=65535
>
> You will have to restart Trafodion and other Hadoop/HBase processes:
> swstopall
> ckillall
> swstartall
> sqstart
>
> Just fyi, to check the list of Trafodion processes only, please run
> 'cstat' on your bash.
>
> Thanks,
> -Narendra
>
>
> -----Original Message-----
> From: Nieyuanyuan [mailto:nieyuanyuan@huawei.com]
> Sent: Monday, September 7, 2015 6:40 PM
> To: dev@trafodion.incubator.apache.org
> Cc: Lijian (Q) <ji...@huawei.com>
> Subject: [Urgent Help] Trafodion Build Environment Problem
>
> Dear Guys,
>
> I recently downloaded trafodion 1.1 from
> https://github.com/apache/incubator-trafodion/tree/stable/1.1, and
> followed the build guide from
> https://wiki.trafodion.org/wiki/index.php/Building_the_Software, and
> solved a lot of problems (no need to list all details), I am able to
> run trafodion over a hadoop sandbox environment.
>
> But I have a serious problem: all Trafodion-related processes go
> down after several minutes (not sure how long), and only a few of
> them are left:
> [nieyy@redhat-72 ~]$ ps ux
> USER        PID %CPU %MEM    VSZ   RSS TTY      STAT START   TIME COMMAND
> nieyy     76554  0.1  0.1 590988 139768 pts/6   Sl   19:14   0:04 /usr/lib/jvm/java-1.7.0-openjdk-1.7.0.45.x86_64/bin/java -XX:OnOutOfMemoryError=kill -9 %p -Xmx128m
> nieyy    118833  0.7  0.3 1535452 420996 ?      Sl   19:40   0:12 /usr/lib/jvm/java-1.7.0-openjdk-1.7.0.45.x86_64/bin/java -Dproc_namenode -Xmx1000m -Djava.net.prefe
> nieyy    119085  0.6  0.2 1572688 367388 ?      Sl   19:40   0:10 /usr/lib/jvm/java-1.7.0-openjdk-1.7.0.45.x86_64/bin/java -Dproc_datanode -Xmx1000m -Djava.net.prefe
> nieyy    119320  0.4  0.2 1512656 340636 ?      Sl   19:41   0:07 /usr/lib/jvm/java-1.7.0-openjdk-1.7.0.45.x86_64/bin/java -Dproc_secondarynamenode -Xmx1000m -Djava.
> nieyy    119972  1.2  0.2 1708408 378536 pts/6  Sl   19:41   0:20 /usr/lib/jvm/java-1.7.0-openjdk-1.7.0.45.x86_64/bin/java -Dproc_resourcemanager -Xmx1000m -Dhadoop.
> nieyy    120133  0.9  0.2 1616388 309976 ?      Sl   19:41   0:16 /usr/lib/jvm/java-1.7.0-openjdk-1.7.0.45.x86_64/bin/java -Dproc_nodemanager -Xmx1000m -Dhadoop.log.
> nieyy    120371  0.0  0.0   9824  1772 pts/6    S    19:41   0:00 /bin/sh ./bin/mysqld_safe --defaults-file=/home/nieyy/trafodion_build/incubator-trafodion-stable-1.
> nieyy    120594  0.0  0.0 452604 89908 pts/6    Sl   19:41   0:01 /home/nieyy/trafodion_build/incubator-trafodion-stable-1.1/core/sqf/sql/local_hadoop/mysql/bin/mysq
> nieyy    120789  0.0  0.0   9692  1736 pts/6    S    19:41   0:00 bash /home/nieyy/trafodion_build/incubator-trafodion-stable-1.1/core/sqf/sql/local_hadoop/hbase/bin
> nieyy    120806  2.0  0.3 1809048 509164 pts/6  Sl   19:41   0:34 /usr/lib/jvm/java-1.7.0-openjdk-1.7.0.45.x86_64/bin/java -Dproc_master -XX:OnOutOfMemoryError=kill
> nieyy    122554  0.0  0.0  13624  1304 pts/6    S    19:41   0:00 mpirun -disable-auto-cleanup -demux select -env SQ_IC TCP -env MPI_ERROR_LEVEL 2 -env SQ_PIDMAP 1 -
> nieyy    122555  0.0  0.0      0     0 ?        Zs   19:41   0:00 [hydra_pmi_proxy] <defunct>
> nieyy    122556  1.0  0.0 335212 36748 ?        Ssl  19:41   0:17 /home/nieyy/trafodion_build/incubator-trafodion-stable-1.1/core/sqf/export/bin64d/monitor COLD
> nieyy    122557  0.8  0.0 335212 36768 ?        Ssl  19:41   0:14
> /home/nieyy/trafodion_build/incubator-trafodion-stable-1.1/core/sqf/ex
> port
> /bin64d/monitor COLD
> nieyy    123946  0.9  0.1 828072 223088 pts/6   Sl   19:42   0:14
> /usr/lib/jvm/java-1.7.0-openjdk-1.7.0.45.x86_64/bin/java
> -XX:OnOutOfMemoryError=kill -9 %p -Xmx128m
> nieyy    124044  1.0  0.1 629200 187180 pts/6   Sl   19:42   0:16
> /usr/lib/jvm/java-1.7.0-openjdk-1.7.0.45.x86_64/bin/java
> -XX:OnOutOfMemoryError=kill -9 %p -Xmx128m
>
> And then I need to kill all processes and use swstartall and sqstart
> to reset the environment, however, the environment will still go down
> after a while, and I need to restart again.
>
> I found some cores under
> trafodion_build/incubator-trafodion-stable-1.1/core/sqf/sql/scripts,
> all cored were generated by mxssmp:
> [nieyy@redhat-72 scripts]$ ll core*
> ...
> -rw------- 1 nieyy nieyy 156008448 Sep  7 17:56 core.mxssmp.173357
> -rw------- 1 nieyy nieyy 145518592 Sep  7 17:56 core.mxssmp.173372
> -rw------- 1 nieyy nieyy 156008448 Sep  7 19:24 core.mxssmp.74146
> -rw------- 1 nieyy nieyy 145518592 Sep  7 19:24 core.mxssmp.74197
>
> I used gdb to track the stack:
> [nieyy@redhat-72 scripts]$ gdb
> /home/nieyy/trafodion_build/incubator-trafodion-stable-1.1/core/sql/li
> b/li nux/64bit/debug/mxssmp ./core.mxssmp.141469 ...
> (gdb) where
> #0  0x000000000044166c in ProcessStats::getHeap (this=0x2000) at
> ../runtimestats/SqlStats.h:271
> #1  0x000000000043990a in StatsGlobals::removeProcess
> (this=0x10000000, pid=65536, calledAtAdd=0) at
> ../runtimestats/SqlStats.cpp:276
> #2  0x0000000000439e05 in StatsGlobals::checkForDeadProcesses
> (this=0x10000000, myPid=141469) at ../runtimestats/SqlStats.cpp:382
> #3  0x00000000004440be in SsmpGlobals::work (this=0x7f062660c7e8) at
> ../runtimestats/ssmpipc.cpp:582
> #4  0x000000000042f06a in runServer (argc=1, argv=0x7fff5b0e5a48) at
> ../bin/ex_ssmp_main.cpp:259
> #5  0x000000000042eb12 in main (argc=1, argv=0x7fff5b0e5a48) at
> ../bin/ex_ssmp_main.cpp:127
>
> Then I searched via Google, and found a link
> https://bugs.launchpad.net/trafodion/+bug/1368891 which looks similar,
> but it claimed the bug has been fixed at v0.9, but my version is 1.1.
>
> So, could you kindly help me to solve this problem cause I can't find
> more useful information via Google.
>
> Thanks a lot.

Re: [Urgent Help] Trafodion Build Environment Problem

Posted by Qifan Chen <qi...@esgyn.com>.
The best approach would be to stay neutral and not restrict PIDs to a
particular range.

The argument is similar to the one about whether Trafodion should increase
the time-out value for HBase scans. Our conclusion there was that, to be a
good citizen in the Hadoop ecosystem and to handle mixed workloads, it is
better not to touch that value.

Thanks --Qifan

On Tue, Sep 8, 2015 at 1:44 PM, Amanda Moran <am...@esgyn.com> wrote:

> What is the "too small" number? Also what is the guidance if the number is
> too large (other than setting the kernel.pid_max=65535 on all nodes).
>
> Looks like we need two jira's created one for the installer and one for
> Trafodion core.
>
> Thanks.
>
> On Tue, Sep 8, 2015 at 11:26 AM, Gunnar Tapper <gu...@esgyn.com>
> wrote:
>
> > Hi,
> >
> > I am not sure this is a good idea since this might cause issues for the
> > overall configuration. For example, Cassandra recommends 999999 for
> > kernel.pid_max while Hawq wants at least 798720. IBM Big Insight wants
> > another number. Overriding their settings would make Trafodion a bad
> > citizen in a Hadoop stack.
> >
> > A better approach might be to check the current value recommending an
> > increase if too small and provide guidance if it's too large.
> >
> > Thanks,
> >
> > Gunnar
> >
> > -----Original Message-----
> > From: Amanda Moran [mailto:amanda.moran@esgyn.com]
> > Sent: Tuesday, September 8, 2015 12:08 PM
> > To: dev <de...@trafodion.incubator.apache.org>
> > Subject: Re: [Urgent Help] Trafodion Build Environment Problem
> >
> > Hi there All-
> >
> > Sorry if my first email was confusing. The "problem" itself is not fixed
> > by the installer, the installer just sets sudo sysctl -w
> > kernel.pid_max=65535 on all nodes.
> >
> > Thanks.
> >
> > On Tue, Sep 8, 2015 at 11:02 AM, Selva Govindarajan <
> > selva.govindarajan@esgyn.com> wrote:
> >
> > > Hi Amanda,
> > >
> > > I presume that the installer will flag this as a requirement for
> > > Trafodion to be installed. Will it abort the installation or will the
> > > installer fix the pid_max settings automatically?
> > >
> > > Selva
> > >
> > > -----Original Message-----
> > > From: Amanda Moran [mailto:amanda.moran@esgyn.com]
> > > Sent: Tuesday, September 8, 2015 9:20 AM
> > > To: dev@trafodion.incubator.apache.org
> > > Cc: Lijian (Q) <ji...@huawei.com>
> > > Subject: Re: [Urgent Help] Trafodion Build Environment Problem
> > >
> > > Hi there-
> > >
> > > This is fixed in latest version of installer.
> > >
> > > Thanks.
> > >
> > > Sent from my iPhone
> > >
> > > > On Sep 8, 2015, at 9:07 AM, Dave Birdsall <da...@esgyn.com> wrote:
> > > >
> > > > Hi,
> > > >
> > > > > I'm wondering if this should be reported as a problem? Perhaps
> > > > > Nieyuanyuan would like to open a JIRA about supporting higher PID
> > > > > numbers in Trafodion?
> > > >
> > > > Dave
> > > >
> > > > -----Original Message-----
> > > > From: Narendra Goyal [mailto:narendra.goyal@esgyn.com]
> > > > Sent: Monday, September 7, 2015 7:04 PM
> > > > To: dev@trafodion.incubator.apache.org
> > > > Cc: Lijian (Q) <ji...@huawei.com>
> > > > Subject: RE: [Urgent Help] Trafodion Build Environment Problem
> > > >
> > > > Hi Nieyuanyuan,
> > > >
> > > > Could you please check the 'pid_max' settings:
> > > > sysctl -q kernel.pid_max
> > > > (or cat /proc/sys/kernel/pid_max)
> > > >
> > > > If the value is > 64K, I would recommend you set it to 64K, like so:
> > > > sudo sysctl -w kernel.pid_max=65535
> > > >
> > > > > You will have to restart Trafodion and other Hadoop/HBase
> > > > > processes:
> > > > swstopall
> > > > ckillall
> > > > swstartall
> > > > sqstart
> > > >
> > > > > Just FYI, to check the list of Trafodion processes only, please run
> > > > > 'cstat' in your bash shell.
> > > >
> > > > Thanks,
> > > > -Narendra
> > > >
> > > >
> > > > -----Original Message-----
> > > > From: Nieyuanyuan [mailto:nieyuanyuan@huawei.com]
> > > > Sent: Monday, September 7, 2015 6:40 PM
> > > > To: dev@trafodion.incubator.apache.org
> > > > Cc: Lijian (Q) <ji...@huawei.com>
> > > > Subject: [Urgent Help] Trafodion Build Environment Problem
> > > >
> > > > Dear Guys,
> > > >
> > > > I recently downloaded trafodion 1.1 from
> > > > https://github.com/apache/incubator-trafodion/tree/stable/1.1, and
> > > > followed the build guide from
> > > > https://wiki.trafodion.org/wiki/index.php/Building_the_Software, and
> > > > solved a lot of problems (no need to list all details), I am able to
> > > > run trafodion over a hadoop sandbox environment.
> > > >
> > > > But I got a serious problem, that is, all Trafodion related process
> > > > will go down after several minutes (not sure how long), only few of
> > > > them will
> > > > left:
> > > > [nieyy@redhat-72 ~]$ ps ux
> > > > USER        PID %CPU %MEM    VSZ   RSS TTY      STAT START   TIME
> > > COMMAND
> > > > nieyy     76554  0.1  0.1 590988 139768 pts/6   Sl   19:14   0:04
> > > > /usr/lib/jvm/java-1.7.0-openjdk-1.7.0.45.x86_64/bin/java
> > > > -XX:OnOutOfMemoryError=kill -9 %p -Xmx128m
> > > > nieyy    118833  0.7  0.3 1535452 420996 ?      Sl   19:40   0:12
> > > > /usr/lib/jvm/java-1.7.0-openjdk-1.7.0.45.x86_64/bin/java
> > > > -Dproc_namenode -Xmx1000m
> > > > -Djava.net.prefe
> > > > nieyy    119085  0.6  0.2 1572688 367388 ?      Sl   19:40   0:10
> > > > /usr/lib/jvm/java-1.7.0-openjdk-1.7.0.45.x86_64/bin/java
> > > > -Dproc_datanode -Xmx1000m
> > > > -Djava.net.prefe
> > > > nieyy    119320  0.4  0.2 1512656 340636 ?      Sl   19:41   0:07
> > > > /usr/lib/jvm/java-1.7.0-openjdk-1.7.0.45.x86_64/bin/java
> > > > -Dproc_secondarynamenode -Xmx1000m -Djava.
> > > > nieyy    119972  1.2  0.2 1708408 378536 pts/6  Sl   19:41   0:20
> > > > /usr/lib/jvm/java-1.7.0-openjdk-1.7.0.45.x86_64/bin/java
> > > > -Dproc_resourcemanager -Xmx1000m -Dhadoop.
> > > > nieyy    120133  0.9  0.2 1616388 309976 ?      Sl   19:41   0:16
> > > > /usr/lib/jvm/java-1.7.0-openjdk-1.7.0.45.x86_64/bin/java
> > > > -Dproc_nodemanager -Xmx1000m -Dhadoop.log.
> > > > nieyy    120371  0.0  0.0   9824  1772 pts/6    S    19:41   0:00
> > > /bin/sh
> > > > ./bin/mysqld_safe
> > > >
> > >
> --defaults-file=/home/nieyy/trafodion_build/incubator-trafodion-stable-1.
> > > > nieyy    120594  0.0  0.0 452604 89908 pts/6    Sl   19:41   0:01
> > > > /home/nieyy/trafodion_build/incubator-trafodion-stable-1.1/core/sqf/
> > > > sq
> > > > l/lo
> > > > cal_hadoop/mysql/bin/mysq
> > > > nieyy    120789  0.0  0.0   9692  1736 pts/6    S    19:41   0:00
> bash
> > > > /home/nieyy/trafodion_build/incubator-trafodion-stable-1.1/core/sqf/
> > > > sq
> > > > l/lo
> > > > cal_hadoop/hbase/bin
> > > > nieyy    120806  2.0  0.3 1809048 509164 pts/6  Sl   19:41   0:34
> > > > /usr/lib/jvm/java-1.7.0-openjdk-1.7.0.45.x86_64/bin/java
> > > > -Dproc_master -XX:OnOutOfMemoryError=kill
> > > > nieyy    122554  0.0  0.0  13624  1304 pts/6    S    19:41   0:00
> > mpirun
> > > > -disable-auto-cleanup -demux select -env SQ_IC TCP -env
> > > > MPI_ERROR_LEVEL
> > > > 2 -env SQ_PIDMAP 1 -
> > > > nieyy    122555  0.0  0.0      0     0 ?        Zs   19:41   0:00
> > > > [hydra_pmi_proxy] <defunct>
> > > > nieyy    122556  1.0  0.0 335212 36748 ?        Ssl  19:41   0:17
> > > > /home/nieyy/trafodion_build/incubator-trafodion-stable-1.1/core/sqf/
> > > > ex
> > > > port
> > > > /bin64d/monitor COLD
> > > > nieyy    122557  0.8  0.0 335212 36768 ?        Ssl  19:41   0:14
> > > > /home/nieyy/trafodion_build/incubator-trafodion-stable-1.1/core/sqf/
> > > > ex
> > > > port
> > > > /bin64d/monitor COLD
> > > > nieyy    123946  0.9  0.1 828072 223088 pts/6   Sl   19:42   0:14
> > > > /usr/lib/jvm/java-1.7.0-openjdk-1.7.0.45.x86_64/bin/java
> > > > -XX:OnOutOfMemoryError=kill -9 %p -Xmx128m
> > > > nieyy    124044  1.0  0.1 629200 187180 pts/6   Sl   19:42   0:16
> > > > /usr/lib/jvm/java-1.7.0-openjdk-1.7.0.45.x86_64/bin/java
> > > > -XX:OnOutOfMemoryError=kill -9 %p -Xmx128m
> > > >
> > > > And then I need to kill all processes and use swstartall and sqstart
> > > > to reset the environment, however, the environment will still go
> > > > down after a while, and I need to restart again.
> > > >
> > > > I found some cores under
> > > > trafodion_build/incubator-trafodion-stable-1.1/core/sqf/sql/scripts,
> > > > all cored were generated by mxssmp:
> > > > [nieyy@redhat-72 scripts]$ ll core*
> > > > ...
> > > > -rw------- 1 nieyy nieyy 156008448 Sep  7 17:56 core.mxssmp.173357
> > > > -rw------- 1 nieyy nieyy 145518592 Sep  7 17:56 core.mxssmp.173372
> > > > -rw------- 1 nieyy nieyy 156008448 Sep  7 19:24 core.mxssmp.74146
> > > > -rw------- 1 nieyy nieyy 145518592 Sep  7 19:24 core.mxssmp.74197
> > > >
> > > > I used gdb to track the stack:
> > > > [nieyy@redhat-72 scripts]$ gdb
> > > > /home/nieyy/trafodion_build/incubator-trafodion-stable-1.1/core/sql/
> > > > li b/li nux/64bit/debug/mxssmp ./core.mxssmp.141469 ...
> > > > (gdb) where
> > > > #0  0x000000000044166c in ProcessStats::getHeap (this=0x2000) at
> > > > ../runtimestats/SqlStats.h:271
> > > > #1  0x000000000043990a in StatsGlobals::removeProcess
> > > > (this=0x10000000, pid=65536, calledAtAdd=0) at
> > > > ../runtimestats/SqlStats.cpp:276
> > > > #2  0x0000000000439e05 in StatsGlobals::checkForDeadProcesses
> > > > (this=0x10000000, myPid=141469) at ../runtimestats/SqlStats.cpp:382
> > > > #3  0x00000000004440be in SsmpGlobals::work (this=0x7f062660c7e8) at
> > > > ../runtimestats/ssmpipc.cpp:582
> > > > #4  0x000000000042f06a in runServer (argc=1, argv=0x7fff5b0e5a48) at
> > > > ../bin/ex_ssmp_main.cpp:259
> > > > #5  0x000000000042eb12 in main (argc=1, argv=0x7fff5b0e5a48) at
> > > > ../bin/ex_ssmp_main.cpp:127
> > > >
> > > > Then I searched via Google, and found a link
> > > > https://bugs.launchpad.net/trafodion/+bug/1368891 which looks
> > > > similar, but it claimed the bug has been fixed at v0.9, but my
> version
> > > > is 1.1.
> > > >
> > > > So, could you kindly help me to solve this problem cause I can't
> > > > find more useful information via Google.
> > > >
> > > > Thanks a lot.
> > >
> >
> >
> >
> > --
> > Thanks,
> >
> > Amanda Moran
> >
>
>
>
> --
> Thanks,
>
> Amanda Moran
>



-- 
Regards, --Qifan

Re: [Urgent Help] Trafodion Build Environment Problem

Posted by Amanda Moran <am...@esgyn.com>.
What is the "too small" number? Also, what is the guidance if the number is
too large (other than setting kernel.pid_max=65535 on all nodes)?

Looks like we need two JIRAs created: one for the installer and one for
Trafodion core.

Thanks.

On Tue, Sep 8, 2015 at 11:26 AM, Gunnar Tapper <gu...@esgyn.com>
wrote:

> Hi,
>
> I am not sure this is a good idea since this might cause issues for the
> overall configuration. For example, Cassandra recommends 999999 for
> kernel.pid_max while Hawq wants at least 798720. IBM Big Insight wants
> another number. Overriding their settings would make Trafodion a bad
> citizen in a Hadoop stack.
>
> A better approach might be to check the current value recommending an
> increase if too small and provide guidance if it's too large.
>
> Thanks,
>
> Gunnar
>
> -----Original Message-----
> From: Amanda Moran [mailto:amanda.moran@esgyn.com]
> Sent: Tuesday, September 8, 2015 12:08 PM
> To: dev <de...@trafodion.incubator.apache.org>
> Subject: Re: [Urgent Help] Trafodion Build Environment Problem
>
> Hi there All-
>
> Sorry if my first email was confusing. The "problem" itself is not fixed by
> the installer, the installer just sets sudo sysctl -w kernel.pid_max=65535
> on all nodes.
>
> Thanks.
>
> On Tue, Sep 8, 2015 at 11:02 AM, Selva Govindarajan <
> selva.govindarajan@esgyn.com> wrote:
>
> > Hi Amanda,
> >
> > I presume that the installer will flag this as a requirement for
> > Trafodion to be installed. Will it abort the installation or will the
> > installer fix the pid_max settings automatically?
> >
> > Selva
> >
> > -----Original Message-----
> > From: Amanda Moran [mailto:amanda.moran@esgyn.com]
> > Sent: Tuesday, September 8, 2015 9:20 AM
> > To: dev@trafodion.incubator.apache.org
> > Cc: Lijian (Q) <ji...@huawei.com>
> > Subject: Re: [Urgent Help] Trafodion Build Environment Problem
> >
> > Hi there-
> >
> > This is fixed in latest version of installer.
> >
> > Thanks.
> >
> > Sent from my iPhone
> >
> > > On Sep 8, 2015, at 9:07 AM, Dave Birdsall <da...@esgyn.com> wrote:
> > >
> > > Hi,
> > >
> > > I'm wondering if this should be reported as a problem? Perhaps
> > > Nieyuanyuan would like to open a JIRA about supporting higher PID
> > > numbers in Trafodion?
> > >
> > > Dave
> > >
> > > -----Original Message-----
> > > From: Narendra Goyal [mailto:narendra.goyal@esgyn.com]
> > > Sent: Monday, September 7, 2015 7:04 PM
> > > To: dev@trafodion.incubator.apache.org
> > > Cc: Lijian (Q) <ji...@huawei.com>
> > > Subject: RE: [Urgent Help] Trafodion Build Environment Problem
> > >
> > > Hi Nieyuanyuan,
> > >
> > > Could you please check the 'pid_max' settings:
> > > sysctl -q kernel.pid_max
> > > (or cat /proc/sys/kernel/pid_max)
> > >
> > > If the value is > 64K, I would recommend you set it to 64K, like so:
> > > sudo sysctl -w kernel.pid_max=65535
> > >
> > > You will have to restart Trafodion and other Hadoop/HBase processes:
> > > swstopall
> > > ckillall
> > > swstartall
> > > sqstart
> > >
> > > Just FYI, to check the list of Trafodion processes only, please run
> > > 'cstat' in your bash shell.
> > >
> > > Thanks,
> > > -Narendra
> > >
>
>
>
> --
> Thanks,
>
> Amanda Moran
>



-- 
Thanks,

Amanda Moran

RE: [Urgent Help] Trafodion Build Environment Problem

Posted by Gunnar Tapper <gu...@esgyn.com>.
Hi,

I am not sure this is a good idea, since it might cause issues for the
overall configuration. For example, Cassandra recommends 999999 for
kernel.pid_max while Hawq wants at least 798720. IBM BigInsights wants
yet another number. Overriding their settings would make Trafodion a bad
citizen in a Hadoop stack.

A better approach might be to check the current value, recommending an
increase if it is too small and providing guidance if it is too large.
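
A check-and-advise step of the kind described above could be sketched roughly
as follows. This is only an illustration under stated assumptions, not
installer code: the function names classify_pid_max and report_pid_max are
hypothetical, the 65535 ceiling is the value suggested earlier in this thread,
and the 32768 floor (a common Linux default) is a placeholder, because the
thread does not say what "too small" means.

```shell
#!/bin/sh
# Sketch of a report-only pid_max check: it reads the value and prints
# guidance, but never changes kernel settings.
# Assumption: 65535 is the ceiling suggested earlier in this thread.
# Assumption: 32768 (a common Linux default) stands in for "too small";
# the thread does not define that number.
PID_MAX_FLOOR=32768
PID_MAX_CEILING=65535

# classify_pid_max VALUE: prints too_small, ok, or too_large
classify_pid_max() {
    if [ "$1" -lt "$PID_MAX_FLOOR" ]; then
        echo "too_small"
    elif [ "$1" -gt "$PID_MAX_CEILING" ]; then
        echo "too_large"
    else
        echo "ok"
    fi
}

# report_pid_max [FILE]: reads pid_max (default: the /proc entry) and
# prints one line of guidance without modifying anything
report_pid_max() {
    pm_value=$(cat "${1:-/proc/sys/kernel/pid_max}") || return 1
    case $(classify_pid_max "$pm_value") in
        too_small) echo "kernel.pid_max=$pm_value: consider raising it to at least $PID_MAX_FLOOR" ;;
        too_large) echo "kernel.pid_max=$pm_value: above $PID_MAX_CEILING, see the guidance for large values" ;;
        ok)        echo "kernel.pid_max=$pm_value: no change suggested" ;;
    esac
}
```

Calling report_pid_max with no argument reads /proc/sys/kernel/pid_max on the
local node. Because the function only prints, it leaves the decision to the
administrator and does not override what other components in the stack expect.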

Thanks,

Gunnar

-----Original Message-----
From: Amanda Moran [mailto:amanda.moran@esgyn.com]
Sent: Tuesday, September 8, 2015 12:08 PM
To: dev <de...@trafodion.incubator.apache.org>
Subject: Re: [Urgent Help] Trafodion Build Environment Problem

Hi there All-

Sorry if my first email was confusing. The "problem" itself is not fixed by
the installer, the installer just sets sudo sysctl -w kernel.pid_max=65535
on all nodes.

Thanks.

On Tue, Sep 8, 2015 at 11:02 AM, Selva Govindarajan <
selva.govindarajan@esgyn.com> wrote:

> Hi Amanda,
>
> I presume that the installer will flag this as a requirement for
> Trafodion to be installed. Will it abort the installation or will the
> installer fix the pid_max settings automatically?
>
> Selva
>
> -----Original Message-----
> From: Amanda Moran [mailto:amanda.moran@esgyn.com]
> Sent: Tuesday, September 8, 2015 9:20 AM
> To: dev@trafodion.incubator.apache.org
> Cc: Lijian (Q) <ji...@huawei.com>
> Subject: Re: [Urgent Help] Trafodion Build Environment Problem
>
> Hi there-
>
> This is fixed in latest version of installer.
>
> Thanks.
>
> Sent from my iPhone
>
> > On Sep 8, 2015, at 9:07 AM, Dave Birdsall <da...@esgyn.com> wrote:
> >
> > Hi,
> >
> > I'm wondering if this should be reported as a problem? Perhaps
> > Nieyuanyuan would like to open a JIRA about supporting higher PID
> > numbers in Trafodion?
> >
> > Dave
> >
> > -----Original Message-----
> > From: Narendra Goyal [mailto:narendra.goyal@esgyn.com]
> > Sent: Monday, September 7, 2015 7:04 PM
> > To: dev@trafodion.incubator.apache.org
> > Cc: Lijian (Q) <ji...@huawei.com>
> > Subject: RE: [Urgent Help] Trafodion Build Environment Problem
> >
> > Hi Nieyuanyuan,
> >
> > Could you please check the 'pid_max' settings:
> > sysctl -q kernel.pid_max
> > (or cat /proc/sys/kernel/pid_max)
> >
> > If the value is > 64K, I would recommend you set it to 64K, like so:
> > sudo sysctl -w kernel.pid_max=65535
> >
> > You will have to restart Trafodion and other Hadoop/HBase processes:
> > swstopall
> > ckillall
> > swstartall
> > sqstart
> >
> > Just FYI, to check the list of Trafodion processes only, please run
> > 'cstat' in your bash shell.
> >
> > Thanks,
> > -Narendra
> >



--
Thanks,

Amanda Moran

Re: [Urgent Help] Trafodion Build Environment Problem

Posted by Amanda Moran <am...@esgyn.com>.
Hi there All-

Sorry if my first email was confusing. The "problem" itself is not fixed by
the installer; the installer just runs sudo sysctl -w kernel.pid_max=65535
on all nodes.

Thanks.

On Tue, Sep 8, 2015 at 11:02 AM, Selva Govindarajan <
selva.govindarajan@esgyn.com> wrote:

> Hi Amanda,
>
> I presume that the installer will flag this as a requirement for Trafodion
> to be installed. Will it abort the installation, or will the installer fix
> the pid_max setting automatically?
>
> Selva



-- 
Thanks,

Amanda Moran

RE: [Urgent Help] Trafodion Build Environment Problem

Posted by Selva Govindarajan <se...@esgyn.com>.
Hi Amanda,

I presume that the installer will flag this as a requirement for Trafodion
to be installed. Will it abort the installation, or will the installer fix
the pid_max setting automatically?

Selva

-----Original Message-----
From: Amanda Moran [mailto:amanda.moran@esgyn.com]
Sent: Tuesday, September 8, 2015 9:20 AM
To: dev@trafodion.incubator.apache.org
Cc: Lijian (Q) <ji...@huawei.com>
Subject: Re: [Urgent Help] Trafodion Build Environment Problem

Hi there-

This is fixed in the latest version of the installer.

Thanks.

Sent from my iPhone


Re: [Urgent Help] Trafodion Build Environment Problem

Posted by Amanda Moran <am...@esgyn.com>.
Hi there- 

This is fixed in the latest version of the installer.

Thanks. 

Sent from my iPhone

> On Sep 8, 2015, at 9:07 AM, Dave Birdsall <da...@esgyn.com> wrote:
> 
> Hi,
> 
> I'm wondering if this should be reported as a problem? Perhaps Nieyuanyuan
> would like to open a JIRA about supporting higher PID numbers in Trafodion?
> 
> Dave

RE: [Urgent Help] Trafodion Build Environment Problem

Posted by Dave Birdsall <da...@esgyn.com>.
Hi,

I'm wondering if this should be reported as a problem? Perhaps Nieyuanyuan
would like to open a JIRA about supporting higher PID numbers in Trafodion?

Dave

-----Original Message-----
From: Narendra Goyal [mailto:narendra.goyal@esgyn.com]
Sent: Monday, September 7, 2015 7:04 PM
To: dev@trafodion.incubator.apache.org
Cc: Lijian (Q) <ji...@huawei.com>
Subject: RE: [Urgent Help] Trafodion Build Environment Problem

Hi Nieyuanyuan,

Could you please check the 'pid_max' settings:
sysctl -q kernel.pid_max
(or cat /proc/sys/kernel/pid_max)

If the value is > 64K, I would recommend you set it to 64K, like so:
sudo sysctl -w kernel.pid_max=65535

You will have to restart Trafodion and the other Hadoop/HBase processes:
swstopall
ckillall
swstartall
sqstart

Just FYI, to list only the Trafodion processes, please run 'cstat'
in your bash shell.
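
The check described above can also be scripted. What follows is only a minimal sketch in Python, not part of Trafodion or its installer: the helper names (read_pid_max, pid_max_ok, describe) are made up for illustration, and the 65535 limit is simply the value recommended in this message.

```python
from pathlib import Path

# Assumed limit: the value recommended in this thread.
RECOMMENDED_PID_MAX = 65535


def read_pid_max(path="/proc/sys/kernel/pid_max"):
    """Read pid_max from the given file (Linux procfs by default)."""
    return int(Path(path).read_text().strip())


def pid_max_ok(value, limit=RECOMMENDED_PID_MAX):
    """True when the configured pid_max does not exceed the limit."""
    return value <= limit


def describe(value, limit=RECOMMENDED_PID_MAX):
    """Return a one-line, human-readable verdict for a pid_max value."""
    if pid_max_ok(value, limit):
        return f"kernel.pid_max={value}: within the recommended limit"
    return (f"kernel.pid_max={value}: above {limit}; "
            f"consider: sudo sysctl -w kernel.pid_max={limit}")
```

On a Linux host, print(describe(read_pid_max())) reports the current setting. The script only reads the value; changing it still requires the sysctl command and the restart sequence listed above.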

Thanks,
-Narendra



Re: [Urgent Help] Trafodion Build Environment Problem

Posted by Nieyuanyuan <ni...@huawei.com>.
Hi, Narendra,

Looks like this problem was solved after I applied your recommended settings. For reference, the crash is at core/sql/runtimestats/SqlStats.cpp:276:

    prevHeap = statsArray_[pid].processStats_->getHeap();

(gdb) p statsArray_ 
$1 = (GlobalStatsArray *) 0x10003290
(gdb) p pid
$2 = 65536

I am not sure why the stats array uses the pid as its index :). Anyway, my problem is solved and I am able to go ahead now.

Thanks a lot.
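
To illustrate why a PID-indexed array is sensitive to pid_max, here is a small Python model. It is a sketch under stated assumptions, not the Trafodion code: the class name PidIndexedStats is hypothetical, and the 65536-slot size is a guess suggested by the 64K recommendation; the real size of statsArray_ is not shown in this thread.

```python
# Assumption for illustration only: 65536 slots, i.e. valid indices 0..65535.
TABLE_SLOTS = 65536


class PidIndexedStats:
    """Toy stats table that uses the PID itself as the array index."""

    def __init__(self, slots=TABLE_SLOTS):
        self._slots = [None] * slots

    def in_range(self, pid):
        """True when the PID maps to a slot inside the table."""
        return 0 <= pid < len(self._slots)

    def add(self, pid, stats):
        if not self.in_range(pid):
            raise IndexError(f"pid {pid} does not fit in {len(self._slots)} slots")
        self._slots[pid] = stats

    def remove(self, pid):
        # Explicit bounds check. Indexing a raw C++ array performs no such
        # check and would read whatever memory lies past the last slot.
        if not self.in_range(pid):
            raise IndexError(f"pid {pid} does not fit in {len(self._slots)} slots")
        previous = self._slots[pid]
        self._slots[pid] = None
        return previous
```

In this model pid 65535 is the last valid index and pid 65536 is rejected. If the real table has a similar bound and no range check, that would be consistent with the gdb output above, where removeProcess is called with pid=65536 and getHeap then runs on an invalid pointer (this=0x2000).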

-----Original Message-----
From: Narendra Goyal [mailto:narendra.goyal@esgyn.com]
Sent: September 8, 2015 10:04
To: dev@trafodion.incubator.apache.org
Cc: Lijian (Q)
Subject: RE: [Urgent Help] Trafodion Build Environment Problem

Hi Nieyuanyuan,

Could you please check the 'pid_max' settings:
sysctl -q kernel.pid_max
(or cat /proc/sys/kernel/pid_max)

If the value is > 64K, I would recommend you set it to 64K, like so:
sudo sysctl -w kernel.pid_max=65535

You will have to restart Trafodion and the other Hadoop/HBase processes:
swstopall
ckillall
swstartall
sqstart

Just FYI, to list only the Trafodion processes, please run 'cstat' in your bash shell.

Thanks,
-Narendra


-----Original Message-----
From: Nieyuanyuan [mailto:nieyuanyuan@huawei.com]
Sent: Monday, September 7, 2015 6:40 PM
To: dev@trafodion.incubator.apache.org
Cc: Lijian (Q) <ji...@huawei.com>
Subject: [Urgent Help] Trafodion Build Environment Problem

Dear Guys,

I recently downloaded trafodion 1.1 from https://github.com/apache/incubator-trafodion/tree/stable/1.1, and followed the build guide from https://wiki.trafodion.org/wiki/index.php/Building_the_Software, and solved a lot of problems (no need to list all details), I am able to run trafodion over a hadoop sandbox environment.

But I got a serious problem, that is, all Trafodion related process will go down after several minutes (not sure how long), only few of them will
left:
[nieyy@redhat-72 ~]$ ps ux
USER        PID %CPU %MEM    VSZ   RSS TTY      STAT START   TIME COMMAND
nieyy     76554  0.1  0.1 590988 139768 pts/6   Sl   19:14   0:04
/usr/lib/jvm/java-1.7.0-openjdk-1.7.0.45.x86_64/bin/java -XX:OnOutOfMemoryError=kill -9 %p -Xmx128m
nieyy    118833  0.7  0.3 1535452 420996 ?      Sl   19:40   0:12 /usr/lib/jvm/java-1.7.0-openjdk-1.7.0.45.x86_64/bin/java -Dproc_namenode -Xmx1000m -Djava.net.prefe
nieyy    119085  0.6  0.2 1572688 367388 ?      Sl   19:40   0:10 /usr/lib/jvm/java-1.7.0-openjdk-1.7.0.45.x86_64/bin/java -Dproc_datanode -Xmx1000m -Djava.net.prefe
nieyy    119320  0.4  0.2 1512656 340636 ?      Sl   19:41   0:07 /usr/lib/jvm/java-1.7.0-openjdk-1.7.0.45.x86_64/bin/java -Dproc_secondarynamenode -Xmx1000m -Djava.
nieyy    119972  1.2  0.2 1708408 378536 pts/6  Sl   19:41   0:20 /usr/lib/jvm/java-1.7.0-openjdk-1.7.0.45.x86_64/bin/java -Dproc_resourcemanager -Xmx1000m -Dhadoop.
nieyy    120133  0.9  0.2 1616388 309976 ?      Sl   19:41   0:16 /usr/lib/jvm/java-1.7.0-openjdk-1.7.0.45.x86_64/bin/java -Dproc_nodemanager -Xmx1000m -Dhadoop.log.
nieyy    120371  0.0  0.0   9824  1772 pts/6    S    19:41   0:00 /bin/sh ./bin/mysqld_safe --defaults-file=/home/nieyy/trafodion_build/incubator-trafodion-stable-1.
nieyy    120594  0.0  0.0 452604 89908 pts/6    Sl   19:41   0:01 /home/nieyy/trafodion_build/incubator-trafodion-stable-1.1/core/sqf/sql/local_hadoop/mysql/bin/mysq
nieyy    120789  0.0  0.0   9692  1736 pts/6    S    19:41   0:00 bash /home/nieyy/trafodion_build/incubator-trafodion-stable-1.1/core/sqf/sql/local_hadoop/hbase/bin
nieyy    120806  2.0  0.3 1809048 509164 pts/6  Sl   19:41   0:34 /usr/lib/jvm/java-1.7.0-openjdk-1.7.0.45.x86_64/bin/java -Dproc_master -XX:OnOutOfMemoryError=kill
nieyy    122554  0.0  0.0  13624  1304 pts/6    S    19:41   0:00 mpirun -disable-auto-cleanup -demux select -env SQ_IC TCP -env MPI_ERROR_LEVEL 2 -env SQ_PIDMAP 1 -
nieyy    122555  0.0  0.0      0     0 ?        Zs   19:41   0:00 [hydra_pmi_proxy] <defunct>
nieyy    122556  1.0  0.0 335212 36748 ?        Ssl  19:41   0:17 /home/nieyy/trafodion_build/incubator-trafodion-stable-1.1/core/sqf/export/bin64d/monitor COLD
nieyy    122557  0.8  0.0 335212 36768 ?        Ssl  19:41   0:14 /home/nieyy/trafodion_build/incubator-trafodion-stable-1.1/core/sqf/export/bin64d/monitor COLD
nieyy    123946  0.9  0.1 828072 223088 pts/6   Sl   19:42   0:14 /usr/lib/jvm/java-1.7.0-openjdk-1.7.0.45.x86_64/bin/java -XX:OnOutOfMemoryError=kill -9 %p -Xmx128m
nieyy    124044  1.0  0.1 629200 187180 pts/6   Sl   19:42   0:16 /usr/lib/jvm/java-1.7.0-openjdk-1.7.0.45.x86_64/bin/java -XX:OnOutOfMemoryError=kill -9 %p -Xmx128m

I then have to kill all the processes and run swstartall and sqstart to reset the environment. However, the environment still goes down after a while, and I have to restart it again.

I found some core files under trafodion_build/incubator-trafodion-stable-1.1/core/sqf/sql/scripts, all of them generated by mxssmp:
[nieyy@redhat-72 scripts]$ ll core*
...
-rw------- 1 nieyy nieyy 156008448 Sep  7 17:56 core.mxssmp.173357
-rw------- 1 nieyy nieyy 145518592 Sep  7 17:56 core.mxssmp.173372
-rw------- 1 nieyy nieyy 156008448 Sep  7 19:24 core.mxssmp.74146
-rw------- 1 nieyy nieyy 145518592 Sep  7 19:24 core.mxssmp.74197

I used gdb to track the stack:
[nieyy@redhat-72 scripts]$ gdb /home/nieyy/trafodion_build/incubator-trafodion-stable-1.1/core/sql/lib/linux/64bit/debug/mxssmp ./core.mxssmp.141469 ...
(gdb) where
#0  0x000000000044166c in ProcessStats::getHeap (this=0x2000) at ../runtimestats/SqlStats.h:271
#1  0x000000000043990a in StatsGlobals::removeProcess (this=0x10000000, pid=65536, calledAtAdd=0) at ../runtimestats/SqlStats.cpp:276
#2  0x0000000000439e05 in StatsGlobals::checkForDeadProcesses (this=0x10000000, myPid=141469) at ../runtimestats/SqlStats.cpp:382
#3  0x00000000004440be in SsmpGlobals::work (this=0x7f062660c7e8) at ../runtimestats/ssmpipc.cpp:582
#4  0x000000000042f06a in runServer (argc=1, argv=0x7fff5b0e5a48) at ../bin/ex_ssmp_main.cpp:259
#5  0x000000000042eb12 in main (argc=1, argv=0x7fff5b0e5a48) at ../bin/ex_ssmp_main.cpp:127
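
To check whether every core file shows the same stack, gdb can also be run non-interactively. The following is only a minimal sketch: it assumes the mxssmp binary path from the session above, and it assumes the trailing number in a name such as core.mxssmp.173357 is the PID of the crashed process. The helper names core_pid and dump_backtraces, and the MXSSMP variable, are hypothetical.

```shell
# Sketch: print a backtrace for every mxssmp core file in the current directory.
# Assumption: the binary path below is the one used in the gdb session above.
MXSSMP="${MXSSMP:-/home/nieyy/trafodion_build/incubator-trafodion-stable-1.1/core/sql/lib/linux/64bit/debug/mxssmp}"

# core_pid (hypothetical helper): return the text after the last dot of a
# core file name, e.g. core.mxssmp.173357 gives 173357 (assumed to be the PID).
core_pid() {
    echo "${1##*.}"
}

# dump_backtraces (hypothetical helper): run "where" in gdb batch mode on
# each core file and label the output with the file name.
dump_backtraces() {
    for core in core.mxssmp.*; do
        [ -e "$core" ] || continue   # glob did not match any core file
        echo "=== $core (pid $(core_pid "$core")) ==="
        gdb -batch -ex where "$MXSSMP" "$core"
    done
}
```

Running dump_backtraces from the scripts directory that holds the core files would print one stack per core, which makes it easy to see whether they all end in ProcessStats::getHeap.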

I then searched Google and found https://bugs.launchpad.net/trafodion/+bug/1368891, which looks similar. However, that report says the bug was fixed in v0.9, and my version is 1.1.

Could you kindly help me solve this problem? I can't find any more useful information via Google.

Thanks a lot.

RE: [Urgent Help] Trafodion Build Environment Problem

Posted by Narendra Goyal <na...@esgyn.com>.
Hi Nieyuanyuan,

Could you please check the 'pid_max' setting:
sysctl -q kernel.pid_max
(or cat /proc/sys/kernel/pid_max)

If the value is greater than 64K (65536), I would recommend lowering it to just under 64K, like so:
sudo sysctl -w kernel.pid_max=65535
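
The check can also be scripted. This is a rough sketch rather than an official Trafodion tool: the 65536 threshold is taken from the 64K figure above and from pid=65536 in frame #1 of the reported stack, and the names PID_LIMIT, pid_max_too_high, check_pid_max and high_pids are hypothetical.

```shell
# Sketch: report whether kernel.pid_max allows PIDs of 65536 or above.
# Assumption: 64K (65536) is the relevant threshold, as suggested above.
PID_LIMIT=65536

# pid_max_too_high (hypothetical helper): succeeds when the given pid_max
# value is greater than the limit.
pid_max_too_high() {
    [ "$1" -gt "$PID_LIMIT" ]
}

# check_pid_max (hypothetical helper): read the current setting and report it.
check_pid_max() {
    current=$(cat /proc/sys/kernel/pid_max)
    if pid_max_too_high "$current"; then
        echo "kernel.pid_max=$current is above $PID_LIMIT; consider lowering it"
    else
        echo "kernel.pid_max=$current is not above $PID_LIMIT"
    fi
}

# high_pids (hypothetical helper): read PIDs on stdin, one per line, and
# print those at or above the limit.
high_pids() {
    awk -v limit="$PID_LIMIT" '$1 >= limit { print $1 }'
}
```

For example, check_pid_max prints the current setting, and ps -e -o pid= | high_pids lists running processes whose PID is 65536 or higher. Note that sysctl -w does not survive a reboot; on most Linux distributions the value can be made persistent by adding kernel.pid_max = 65535 to /etc/sysctl.conf.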

You will have to restart Trafodion and the other Hadoop/HBase processes:
swstopall
ckillall
swstartall
sqstart
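
If the restart has to be repeated, the four steps can be wrapped in a small function. This is only a sketch: it assumes the four scripts are on the PATH and return a non-zero status on failure, which is not confirmed in this thread. The function name restart_all and the DRY_RUN variable are hypothetical.

```shell
# Sketch: run the restart steps in the order listed above and stop at the
# first step that fails.
# Assumption: each script exits with a non-zero status on failure.
restart_all() {
    for step in swstopall ckillall swstartall sqstart; do
        echo "running: $step"
        if [ "${DRY_RUN:-0}" = "1" ]; then
            continue   # dry run: only print the plan, execute nothing
        fi
        "$step" || { echo "$step failed" >&2; return 1; }
    done
}
```

Calling DRY_RUN=1 restart_all prints the order of the steps without running anything; calling restart_all without DRY_RUN executes them.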

Just FYI, to list only the Trafodion processes, run 'cstat' in your bash shell.

Thanks,
-Narendra


-----Original Message-----
From: Nieyuanyuan [mailto:nieyuanyuan@huawei.com]
Sent: Monday, September 7, 2015 6:40 PM
To: dev@trafodion.incubator.apache.org
Cc: Lijian (Q) <ji...@huawei.com>
Subject: [Urgent Help] Trafodion Build Environment Problem
