Posted to user@hbase.apache.org by Henning Blohm <he...@zfabrik.de> on 2011/11/17 17:15:27 UTC

Re: "Child" processes not getting killed

We had (and still have) the same issue. I posted about it on the Hadoop
lists, and so did many others.

Yes, you can keep digging ever harder into your job code, and you may
still find some unfriendly library that does not shut down all of its
non-daemon threads. But that is a nonsensical approach. M/R knows when a
task is done and can give the process a short while to complete. After
some small, configurable timeout, though, it should really go ahead and
kill the process the hard way. In particular, mapper processes must not
produce side effects other than through Hadoop (which even takes the
liberty of starting some tasks speculatively), so there can be no reason
for them to do important work after the map has completed.
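A minimal, JDK-only Java sketch of both halves of this argument: listing the live non-daemon threads that keep a JVM from exiting, and a watchdog that halts the JVM the hard way after a grace period. The class and method names (LingeringThreads, nonDaemonThreads, haltAfter) are hypothetical and not part of Hadoop; how a task would be wired up to use them is left open.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical helper, not part of Hadoop: a sketch of the two ideas above.
class LingeringThreads {

    // Names of live non-daemon threads other than the caller. Any such
    // thread keeps the JVM from exiting normally after main() returns.
    static List<String> nonDaemonThreads() {
        List<String> names = new ArrayList<String>();
        Thread self = Thread.currentThread();
        for (Thread t : Thread.getAllStackTraces().keySet()) {
            if (t != self && t.isAlive() && !t.isDaemon()) {
                names.add(t.getName());
            }
        }
        return names;
    }

    // Starts a daemon watchdog that halts the JVM if it is still running
    // after graceMillis. Runtime.halt() skips shutdown hooks, so this is
    // the "hard way"; interrupting the watchdog cancels it.
    static Thread haltAfter(final long graceMillis) {
        Thread watchdog = new Thread(new Runnable() {
            public void run() {
                try {
                    Thread.sleep(graceMillis);
                } catch (InterruptedException e) {
                    return; // cancelled before the grace period ran out
                }
                Runtime.getRuntime().halt(1);
            }
        }, "halt-watchdog");
        watchdog.setDaemon(true); // the watchdog itself must not keep the JVM alive
        watchdog.start();
        return watchdog;
    }
}
```

A task wrapper could, for instance, log nonDaemonThreads() once the map has finished (to see which library is to blame) and then call haltAfter() with a short grace period.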

Henning


2010/12/9 Suraj Varma <sv...@gmail.com>

> Take a thread dump of those child processes before killing them. Use jstack,
> for instance, to take a thread dump, wait 30 seconds, and take another one.
> That should tell you what they are waiting on.
> --Suraj
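jstack attaches to a running JVM from the outside (jstack <pid>). As a rough in-process sketch of the same idea, the JDK's ThreadMXBean can produce a comparable dump. The class name ThreadDumper and the two-dump helper are hypothetical, and this code only sees the threads of the JVM it runs in.

```java
import java.lang.management.ManagementFactory;
import java.lang.management.ThreadInfo;
import java.lang.management.ThreadMXBean;

// Hypothetical helper (not part of Hadoop or HBase): an in-process
// approximation of a jstack dump for the current JVM.
class ThreadDumper {

    // One entry per live thread: name, state, the lock it is blocked or
    // waiting on (if any), and its stack frames.
    static String dump() {
        ThreadMXBean mx = ManagementFactory.getThreadMXBean();
        StringBuilder sb = new StringBuilder();
        for (ThreadInfo info : mx.dumpAllThreads(false, false)) {
            sb.append('"').append(info.getThreadName()).append("\" ")
              .append(info.getThreadState());
            if (info.getLockName() != null) {
                sb.append(" on ").append(info.getLockName());
            }
            sb.append('\n');
            for (StackTraceElement frame : info.getStackTrace()) {
                sb.append("\tat ").append(frame).append('\n');
            }
        }
        return sb.toString();
    }

    // Takes two dumps separated by a pause (~30 s in Suraj's advice), so
    // threads stuck on the same frame in both dumps stand out.
    static String[] dumpTwice(long pauseMillis) throws InterruptedException {
        String first = dump();
        Thread.sleep(pauseMillis);
        return new String[] { first, dump() };
    }
}
```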
>
> On Thu, Dec 9, 2010 at 1:53 AM, Hari Sreekumar <hsreekumar@clickable.com> wrote:
>
> > Tried that, nope, didn't help much. I was opening a table and scanning in
> > the reducer. Now I am calling scanner.close() in each reducer and I have
> > put HTable.close() in the cleanup() function too. Still seeing those
> > children even after the job is killed :(
> >
> > I am using TableMapReduceUtil.initTableMapperJob(). Do I have to call
> > close for that separately in the mapper too?
> >
> > hari
> >
> > On Wed, Dec 8, 2010 at 11:41 PM, Hari Sreekumar <hsreekumar@clickable.com> wrote:
> >
> > > That's exactly what I was thinking, and there's a high probability of
> > > that. I'll check tomorrow and update here. Also, we have the JVM reuse
> > > parameter set to -1; not sure whether that matters, though.
> > >
> > > Thanks,
> > > hari
> > >
> > > On Wednesday, December 8, 2010, Veeramachaneni, Ravi
> > > <ra...@navteq.com> wrote:
> > > > If you haven't already, check your code: you may have missed a
> > > > close() call on a Scanner, which can leave the connection, and with
> > > > it the process(es), hanging.
> > > >
> > > > Ravi
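Ravi's point generalizes: whatever opens a scanner or table handle should close it on every path, including when the work throws. A minimal sketch of that try/finally pattern, using a hypothetical stand-in resource (TrackedResource, Work, useThenClose are illustrative names, not the HBase API) so it runs without HBase on the classpath:

```java
import java.io.Closeable;
import java.io.IOException;

class CloseOnEveryPath {

    // Hypothetical unit of work that reads from an open resource.
    interface Work<R extends Closeable> {
        void run(R resource) throws IOException;
    }

    // Runs the work and closes the resource even if the work throws.
    static <R extends Closeable> void useThenClose(R resource, Work<R> work)
            throws IOException {
        try {
            work.run(resource);
        } finally {
            resource.close();
        }
    }

    // Stand-in for a scanner or table handle (NOT the HBase API);
    // it only records whether close() was called.
    static class TrackedResource implements Closeable {
        boolean closed = false;

        public void close() {
            closed = true;
        }
    }
}
```

In a real job the resource would be the scanner opened in the reducer, with the table handle closed in cleanup() as Hari describes.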
> > > > ________________________________________
> > > > From: saint.ack@gmail.com [saint.ack@gmail.com] On Behalf Of Stack [stack@duboce.net]
> > > > Sent: Wednesday, December 08, 2010 11:08 AM
> > > > To: user@hbase.apache.org
> > > > Subject: Re: "Child" processes not getting killed
> > > >
> > > > Add some logging to your Map task and retry?
> > > > St.Ack
> > > >
> > > > On Tue, Dec 7, 2010 at 10:28 PM, Hari Sreekumar
> > > > <hs...@clickable.com> wrote:
> > > >> Hi Stack,
> > > >>
> > > >> The logs don't show anything nasty. For example, I ran a job that
> > > >> spawned 5 mappers. All of the Child processes spawned by them remained
> > > >> even after the job completed. 3 map tasks completed, and they have the
> > > >> following log:
> > > >>
> > > >> *stdout logs*
> > > >>
> > > >> ------------------------------
> > > >>
> > > >>
> > > >> *stderr logs*
> > > >>
> > > >> ------------------------------
> > > >>
> > > >>
> > > >> *syslog logs*
> > > >>
> > > >> 2010-12-08 11:43:28,358 INFO org.apache.hadoop.metrics.jvm.JvmMetrics: Initializing JVM Metrics with processName=MAP, sessionId=
> > > >> 2010-12-08 11:43:28,687 INFO org.apache.zookeeper.ZooKeeper: Client environment:zookeeper.version=3.2.2-888565, built on 12/08/2009 21:51 GMT
> > > >> 2010-12-08 11:43:28,687 INFO org.apache.zookeeper.ZooKeeper: Client environment:host.name=hadoop1
> > > >> 2010-12-08 11:43:28,687 INFO org.apache.zookeeper.ZooKeeper: Client environment:java.version=1.6.0_22
> > > >> 2010-12-08 11:43:28,687 INFO org.apache.zookeeper.ZooKeeper: Client environment:java.vendor=Sun Microsystems Inc.
> > > >> 2010-12-08 11:43:28,687 INFO org.apache.zookeeper.ZooKeeper: Client environment:java.home=/usr/java/jdk1.6.0_22/jre
> > > >> 2010-12-08 11:43:28,687 INFO org.apache.zookeeper.ZooKeeper: Client environment:java.class.path=/opt/hadoop/bin/../conf:/usr/java/jdk1.6.0_22/lib/tools.jar:/opt/hadoop/bin/..:/opt/hadoop/bin/../hadoop-0.20.2-core.jar:/opt/hadoop/bin/../lib/commons-cli-1.2.jar:/opt/hadoop/bin/../lib/commons-codec-1.3.jar:/opt/hadoop/bin/../lib/commons-el-1.0.jar:/opt/hadoop/bin/../lib/commons-httpclient-3.0.1.jar:/opt/hadoop/bin/../lib/commons-logging-1.0.4.jar:/opt/hadoop/bin/../lib/commons-logging-api-1.0.4.jar:/opt/hadoop/bin/../lib/commons-net-1.4.1.jar:/opt/hadoop/bin/../lib/core-3.1.1.jar:/opt/hadoop/bin/../lib/hsqldb-1.8.0.10.jar:/opt/hadoop/bin/../lib/jasper-compiler-5.5.12.jar:/opt/hadoop/bin/../lib/jasper-runtime-5.5.12.jar:/opt/hadoop/bin/../lib/jets3t-0.6.1.jar:/opt/hadoop/bin/../lib/jetty-6.1.14.jar:/opt/hadoop/bin/../lib/jetty-util-6.1.14.jar:/opt/hadoop/bin/../lib/junit-3.8.1.jar:/opt/hadoop/bin/../lib/kfs-0.2.2.jar:/opt/hadoop/bin/../lib/log4j-1.2.15.jar:/opt/hadoop/bin/../lib/mockito-all-1.8.0.jar:/opt/hadoop/bin/../lib/oro-2.0.8.jar:/opt/hadoop/bin/../lib/servlet-api-2.5-6.1.14.jar:/opt/hadoop/bin/../lib/slf4j-api-1.4.3.jar:/opt/hadoop/bin/../lib/slf4j-log4j12-1.4.3.jar:/opt/hadoop/bin/../lib/xmlenc-0.52.jar:/opt/hadoop/bin/../lib/jsp-2.1/jsp-2.1.jar:/opt/hadoop/bin/../lib/jsp-2.1/jsp-api-2.1.jar:/hbase-0.20.6.jar:/hbase-0.20.6-test.jar:/conf:/lib/zookeeper-3.2.2.jar::/home/hadoop/DFS/MultiNode/mapred/local/taskTracker/jobcache/job_201012071556_0020/jars/classes:/home/hadoop/DFS/MultiNode/mapred/local/taskTracker/jobcache/job_201012071556_0020/jars:/home/hadoop/DFS/MultiNode/mapred/local/taskTracker/jobcache/job_201012071556_0020/attempt_201012071556_0020_m_000000_0/work
> > > >> 2010-12-08 11:43:28,688 INFO org.apache.zookeeper.ZooKeeper: Client environment:java.library.path=/opt/hadoop/bin/../lib/native/Linux-amd64-64:/home/hadoop/DFS/MultiNode/mapred/local/taskTracker/jobcache/job_201012071556_0020/attempt_201012071556_0020_m_000000_0/work
> > > >> 2010-12-08 11:43:28,688 INFO org.apache.zookeeper.ZooKeeper: Client environment:java.io.tmpdir=/home/hadoop/DFS/MultiNode/mapred/local/taskTracker/jobcache/job_201012071556_0020/attempt_201012071556_0020_m_000000_0/work/tmp
> > > >> 2010-12-08 11:43:28,688 INFO org.apache.zookeeper.ZooKeeper: Client environment:java.compiler=<NA>
> > > >> 2010-12-08 11:43:28,688 INFO org.apache.zookeeper.ZooKeeper: Client environment:os.name=Linux
> > > >> 2010-12-08 11:43:28,688 INFO org.apache.zookeeper.ZooKeeper: Client environment:os.arch=amd64
> > > >> 2010-12-08 11:43:28,688 INFO org.apache.zookeeper.ZooKeeper: Client environment:os.version=2.6.18-