Posted to mapreduce-user@hadoop.apache.org by Henning Blohm <he...@zfabrik.de> on 2011/11/17 17:17:53 UTC

Re: hanging map reduce processes

 Hi Sudhan,

well, that was just an example. In reality the situation is slightly more
complex, as it's not a single thread but a substantial number of third-party
libraries that are not fully under my control. While there is a chance that I
can shut these things down in most cases, experience shows that some were
not meant to be shut down. Also, as the process is reused several
times between map and combine, the overhead of re-initializing frameworks is
not insignificant.

Thanks,
  Henning


On 10/28/2011 06:37 AM, Sudharsan Sampath wrote:

Hi Henning,

I feel it's the non-daemon thread that's causing the issue. A JVM will not
exit until all its non-daemon threads have finished. Is there a reason why
you want this thread to be non-daemon? If unavoidable, then can you exit
this thread when the reducer's job is completed?
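
To illustrate the difference, here is a minimal sketch, assuming the thread
is one you create yourself. The class and method names are made up for the
example and are not part of Hadoop or of your job:

```java
// Illustrative sketch: names are hypothetical, not taken from the job in question.
class DaemonWorkerSketch {

    // Starts a background thread that loops until interrupted. The 'daemon'
    // flag decides whether the thread can keep the JVM alive after main() returns.
    static Thread startBackgroundThread(boolean daemon) {
        Thread t = new Thread(new Runnable() {
            public void run() {
                while (true) {
                    try {
                        Thread.sleep(1000);
                    } catch (InterruptedException e) {
                        return; // stop when interrupted
                    }
                }
            }
        });
        t.setDaemon(daemon); // must be called before start()
        t.start();
        return t;
    }

    public static void main(String[] args) {
        startBackgroundThread(true);
        System.out.println("main returns; the JVM can exit despite the running thread");
    }
}
```

Passing false instead of true in main() should leave the process hanging
after main() returns, which is the behaviour you describe.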

Thanks
Sudhan S

On Thu, Oct 27, 2011 at 9:14 PM, Henning Blohm
<he...@zfabrik.de> wrote:


  Hi Harsh,

here's the simplest example I could come up with: Add

    protected void setup(Context context) throws IOException, InterruptedException {
        // start a non-daemon thread that never terminates
        Thread t = new Thread(new Runnable() {
            public void run() {
                while (true) {
                    try {
                        Thread.sleep(1000);
                    } catch (InterruptedException e) {
                        e.printStackTrace();
                    }
                }
            }
        });
        t.setDaemon(false);
        t.start();
        System.err.println("Started thread in reduce setup");
    }

to the Reduce inner class in the wordcount sample (source code attached).

Assuming it's in wordcount.jar and files have been uploaded for counting (no
matter what content, of course), running

hadoop jar wordcount.jar org.myorg.WordCount wordcount/input wordcount/result

gives me, reproducibly, a hanging "Child" process. Interestingly, that does
not happen when the same kind of thread is started in Map.setup instead.

One more note: In our case, some non-trivial infrastructure is started and
used in map, combine, and reduce. I believe it could be shut down and started
again between map and reduce when run in the same JVM. That is, however,
expensive and brings no benefit otherwise. If there were a way to know
that the JVM will really not be used anymore, that would be a good time
to really clean up. Unfortunately, shutdown hooks don't work here, as they
will not be run before all non-daemon threads have stopped.
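
To show what I mean about shutdown hooks, here is a small standalone sketch
outside of Hadoop. The class and method names are made up, and the 500 ms
worker merely stands in for the infrastructure threads:

```java
// Sketch only: class and method names are hypothetical, not part of Hadoop.
class ShutdownHookOrderSketch {

    // Registers a hook that the JVM runs once its shutdown sequence starts.
    static Thread registerHook() {
        Thread hook = new Thread(new Runnable() {
            public void run() {
                System.out.println("shutdown hook ran");
            }
        });
        Runtime.getRuntime().addShutdownHook(hook);
        return hook;
    }

    // Starts a non-daemon thread that stays alive for the given time.
    static Thread startWorker(final long millis) {
        Thread worker = new Thread(new Runnable() {
            public void run() {
                try {
                    Thread.sleep(millis);
                } catch (InterruptedException e) {
                    Thread.currentThread().interrupt();
                }
                System.out.println("worker finished");
            }
        });
        worker.setDaemon(false);
        worker.start();
        return worker;
    }

    public static void main(String[] args) {
        registerHook();
        startWorker(500);
        System.out.println("main returned");
    }
}
```

The hook's message only appears once the worker has finished. With an
endless loop in place of the 500 ms sleep, the hook would never run, which
is the situation in the hanging "Child" process.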

Thanks,
  Henning


On 10/27/2011 01:18 PM, Henning Blohm wrote:

Hi Harsh,

that would be 0.20.3. I will try to prepare a stripped-down sample later today or
tomorrow.

Thanks,
Henning

On 10/27/2011 12:55 PM, Harsh J wrote:

 Hey Henning,

What version of Hadoop are you running, and can we have a dumbed down
sample to reproduce?

On Thu, Oct 27, 2011 at 3:28 PM, Henning Blohm <he...@zfabrik.de> wrote:

 Hi,

I found that several people have run into this issue, but I have not been
able to find a solution yet.

We have reduce tasks that leave a hanging "child" process. The
implementation uses a lot of third-party code that leaves Timer threads
running (as you can readily see in thread dumps). That is bad style, no
doubt. But ultimately we don't really care: when the reduce is done, it's
done, and the process should simply be killed rather than hanging around
and eventually impacting the cluster.
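
For anyone who wants to check for such threads without taking a full thread
dump, here is a minimal sketch using only the standard Thread API. The class
name is made up, and it would have to be called from inside the task JVM:

```java
import java.util.ArrayList;
import java.util.List;

// Sketch only: the class name is hypothetical.
class NonDaemonThreadLister {

    // Returns the names of all live threads that are not daemons,
    // i.e. the threads that can keep the JVM from exiting.
    static List<String> liveNonDaemonThreadNames() {
        List<String> names = new ArrayList<String>();
        for (Thread t : Thread.getAllStackTraces().keySet()) {
            if (t.isAlive() && !t.isDaemon()) {
                names.add(t.getName());
            }
        }
        return names;
    }

    public static void main(String[] args) {
        for (String name : liveNonDaemonThreadNames()) {
            System.out.println("non-daemon thread still alive: " + name);
        }
    }
}
```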

Is there a way to force killing of child processes, e.g. based on job
configuration?

Thanks,
  Henning




--

*Henning Blohm*

*ZFabrik Software KG*

T: 	+49/62278399955
F: 	+49/62278399956
M: 	+49/1781891820

Bunsenstrasse 1
69190 Walldorf
henning.blohm@zfabrik.de
Linkedin <http://de.linkedin.com/pub/henning-blohm/0/7b5/628>
www.zfabrik.de
www.z2-environment.eu




