Posted to dev@giraph.apache.org by "Roman K (JIRA)" <ji...@apache.org> on 2012/05/03 08:55:51 UTC

[jira] [Commented] (GIRAPH-169) How to close all child when a job finished?

    [ https://issues.apache.org/jira/browse/GIRAPH-169?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13267257#comment-13267257 ] 

Roman K commented on GIRAPH-169:
--------------------------------

I reproduced the problem even in a simpler case, with only 1 worker in a pseudo-distributed environment:
hadoop jar giraph-0.2-SNAPSHOT-jar-with-dependencies.jar org.apache.giraph.benchmark.PageRankBenchmark -e 1 -s 10 -v -V 1000 -w 1

I took a full thread dump of the "hung" child process using jstack (below is the meaningful part, with the GC threads omitted),
but I haven't been able to figure out the problem yet:

--------------------------------------------------------------------------------
"pool-1-thread-1" prio=10 tid=0x00007f0398539000 nid=0x2218 waiting on condition [0x00007f0356d87000]
   java.lang.Thread.State: WAITING (parking)
	at sun.misc.Unsafe.park(Native Method)
	- parking to wait for  <0x00000000fe1613a8> (a java.util.concurrent.locks.AbstractQueuedSynchronizer$ConditionObject)
	at java.util.concurrent.locks.LockSupport.park(LockSupport.java:156)
	at java.util.concurrent.locks.AbstractQueuedSynchronizer$ConditionObject.await(AbstractQueuedSynchronizer.java:1987)
	at java.util.concurrent.LinkedBlockingQueue.take(LinkedBlockingQueue.java:399)
	at java.util.concurrent.ThreadPoolExecutor.getTask(ThreadPoolExecutor.java:947)
	at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:907)
	at java.lang.Thread.run(Thread.java:662)

"pool-2-thread-1" prio=10 tid=0x00007f03984ed000 nid=0x2213 runnable [0x00007f035728c000]
   java.lang.Thread.State: RUNNABLE
	at sun.nio.ch.EPollArrayWrapper.epollWait(Native Method)
	at sun.nio.ch.EPollArrayWrapper.poll(EPollArrayWrapper.java:210)
	at sun.nio.ch.EPollSelectorImpl.doSelect(EPollSelectorImpl.java:65)
	at sun.nio.ch.SelectorImpl.lockAndDoSelect(SelectorImpl.java:69)
	- locked <0x00000000fe1880f0> (a sun.nio.ch.Util$2)
	- locked <0x00000000fe188100> (a java.util.Collections$UnmodifiableSet)
	- locked <0x00000000fe1880a8> (a sun.nio.ch.EPollSelectorImpl)
	at sun.nio.ch.SelectorImpl.select(SelectorImpl.java:80)
	at sun.nio.ch.SelectorImpl.select(SelectorImpl.java:84)
	at org.apache.hadoop.ipc.Server$Listener$Reader.run(Server.java:333)
	- locked <0x00000000fe188110> (a org.apache.hadoop.ipc.Server$Listener$Reader)
	at java.util.concurrent.ThreadPoolExecutor$Worker.runTask(ThreadPoolExecutor.java:886)
	at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:908)
	at java.lang.Thread.run(Thread.java:662)

"LeaseChecker" daemon prio=10 tid=0x00007f039847a800 nid=0x21fa waiting on condition [0x00007f035758f000]
   java.lang.Thread.State: TIMED_WAITING (sleeping)
	at java.lang.Thread.sleep(Native Method)
	at org.apache.hadoop.hdfs.DFSClient$LeaseChecker.run(DFSClient.java:1376)
	at java.lang.Thread.run(Thread.java:662)

"Thread for syncLogs" daemon prio=10 tid=0x00007f0398479000 nid=0x21eb waiting on condition [0x00007f0357b9a000]
   java.lang.Thread.State: TIMED_WAITING (sleeping)
	at java.lang.Thread.sleep(Native Method)
	at org.apache.hadoop.mapred.Child$3.run(Child.java:139)

"Low Memory Detector" daemon prio=10 tid=0x00007f039809c000 nid=0x21e2 runnable [0x0000000000000000]
   java.lang.Thread.State: RUNNABLE

"C2 CompilerThread1" daemon prio=10 tid=0x00007f0398099800 nid=0x21e1 waiting on condition [0x0000000000000000]
   java.lang.Thread.State: RUNNABLE

"C2 CompilerThread0" daemon prio=10 tid=0x00007f0398096800 nid=0x21e0 waiting on condition [0x0000000000000000]
   java.lang.Thread.State: RUNNABLE

"Signal Dispatcher" daemon prio=10 tid=0x00007f0398094800 nid=0x21df runnable [0x0000000000000000]
   java.lang.Thread.State: RUNNABLE

"Finalizer" daemon prio=10 tid=0x00007f0398078000 nid=0x21de in Object.wait() [0x00007f0394af9000]
   java.lang.Thread.State: WAITING (on object monitor)
	at java.lang.Object.wait(Native Method)
	- waiting on <0x00000000fe158540> (a java.lang.ref.ReferenceQueue$Lock)
	at java.lang.ref.ReferenceQueue.remove(ReferenceQueue.java:118)
	- locked <0x00000000fe158540> (a java.lang.ref.ReferenceQueue$Lock)
	at java.lang.ref.ReferenceQueue.remove(ReferenceQueue.java:134)
	at java.lang.ref.Finalizer$FinalizerThread.run(Finalizer.java:159)

"Reference Handler" daemon prio=10 tid=0x00007f0398076000 nid=0x21dd in Object.wait() [0x00007f0394bfa000]
   java.lang.Thread.State: WAITING (on object monitor)
	at java.lang.Object.wait(Native Method)
	- waiting on <0x00000000fe160070> (a java.lang.ref.Reference$Lock)
	at java.lang.Object.wait(Object.java:485)
	at java.lang.ref.Reference$ReferenceHandler.run(Reference.java:116)
	- locked <0x00000000fe160070> (a java.lang.ref.Reference$Lock)

---------------------------------------------------------------------------------------
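
One possible reading of the dump above, offered as a hypothesis rather than a confirmed diagnosis: "pool-1-thread-1" and "pool-2-thread-1" are the only application threads listed without the "daemon" marker, and a JVM does not exit on its own while a non-daemon thread is still alive. The first one is parked in LinkedBlockingQueue.take() inside ThreadPoolExecutor.getTask(), which is what an idle worker of an executor that was never shut down looks like. The dump does not show which Giraph or Hadoop component owns these pools. The sketch below only reproduces the general behaviour with the standard java.util.concurrent API; the class name IdlePoolDemo and the helper runOneTask are hypothetical and do not come from Giraph.

```java
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.TimeUnit;
import java.util.concurrent.atomic.AtomicReference;

// Hypothetical demo class, not part of Giraph or Hadoop.
class IdlePoolDemo {

    /** Runs one trivial task on the pool and returns the worker thread that executed it. */
    static Thread runOneTask(ExecutorService pool) throws Exception {
        AtomicReference<Thread> worker = new AtomicReference<>();
        // get() blocks until the task has finished on the worker thread.
        pool.submit(() -> worker.set(Thread.currentThread())).get();
        return worker.get();
    }

    public static void main(String[] args) throws Exception {
        // The default thread factory creates non-daemon threads.
        ExecutorService pool = Executors.newSingleThreadExecutor();
        Thread worker = runOneTask(pool);

        // Give the worker a moment to go back to waiting for the next task.
        Thread.sleep(200);
        System.out.println(worker.getName()
                + " daemon=" + worker.isDaemon()
                + " alive=" + worker.isAlive()
                + " state=" + worker.getState());

        // Without this call the idle worker stays alive and the JVM
        // would not exit after main() returns.
        pool.shutdown();
        boolean terminated = pool.awaitTermination(5, TimeUnit.SECONDS);
        worker.join(5000);
        System.out.println("terminated=" + terminated + " alive=" + worker.isAlive());
    }
}
```

If the pool.shutdown() call is removed, running jstack against this process should show a worker thread parked in the same way as "pool-1-thread-1" above. Whether that is what actually keeps the Giraph child process alive still needs to be checked against the code that creates these pools.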
                
> How to close all child when a job finished?
> -------------------------------------------
>
>                 Key: GIRAPH-169
>                 URL: https://issues.apache.org/jira/browse/GIRAPH-169
>             Project: Giraph
>          Issue Type: Improvement
>          Components: mapreduce
>    Affects Versions: 0.2.0
>         Environment: sles 11 x64,jdk 1.6,hadoop 0.20.205.0,1 Master and 8 slaves,
>            Reporter: Jianfeng Qian
>            Priority: Minor
>
> I ran PageRank on Hadoop 0.20.205.0. When the job finished, the child processes on the slaves didn't quit immediately, and sometimes they never quit and I have to kill them.

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators: https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa
For more information on JIRA, see: http://www.atlassian.com/software/jira