Posted to dev@giraph.apache.org by "Roman K (JIRA)" <ji...@apache.org> on 2012/05/03 08:55:51 UTC
[jira] [Commented] (GIRAPH-169) How to close all child when a job finished?
[ https://issues.apache.org/jira/browse/GIRAPH-169?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13267257#comment-13267257 ]
Roman K commented on GIRAPH-169:
--------------------------------
I successfully reproduced the problem even in a simpler case, with only 1 worker in a pseudo-distributed environment:
hadoop jar giraph-0.2-SNAPSHOT-jar-with-dependencies.jar org.apache.giraph.benchmark.PageRankBenchmark -e 1 -s 10 -v -V 1000 -w 1
I took a full thread dump of the "hung" child process using jstack (below is the meaningful part, without the GC threads),
but I haven't managed to figure out the problem yet:
--------------------------------------------------------------------------------
"pool-1-thread-1" prio=10 tid=0x00007f0398539000 nid=0x2218 waiting on condition [0x00007f0356d87000]
java.lang.Thread.State: WAITING (parking)
at sun.misc.Unsafe.park(Native Method)
- parking to wait for <0x00000000fe1613a8> (a java.util.concurrent.locks.AbstractQueuedSynchronizer$ConditionObject)
at java.util.concurrent.locks.LockSupport.park(LockSupport.java:156)
at java.util.concurrent.locks.AbstractQueuedSynchronizer$ConditionObject.await(AbstractQueuedSynchronizer.java:1987)
at java.util.concurrent.LinkedBlockingQueue.take(LinkedBlockingQueue.java:399)
at java.util.concurrent.ThreadPoolExecutor.getTask(ThreadPoolExecutor.java:947)
at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:907)
at java.lang.Thread.run(Thread.java:662)
"pool-2-thread-1" prio=10 tid=0x00007f03984ed000 nid=0x2213 runnable [0x00007f035728c000]
java.lang.Thread.State: RUNNABLE
at sun.nio.ch.EPollArrayWrapper.epollWait(Native Method)
at sun.nio.ch.EPollArrayWrapper.poll(EPollArrayWrapper.java:210)
at sun.nio.ch.EPollSelectorImpl.doSelect(EPollSelectorImpl.java:65)
at sun.nio.ch.SelectorImpl.lockAndDoSelect(SelectorImpl.java:69)
- locked <0x00000000fe1880f0> (a sun.nio.ch.Util$2)
- locked <0x00000000fe188100> (a java.util.Collections$UnmodifiableSet)
- locked <0x00000000fe1880a8> (a sun.nio.ch.EPollSelectorImpl)
at sun.nio.ch.SelectorImpl.select(SelectorImpl.java:80)
at sun.nio.ch.SelectorImpl.select(SelectorImpl.java:84)
at org.apache.hadoop.ipc.Server$Listener$Reader.run(Server.java:333)
- locked <0x00000000fe188110> (a org.apache.hadoop.ipc.Server$Listener$Reader)
at java.util.concurrent.ThreadPoolExecutor$Worker.runTask(ThreadPoolExecutor.java:886)
at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:908)
at java.lang.Thread.run(Thread.java:662)
"LeaseChecker" daemon prio=10 tid=0x00007f039847a800 nid=0x21fa waiting on condition [0x00007f035758f000]
java.lang.Thread.State: TIMED_WAITING (sleeping)
at java.lang.Thread.sleep(Native Method)
at org.apache.hadoop.hdfs.DFSClient$LeaseChecker.run(DFSClient.java:1376)
at java.lang.Thread.run(Thread.java:662)
"Thread for syncLogs" daemon prio=10 tid=0x00007f0398479000 nid=0x21eb waiting on condition [0x00007f0357b9a000]
java.lang.Thread.State: TIMED_WAITING (sleeping)
at java.lang.Thread.sleep(Native Method)
at org.apache.hadoop.mapred.Child$3.run(Child.java:139)
"Low Memory Detector" daemon prio=10 tid=0x00007f039809c000 nid=0x21e2 runnable [0x0000000000000000]
java.lang.Thread.State: RUNNABLE
"C2 CompilerThread1" daemon prio=10 tid=0x00007f0398099800 nid=0x21e1 waiting on condition [0x0000000000000000]
java.lang.Thread.State: RUNNABLE
"C2 CompilerThread0" daemon prio=10 tid=0x00007f0398096800 nid=0x21e0 waiting on condition [0x0000000000000000]
java.lang.Thread.State: RUNNABLE
"Signal Dispatcher" daemon prio=10 tid=0x00007f0398094800 nid=0x21df runnable [0x0000000000000000]
java.lang.Thread.State: RUNNABLE
"Finalizer" daemon prio=10 tid=0x00007f0398078000 nid=0x21de in Object.wait() [0x00007f0394af9000]
java.lang.Thread.State: WAITING (on object monitor)
at java.lang.Object.wait(Native Method)
- waiting on <0x00000000fe158540> (a java.lang.ref.ReferenceQueue$Lock)
at java.lang.ref.ReferenceQueue.remove(ReferenceQueue.java:118)
- locked <0x00000000fe158540> (a java.lang.ref.ReferenceQueue$Lock)
at java.lang.ref.ReferenceQueue.remove(ReferenceQueue.java:134)
at java.lang.ref.Finalizer$FinalizerThread.run(Finalizer.java:159)
"Reference Handler" daemon prio=10 tid=0x00007f0398076000 nid=0x21dd in Object.wait() [0x00007f0394bfa000]
java.lang.Thread.State: WAITING (on object monitor)
at java.lang.Object.wait(Native Method)
- waiting on <0x00000000fe160070> (a java.lang.ref.Reference$Lock)
at java.lang.Object.wait(Object.java:485)
at java.lang.ref.Reference$ReferenceHandler.run(Reference.java:116)
- locked <0x00000000fe160070> (a java.lang.ref.Reference$Lock)
---------------------------------------------------------------------------------------
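One possible lead from the dump, offered as a hypothesis rather than a confirmed diagnosis: neither "pool-1-thread-1" nor "pool-2-thread-1" is marked daemon, and a JVM does not exit while any non-daemon thread is still alive. "pool-1-thread-1" is an idle ThreadPoolExecutor worker parked in LinkedBlockingQueue.take(), which is what a pool that was never shut down looks like. "pool-2-thread-1" is a Hadoop IPC Server$Listener$Reader still blocked in select(), which suggests an IPC server that was not stopped. The minimal Java sketch below reproduces only the first pattern in isolation. IdlePoolDemo and its methods are hypothetical names, not Giraph or Hadoop code. The "pool-N-thread-M" naming comes from Executors.defaultThreadFactory(), which creates non-daemon threads.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.TimeUnit;

// Hypothetical illustration, not Giraph/Hadoop code: an idle pool built with
// the default thread factory keeps a non-daemon "pool-N-thread-M" worker
// parked in LinkedBlockingQueue.take() until the pool is shut down.
class IdlePoolDemo {

  /** Names of live non-daemon threads other than the caller (what to look for in jstack output). */
  static List<String> liveNonDaemonThreads() {
    List<String> names = new ArrayList<>();
    for (Thread t : Thread.getAllStackTraces().keySet()) {
      if (t.isAlive() && !t.isDaemon() && t != Thread.currentThread()) {
        names.add(t.getName());
      }
    }
    return names;
  }

  /** Pool whose workers are daemon threads, so they never block JVM exit. */
  static ExecutorService daemonPool(int nThreads) {
    return Executors.newFixedThreadPool(nThreads, r -> {
      Thread t = new Thread(r);
      t.setDaemon(true);
      return t;
    });
  }

  /**
   * Returns true if the worker of a default (non-daemon) pool stays alive and
   * visible after its only task completes, and exits once shutdown() is called.
   */
  static boolean demo() throws Exception {
    ExecutorService pool = Executors.newFixedThreadPool(1);
    Thread[] worker = new Thread[1];
    // Run one trivial task to start the single worker; Future.get() waits for it.
    pool.submit(() -> { worker[0] = Thread.currentThread(); }).get();

    // The task is done, but the worker now idles waiting for more work, like
    // "pool-1-thread-1" in the dump, and it is a non-daemon thread.
    boolean lingering = worker[0].isAlive()
        && !worker[0].isDaemon()
        && liveNonDaemonThreads().contains(worker[0].getName());

    pool.shutdown();  // interrupts idle workers so they can terminate
    boolean terminated = pool.awaitTermination(5, TimeUnit.SECONDS);
    worker[0].join(5_000);  // wait for the thread itself to finish run()
    return lingering && terminated && !worker[0].isAlive();
  }

  public static void main(String[] args) throws Exception {
    System.out.println("idle worker lingered until shutdown(): " + demo());
    System.out.println("remaining non-daemon threads: " + liveNonDaemonThreads());
  }
}
```

Two common remedies for this pattern are to call shutdown() (or shutdownNow()) on every executor when the job finishes, or to build pools with a daemon ThreadFactory such as the hypothetical daemonPool() above. The daemon approach lets the JVM exit but silently abandons any work still queued, so an explicit shutdown is generally the safer choice.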
> How to close all child when a job finished?
> -------------------------------------------
>
> Key: GIRAPH-169
> URL: https://issues.apache.org/jira/browse/GIRAPH-169
> Project: Giraph
> Issue Type: Improvement
> Components: mapreduce
> Affects Versions: 0.2.0
> Environment: sles 11 x64, jdk 1.6, hadoop 0.20.205.0, 1 Master and 8 slaves
> Reporter: Jianfeng Qian
> Priority: Minor
>
> I ran PageRank on Hadoop 0.20.205.0. When the job finished, the child processes on the slaves didn't quit immediately; sometimes they never quit and I had to kill them.