Posted to common-dev@hadoop.apache.org by "Mahadev konar (JIRA)" <ji...@apache.org> on 2006/11/10 01:16:39 UTC

[jira] Commented: (HADOOP-704) Reduce hangs at 33%

    [ http://issues.apache.org/jira/browse/HADOOP-704?page=comments#action_12448627 ] 
            
Mahadev konar commented on HADOOP-704:
--------------------------------------

Should we have a max_retries limit on fetching a given map output from a given TaskTracker? Once the limit is reached, the reduce could give up on that fetch and ask the JobTracker to re-execute the map on a different node. Also, could this crash be related to the Jetty 6 upgrade?
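
A rough sketch of the max_retries idea, just to make the proposal concrete. None of the names here (BoundedFetch, Fetcher, FailureReporter, fetchWithRetries) exist in Hadoop; they are hypothetical stand-ins for the reduce-side copy code and for whatever call would tell the JobTracker that a map output is unreachable. The retry count and backoff are parameters because no values have been proposed yet.

```java
import java.io.IOException;

/** Hypothetical sketch of bounded map-output fetching; not a Hadoop API. */
final class BoundedFetch {

    /** Stand-in for one attempt to copy a map output over HTTP. */
    interface Fetcher {
        void fetch() throws IOException;
    }

    /** Stand-in for asking the JobTracker to re-execute the map elsewhere. */
    interface FailureReporter {
        void reportFailedFetch(String mapTaskId, String host);
    }

    /**
     * Tries the fetch at most maxRetries times, sleeping backoffMillis
     * between failed attempts. Returns true on the first success. If every
     * attempt fails, reports the (map, host) pair once and returns false.
     */
    static boolean fetchWithRetries(String mapTaskId, String host,
                                    Fetcher fetcher, FailureReporter reporter,
                                    int maxRetries, long backoffMillis) {
        for (int attempt = 1; attempt <= maxRetries; attempt++) {
            try {
                fetcher.fetch();
                return true;
            } catch (IOException e) {
                if (attempt == maxRetries) {
                    break; // out of attempts, fall through to the report
                }
                try {
                    Thread.sleep(backoffMillis);
                } catch (InterruptedException ie) {
                    // Restore the interrupt flag and stop without reporting:
                    // the task is being shut down, the host may be fine.
                    Thread.currentThread().interrupt();
                    return false;
                }
            }
        }
        reporter.reportFailedFetch(mapTaskId, host);
        return false;
    }
}
```

With something like this, a reduce stuck on a TaskTracker whose HTTP acceptor is gone would stop retrying after maxRetries attempts instead of looping forever, and the report could be what triggers re-execution of the map on another node.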

> Reduce hangs at 33%
> -------------------
>
>                 Key: HADOOP-704
>                 URL: http://issues.apache.org/jira/browse/HADOOP-704
>             Project: Hadoop
>          Issue Type: Bug
>          Components: mapred
>    Affects Versions: 0.8.0
>            Reporter: Nigel Daley
>
> I have a MR job that is hanging when the reduce reaches 33%.
> Both the map and reduce are no-ops.  The single reducer is continuously trying to retrieve output from a TaskTracker that seems to have a crashed "Acceptor 50060" thread.  (Note the thread crash does not seem to be logged anywhere).  The thread dump of the TaskTracker is as follows:
> "org.apache.hadoop.dfs.DFSClient$LeaseChecker@1329642" daemon prio=1 tid=0x085abd68 nid=0x5b37 waiting on condition [0x4e979000..0x4e979f30]
>         at java.lang.Thread.sleep(Native Method)
>         at org.apache.hadoop.dfs.DFSClient$LeaseChecker.run(DFSClient.java:462)
>         at java.lang.Thread.run(Thread.java:595)
> "org.apache.hadoop.io.ObjectWritable Connection Culler" daemon prio=1 tid=0x0809fe18 nid=0x5b34 waiting on condition [0x4f1e5000..0x4f1e5eb0]
>         at java.lang.Thread.sleep(Native Method)
>         at org.apache.hadoop.ipc.Client$ConnectionCuller.run(Client.java:388)
> "IPC Server handler 1 on 50050" daemon prio=1 tid=0x085b5d30 nid=0x57f8 in Object.wait() [0x4eafd000..0x4eafd130]
>         at java.lang.Object.wait(Native Method)
>         - waiting on <0x5553ee10> (a java.util.LinkedList)
>         at org.apache.hadoop.ipc.Server$Handler.run(Server.java:490)
>         - locked <0x5553ee10> (a java.util.LinkedList)
> "IPC Server handler 0 on 50050" daemon prio=1 tid=0x085b57b0 nid=0x57f7 in Object.wait() [0x4eb7e000..0x4eb7e1b0]
>         at java.lang.Object.wait(Native Method)
>         - waiting on <0x5553ee10> (a java.util.LinkedList)
>         at org.apache.hadoop.ipc.Server$Handler.run(Server.java:490)
>         - locked <0x5553ee10> (a java.util.LinkedList)
> "IPC Server listener on 50050" daemon prio=1 tid=0x083884d8 nid=0x57f6 runnable [0x4ebfe000..0x4ebff030]
>         at sun.nio.ch.PollArrayWrapper.poll0(Native Method)
>         at sun.nio.ch.PollArrayWrapper.poll(PollArrayWrapper.java:100)
>         at sun.nio.ch.PollSelectorImpl.doSelect(PollSelectorImpl.java:56)
>         at sun.nio.ch.SelectorImpl.lockAndDoSelect(SelectorImpl.java:69)
>         - locked <0x5553f3e8> (a sun.nio.ch.Util$1)
>         - locked <0x5553f3d8> (a java.util.Collections$UnmodifiableSet)
>         - locked <0x5553f150> (a sun.nio.ch.PollSelectorImpl)
>         at sun.nio.ch.SelectorImpl.select(SelectorImpl.java:80)
>         at sun.nio.ch.SelectorImpl.select(SelectorImpl.java:84)
>         at org.apache.hadoop.ipc.Server$Listener.run(Server.java:224)
> "btpool0-1 - Invalidator - /" prio=1 tid=0x08239ac0 nid=0x57f2 waiting on condition [0x4edfe000..0x4edfef30]
>         at java.lang.Thread.sleep(Native Method)
>         at org.mortbay.jetty.servlet.AbstractSessionManager$SessionScavenger.run(AbstractSessionManager.java:933)
>         at org.mortbay.thread.BoundedThreadPool$PoolThread.run(BoundedThreadPool.java:475)
> "taskCleanup" daemon prio=1 tid=0x0810fd60 nid=0x57ed in Object.wait() [0x4f6c0000..0x4f6c0e30]
>         at java.lang.Object.wait(Native Method)
>         - waiting on <0x554dc650> (a java.util.ArrayList)
>         at java.lang.Object.wait(Object.java:474)
>         at org.apache.hadoop.mapred.TaskTracker$BlockingQueue.take(TaskTracker.java:783)
>         - locked <0x554dc650> (a java.util.ArrayList)
>         at org.apache.hadoop.mapred.TaskTracker$1.run(TaskTracker.java:143)
>         at java.lang.Thread.run(Thread.java:595)
> "Low Memory Detector" daemon prio=1 tid=0x509a54a8 nid=0x57ea runnable [0x00000000..0x00000000]
> "CompilerThread1" daemon prio=1 tid=0x509a40c0 nid=0x57e9 waiting on condition [0x00000000..0x506793d8]
> "CompilerThread0" daemon prio=1 tid=0x509a3138 nid=0x57e8 waiting on condition [0x00000000..0x506fa258]
> "AdapterThread" daemon prio=1 tid=0x509a2170 nid=0x57e7 waiting on condition [0x00000000..0x00000000]
> "Signal Dispatcher" daemon prio=1 tid=0x509a13e0 nid=0x57e6 runnable [0x00000000..0x00000000]
> "Finalizer" daemon prio=1 tid=0x50998880 nid=0x57e5 in Object.wait() [0x5087d000..0x5087dfb0]
>         at java.lang.Object.wait(Native Method)
>         - waiting on <0x554dca70> (a java.lang.ref.ReferenceQueue$Lock)
>         at java.lang.ref.ReferenceQueue.remove(ReferenceQueue.java:116)
>         - locked <0x554dca70> (a java.lang.ref.ReferenceQueue$Lock)
>         at java.lang.ref.ReferenceQueue.remove(ReferenceQueue.java:132)
>         at java.lang.ref.Finalizer$FinalizerThread.run(Finalizer.java:159)
> "Reference Handler" daemon prio=1 tid=0x509983b8 nid=0x57e4 in Object.wait() [0x508fe000..0x508fee30]
>         at java.lang.Object.wait(Native Method)
>         - waiting on <0x554c4450> (a java.lang.ref.Reference$Lock)
>         at java.lang.Object.wait(Object.java:474)
>         at java.lang.ref.Reference$ReferenceHandler.run(Reference.java:116)
>         - locked <0x554c4450> (a java.lang.ref.Reference$Lock)
> "main" prio=1 tid=0x0805e608 nid=0x57d2 in Object.wait() [0xdfffc000..0xdfffcd08]
>         at java.lang.Object.wait(Native Method)
>         - waiting on <0x554dc7b0> (a [I)
>         at org.apache.hadoop.mapred.TaskTracker.offerService(TaskTracker.java:436)
>         - locked <0x554dc7b0> (a [I)
>         at org.apache.hadoop.mapred.TaskTracker.run(TaskTracker.java:720)
>         at org.apache.hadoop.mapred.TaskTracker.main(TaskTracker.java:1374)
> "VM Thread" prio=1 tid=0x50996028 nid=0x57e3 runnable
> "GC task thread#0 (ParallelGC)" prio=1 tid=0x08078cc8 nid=0x57df runnable
> "GC task thread#1 (ParallelGC)" prio=1 tid=0x080798d0 nid=0x57e0 runnable
> "GC task thread#2 (ParallelGC)" prio=1 tid=0x0807a4c0 nid=0x57e1 runnable
> "GC task thread#3 (ParallelGC)" prio=1 tid=0x0807b0b0 nid=0x57e2 runnable
> "VM Periodic Task Thread" prio=1 tid=0x509a6a10 nid=0x57eb waiting on condition
