You are viewing a plain text version of this content. The canonical link for it is here.
Posted to mapreduce-dev@hadoop.apache.org by "Vinod Kumar Vavilapalli (Created) (JIRA)" <ji...@apache.org> on 2011/10/20 08:39:10 UTC

[jira] [Created] (MAPREDUCE-3226) Few reduce tasks hanging in a gridmix-run

Few reduce tasks hanging in a gridmix-run
-----------------------------------------

                 Key: MAPREDUCE-3226
                 URL: https://issues.apache.org/jira/browse/MAPREDUCE-3226
             Project: Hadoop Map/Reduce
          Issue Type: Bug
          Components: mrv2, task
    Affects Versions: 0.23.0
            Reporter: Vinod Kumar Vavilapalli
            Priority: Blocker
             Fix For: 0.23.0


In a gridmix run with ~1000 jobs, one job is getting stuck because of 2-3 hanging reducers. All of the them are stuck after downloading all map outputs and have the following thread dump.

{code}
"EventFetcher for fetching Map Completion Events" daemon prio=10 tid=0xa325fc00 nid=0x1ca4 waiting on condition [0xa315c000]
   java.lang.Thread.State: TIMED_WAITING (sleeping)
        at java.lang.Thread.sleep(Native Method)
        at org.apache.hadoop.mapreduce.task.reduce.EventFetcher.run(EventFetcher.java:71)

"main" prio=10 tid=0x080ed400 nid=0x1c71 in Object.wait() [0xf73a2000]
   java.lang.Thread.State: WAITING (on object monitor)
        at java.lang.Object.wait(Native Method)
        - waiting on <0xa94b23d8> (a org.apache.hadoop.mapreduce.task.reduce.EventFetcher)
        at java.lang.Thread.join(Thread.java:1143)
        - locked <0xa94b23d8> (a org.apache.hadoop.mapreduce.task.reduce.EventFetcher)
        at java.lang.Thread.join(Thread.java:1196)
        at org.apache.hadoop.mapreduce.task.reduce.Shuffle.run(Shuffle.java:135)
        at org.apache.hadoop.mapred.ReduceTask.run(ReduceTask.java:367)
        at org.apache.hadoop.mapred.YarnChild$2.run(YarnChild.java:147)
        at java.security.AccessController.doPrivileged(Native Method)
        at javax.security.auth.Subject.doAs(Subject.java:396)
        at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1135)
        at org.apache.hadoop.mapred.YarnChild.main(YarnChild.java:142)
{code}

Thanks to [~karams] for helping track this down.

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators: https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa
For more information on JIRA, see: http://www.atlassian.com/software/jira