Posted to mapreduce-issues@hadoop.apache.org by "Vinod Kumar Vavilapalli (Created) (JIRA)" <ji...@apache.org> on 2011/10/20 08:39:10 UTC

[jira] [Created] (MAPREDUCE-3226) Few reduce tasks hanging in a gridmix-run

Few reduce tasks hanging in a gridmix-run
-----------------------------------------

                 Key: MAPREDUCE-3226
                 URL: https://issues.apache.org/jira/browse/MAPREDUCE-3226
             Project: Hadoop Map/Reduce
          Issue Type: Bug
          Components: mrv2, task
    Affects Versions: 0.23.0
            Reporter: Vinod Kumar Vavilapalli
            Priority: Blocker
             Fix For: 0.23.0


In a gridmix run with ~1000 jobs, one job is getting stuck because of 2-3 hanging reducers. All of them are stuck after downloading all map outputs and have the following thread dump.

{code}
"EventFetcher for fetching Map Completion Events" daemon prio=10 tid=0xa325fc00 nid=0x1ca4 waiting on condition [0xa315c000]
   java.lang.Thread.State: TIMED_WAITING (sleeping)
        at java.lang.Thread.sleep(Native Method)
        at org.apache.hadoop.mapreduce.task.reduce.EventFetcher.run(EventFetcher.java:71)

"main" prio=10 tid=0x080ed400 nid=0x1c71 in Object.wait() [0xf73a2000]
   java.lang.Thread.State: WAITING (on object monitor)
        at java.lang.Object.wait(Native Method)
        - waiting on <0xa94b23d8> (a org.apache.hadoop.mapreduce.task.reduce.EventFetcher)
        at java.lang.Thread.join(Thread.java:1143)
        - locked <0xa94b23d8> (a org.apache.hadoop.mapreduce.task.reduce.EventFetcher)
        at java.lang.Thread.join(Thread.java:1196)
        at org.apache.hadoop.mapreduce.task.reduce.Shuffle.run(Shuffle.java:135)
        at org.apache.hadoop.mapred.ReduceTask.run(ReduceTask.java:367)
        at org.apache.hadoop.mapred.YarnChild$2.run(YarnChild.java:147)
        at java.security.AccessController.doPrivileged(Native Method)
        at javax.security.auth.Subject.doAs(Subject.java:396)
        at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1135)
        at org.apache.hadoop.mapred.YarnChild.main(YarnChild.java:142)
{code}

Thanks to [~karams] for helping track this down.

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators: https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa
For more information on JIRA, see: http://www.atlassian.com/software/jira

        

[jira] [Commented] (MAPREDUCE-3226) Few reduce tasks hanging in a gridmix-run

Posted by "Hadoop QA (Commented) (JIRA)" <ji...@apache.org>.
    [ https://issues.apache.org/jira/browse/MAPREDUCE-3226?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13131441#comment-13131441 ] 

Hadoop QA commented on MAPREDUCE-3226:
--------------------------------------

-1 overall.  Here are the results of testing the latest attachment 
  http://issues.apache.org/jira/secure/attachment/12499823/MAPREDUCE-3226-20111020.txt
  against trunk revision .

    +1 @author.  The patch does not contain any @author tags.

    -1 tests included.  The patch doesn't appear to include any new or modified tests.
                        Please justify why no new tests are needed for this patch.
                        Also please list what manual steps were performed to verify this patch.

    +1 javadoc.  The javadoc tool did not generate any warning messages.

    +1 javac.  The applied patch does not increase the total number of javac compiler warnings.

    -1 findbugs.  The patch appears to introduce 160 new Findbugs (version 1.3.9) warnings.

    +1 release audit.  The applied patch does not increase the total number of release audit warnings.

    +1 core tests.  The patch passed unit tests in .

    +1 contrib tests.  The patch passed contrib unit tests.

Test results: https://builds.apache.org/job/PreCommit-MAPREDUCE-Build/1079//testReport/
Findbugs warnings: https://builds.apache.org/job/PreCommit-MAPREDUCE-Build/1079//artifact/trunk/hadoop-mapreduce-project/patchprocess/newPatchFindbugsWarningshadoop-mapreduce-client-app.html
Findbugs warnings: https://builds.apache.org/job/PreCommit-MAPREDUCE-Build/1079//artifact/trunk/hadoop-mapreduce-project/patchprocess/newPatchFindbugsWarningshadoop-mapreduce-client-core.html
Findbugs warnings: https://builds.apache.org/job/PreCommit-MAPREDUCE-Build/1079//artifact/trunk/hadoop-mapreduce-project/patchprocess/newPatchFindbugsWarningshadoop-mapreduce-client-common.html
Console output: https://builds.apache.org/job/PreCommit-MAPREDUCE-Build/1079//console

This message is automatically generated.
                
> Few reduce tasks hanging in a gridmix-run
> -----------------------------------------
>
>                 Key: MAPREDUCE-3226
>                 URL: https://issues.apache.org/jira/browse/MAPREDUCE-3226
>             Project: Hadoop Map/Reduce
>          Issue Type: Bug
>          Components: mrv2, task
>    Affects Versions: 0.23.0
>            Reporter: Vinod Kumar Vavilapalli
>            Assignee: Vinod Kumar Vavilapalli
>            Priority: Blocker
>             Fix For: 0.23.0
>
>         Attachments: MAPREDUCE-3226-20111020.txt
>
>
> In a gridmix run with ~1000 jobs, one job is getting stuck because of 2-3 hanging reducers. All of them are stuck after downloading all map outputs and have the following thread dump.
> {code}
> "EventFetcher for fetching Map Completion Events" daemon prio=10 tid=0xa325fc00 nid=0x1ca4 waiting on condition [0xa315c000]
>    java.lang.Thread.State: TIMED_WAITING (sleeping)
>         at java.lang.Thread.sleep(Native Method)
>         at org.apache.hadoop.mapreduce.task.reduce.EventFetcher.run(EventFetcher.java:71)
> "main" prio=10 tid=0x080ed400 nid=0x1c71 in Object.wait() [0xf73a2000]
>    java.lang.Thread.State: WAITING (on object monitor)
>         at java.lang.Object.wait(Native Method)
>         - waiting on <0xa94b23d8> (a org.apache.hadoop.mapreduce.task.reduce.EventFetcher)
>         at java.lang.Thread.join(Thread.java:1143)
>         - locked <0xa94b23d8> (a org.apache.hadoop.mapreduce.task.reduce.EventFetcher)
>         at java.lang.Thread.join(Thread.java:1196)
>         at org.apache.hadoop.mapreduce.task.reduce.Shuffle.run(Shuffle.java:135)
>         at org.apache.hadoop.mapred.ReduceTask.run(ReduceTask.java:367)
>         at org.apache.hadoop.mapred.YarnChild$2.run(YarnChild.java:147)
>         at java.security.AccessController.doPrivileged(Native Method)
>         at javax.security.auth.Subject.doAs(Subject.java:396)
>         at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1135)
>         at org.apache.hadoop.mapred.YarnChild.main(YarnChild.java:142)
> {code}
> Thanks to [~karams] for helping track this down.


        

[jira] [Updated] (MAPREDUCE-3226) Few reduce tasks hanging in a gridmix-run

Posted by "Vinod Kumar Vavilapalli (Updated) (JIRA)" <ji...@apache.org>.
     [ https://issues.apache.org/jira/browse/MAPREDUCE-3226?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Vinod Kumar Vavilapalli updated MAPREDUCE-3226:
-----------------------------------------------

    Status: Patch Available  (was: Open)
    

        

[jira] [Commented] (MAPREDUCE-3226) Few reduce tasks hanging in a gridmix-run

Posted by "Hudson (Commented) (JIRA)" <ji...@apache.org>.
    [ https://issues.apache.org/jira/browse/MAPREDUCE-3226?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13132654#comment-13132654 ] 

Hudson commented on MAPREDUCE-3226:
-----------------------------------

Integrated in Hadoop-Hdfs-trunk #837 (See [https://builds.apache.org/job/Hadoop-Hdfs-trunk/837/])
    MAPREDUCE-3226. Fix shutdown of fetcher threads. Contributed by Vinod K V.

acmurthy : http://svn.apache.org/viewcvs.cgi/?root=Apache-SVN&view=rev&rev=1187116
Files : 
* /hadoop/common/trunk/hadoop-mapreduce-project/CHANGES.txt
* /hadoop/common/trunk/hadoop-mapreduce-project/hadoop-mapreduce-client/hadoop-mapreduce-client-core/src/main/java/org/apache/hadoop/mapreduce/task/reduce/EventFetcher.java
* /hadoop/common/trunk/hadoop-mapreduce-project/hadoop-mapreduce-client/hadoop-mapreduce-client-core/src/main/java/org/apache/hadoop/mapreduce/task/reduce/Fetcher.java

                

        

[jira] [Commented] (MAPREDUCE-3226) Few reduce tasks hanging in a gridmix-run

Posted by "Hudson (Commented) (JIRA)" <ji...@apache.org>.
    [ https://issues.apache.org/jira/browse/MAPREDUCE-3226?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13132169#comment-13132169 ] 

Hudson commented on MAPREDUCE-3226:
-----------------------------------

Integrated in Hadoop-Common-0.23-Commit #33 (See [https://builds.apache.org/job/Hadoop-Common-0.23-Commit/33/])
    Merge -c 1187116 from trunk to branch-0.23 to complete fix for MAPREDUCE-3226.

acmurthy : http://svn.apache.org/viewcvs.cgi/?root=Apache-SVN&view=rev&rev=1187119
Files : 
* /hadoop/common/branches/branch-0.23/hadoop-mapreduce-project/CHANGES.txt
* /hadoop/common/branches/branch-0.23/hadoop-mapreduce-project/hadoop-mapreduce-client/hadoop-mapreduce-client-core/src/main/java/org/apache/hadoop/mapreduce/task/reduce/EventFetcher.java
* /hadoop/common/branches/branch-0.23/hadoop-mapreduce-project/hadoop-mapreduce-client/hadoop-mapreduce-client-core/src/main/java/org/apache/hadoop/mapreduce/task/reduce/Fetcher.java

                

        

[jira] [Commented] (MAPREDUCE-3226) Few reduce tasks hanging in a gridmix-run

Posted by "Vinod Kumar Vavilapalli (Commented) (JIRA)" <ji...@apache.org>.
    [ https://issues.apache.org/jira/browse/MAPREDUCE-3226?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13131427#comment-13131427 ] 

Vinod Kumar Vavilapalli commented on MAPREDUCE-3226:
----------------------------------------------------

More information: the task logs show this exception:

{code}
2011-10-18 10:34:41,006 INFO org.apache.hadoop.mapred.Task: Communication exception: java.io.IOException: Failed on local exception: java.nio.channels.ClosedByInterruptException; Host Details : local host is: "host.name.com/$IP"; destination host is: ""host.name.com":48314;
        at org.apache.hadoop.net.NetUtils.wrapException(NetUtils.java:601)
        at org.apache.hadoop.ipc.Client.call(Client.java:1089)
        at org.apache.hadoop.ipc.WritableRpcEngine$Invoker.invoke(WritableRpcEngine.java:193)
        at $Proxy6.statusUpdate(Unknown Source)
        at org.apache.hadoop.mapred.Task$TaskReporter.run(Task.java:671)
        at java.lang.Thread.run(Thread.java:619)

Caused by: java.nio.channels.ClosedByInterruptException
        at java.nio.channels.spi.AbstractInterruptibleChannel.end(AbstractInterruptibleChannel.java:184)
        at sun.nio.ch.SocketChannelImpl.write(SocketChannelImpl.java:341)
        at org.apache.hadoop.net.SocketOutputStream$Writer.performIO(SocketOutputStream.java:60)
        at org.apache.hadoop.net.SocketIOWithTimeout.doIO(SocketIOWithTimeout.java:142)
        at org.apache.hadoop.net.SocketOutputStream.write(SocketOutputStream.java:151)
        at org.apache.hadoop.net.SocketOutputStream.write(SocketOutputStream.java:112)
        at org.apache.hadoop.security.SaslOutputStream.write(SaslOutputStream.java:168)
        at java.io.BufferedOutputStream.flushBuffer(BufferedOutputStream.java:65)
        at java.io.BufferedOutputStream.flush(BufferedOutputStream.java:123)
        at java.io.DataOutputStream.flush(DataOutputStream.java:106)
        at org.apache.hadoop.ipc.Client$Connection.sendParam(Client.java:796)
        at org.apache.hadoop.ipc.Client.call(Client.java:1066)
        at org.apache.hadoop.ipc.WritableRpcEngine$Invoker.invoke(WritableRpcEngine.java:193)
        at $Proxy6.getMapCompletionEvents(Unknown Source)
        at org.apache.hadoop.mapreduce.task.reduce.EventFetcher.getMapCompletionEvents(EventFetcher.java:99)
        at org.apache.hadoop.mapreduce.task.reduce.EventFetcher.run(EventFetcher.java:65)
{code}
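
The `Caused by` frames above show the interrupt arriving while `EventFetcher.getMapCompletionEvents` was blocked in a socket write, so it surfaced as an `IOException` subtype rather than as an `InterruptedException`. The following is a minimal, self-contained sketch of that JDK behaviour, not code from Hadoop: the class name `InterruptedIoSketch`, its method, and the use of a `java.nio.channels.Pipe` in place of the RPC socket are all assumptions made for illustration.

```java
import java.io.IOException;
import java.nio.ByteBuffer;
import java.nio.channels.Pipe;
import java.util.concurrent.atomic.AtomicReference;

// Hypothetical illustration; a Pipe stands in for the RPC socket channel.
public class InterruptedIoSketch {

    /** Interrupts a thread blocked in channel I/O and reports what it caught. */
    static String exceptionSeenByInterruptedReader() {
        AtomicReference<String> seen = new AtomicReference<>("none");
        try {
            Pipe pipe = Pipe.open();
            Thread reader = new Thread(() -> {
                try {
                    // Blocks: nothing is ever written to the sink.
                    pipe.source().read(ByteBuffer.allocate(16));
                } catch (IOException e) {
                    // A generic IOException handler also catches the interrupt.
                    seen.set(e.getClass().getSimpleName());
                }
            });
            reader.start();
            Thread.sleep(200);   // give the reader time to block
            reader.interrupt();  // closes the channel under the reader
            reader.join(5000);
            pipe.sink().close();
        } catch (IOException | InterruptedException e) {
            throw new IllegalStateException(e);
        }
        return seen.get();
    }

    public static void main(String[] args) {
        System.out.println(exceptionSeenByInterruptedReader());
    }
}
```

In this sketch the reader reports `ClosedByInterruptException`, which is the exception type in the log above. A retry loop that catches `IOException` will therefore also catch a shutdown interrupt that lands during channel I/O, unless it checks for that case explicitly.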
                

        

[jira] [Commented] (MAPREDUCE-3226) Few reduce tasks hanging in a gridmix-run

Posted by "Hudson (Commented) (JIRA)" <ji...@apache.org>.
    [ https://issues.apache.org/jira/browse/MAPREDUCE-3226?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13132648#comment-13132648 ] 

Hudson commented on MAPREDUCE-3226:
-----------------------------------

Integrated in Hadoop-Mapreduce-0.23-Build #58 (See [https://builds.apache.org/job/Hadoop-Mapreduce-0.23-Build/58/])
    Merge -c 1187116 from trunk to branch-0.23 to complete fix for MAPREDUCE-3226.

acmurthy : http://svn.apache.org/viewcvs.cgi/?root=Apache-SVN&view=rev&rev=1187119
Files : 
* /hadoop/common/branches/branch-0.23/hadoop-mapreduce-project/CHANGES.txt
* /hadoop/common/branches/branch-0.23/hadoop-mapreduce-project/hadoop-mapreduce-client/hadoop-mapreduce-client-core/src/main/java/org/apache/hadoop/mapreduce/task/reduce/EventFetcher.java
* /hadoop/common/branches/branch-0.23/hadoop-mapreduce-project/hadoop-mapreduce-client/hadoop-mapreduce-client-core/src/main/java/org/apache/hadoop/mapreduce/task/reduce/Fetcher.java

                

        

[jira] [Updated] (MAPREDUCE-3226) Few reduce tasks hanging in a gridmix-run

Posted by "Arun C Murthy (Updated) (JIRA)" <ji...@apache.org>.
     [ https://issues.apache.org/jira/browse/MAPREDUCE-3226?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Arun C Murthy updated MAPREDUCE-3226:
-------------------------------------

    Resolution: Fixed
        Status: Resolved  (was: Patch Available)

I just committed this. Thanks Vinod!
                

        

[jira] [Updated] (MAPREDUCE-3226) Few reduce tasks hanging in a gridmix-run

Posted by "Vinod Kumar Vavilapalli (Updated) (JIRA)" <ji...@apache.org>.
     [ https://issues.apache.org/jira/browse/MAPREDUCE-3226?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Vinod Kumar Vavilapalli updated MAPREDUCE-3226:
-----------------------------------------------

    Attachment: MAPREDUCE-3226-20111020.txt

This patch should fix it.

Also fixing Fetcher thread just in case.
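
The patch itself is not quoted in this thread, so the following is only a sketch of one common way to make such a polling thread shut down reliably: a volatile stop flag checked by the loop, in addition to `interrupt()`. The names `FetcherShutdownSketch`, `PollingThread`, `fakeRpc`, and `requestStop` are hypothetical, and the simulated RPC that reports an interrupt as an `IOException` is an assumption modelled on the task-log exception reported in this issue.

```java
import java.io.IOException;

// Hypothetical sketch; not the contents of MAPREDUCE-3226-20111020.txt.
public class FetcherShutdownSketch {

    /** Stand-in for a polling thread that retries on IOException. */
    static class PollingThread extends Thread {
        private final boolean honourStopFlag;
        private volatile boolean stopped = false;

        PollingThread(boolean honourStopFlag) {
            this.honourStopFlag = honourStopFlag;
            setDaemon(true); // let the JVM exit even if this thread never stops
        }

        // Simulated RPC: an interrupt surfaces as an IOException.
        private void fakeRpc() throws IOException {
            try {
                Thread.sleep(20);
            } catch (InterruptedException e) {
                throw new IOException("Failed on local exception", e);
            }
        }

        @Override
        public void run() {
            while (!(honourStopFlag && stopped)) {
                try {
                    fakeRpc();
                } catch (IOException e) {
                    // Treated as a transient failure and retried, so the
                    // interrupt alone never ends the loop.
                }
            }
        }

        void requestStop() {
            stopped = true; // visible to the loop even if the interrupt is swallowed
            interrupt();
        }
    }

    /** Returns true if the thread is still running one second after the stop request. */
    static boolean stillAliveAfterStop(boolean honourStopFlag) {
        PollingThread t = new PollingThread(honourStopFlag);
        t.start();
        try {
            Thread.sleep(100);
            t.requestStop();
            t.join(1000); // bounded join, so the caller cannot hang
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
        }
        return t.isAlive();
    }

    public static void main(String[] args) {
        System.out.println("interrupt only: alive = " + stillAliveAfterStop(false));
        System.out.println("flag + interrupt: alive = " + stillAliveAfterStop(true));
    }
}
```

With the flag ignored, the thread is still alive after the stop request, which matches the reported thread dump where `main` waits in `Thread.join` while the fetcher keeps sleeping. With the flag honoured, the loop exits on its next check. The bounded `join` is a second safeguard, so the caller proceeds even if the thread misbehaves.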
                

        

[jira] [Commented] (MAPREDUCE-3226) Few reduce tasks hanging in a gridmix-run

Posted by "Hudson (Commented) (JIRA)" <ji...@apache.org>.
    [ https://issues.apache.org/jira/browse/MAPREDUCE-3226?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13132622#comment-13132622 ] 

Hudson commented on MAPREDUCE-3226:
-----------------------------------

Integrated in Hadoop-Hdfs-0.23-Build #46 (See [https://builds.apache.org/job/Hadoop-Hdfs-0.23-Build/46/])
    Merge -c 1187116 from trunk to branch-0.23 to complete fix for MAPREDUCE-3226.

acmurthy : http://svn.apache.org/viewcvs.cgi/?root=Apache-SVN&view=rev&rev=1187119
Files : 
* /hadoop/common/branches/branch-0.23/hadoop-mapreduce-project/CHANGES.txt
* /hadoop/common/branches/branch-0.23/hadoop-mapreduce-project/hadoop-mapreduce-client/hadoop-mapreduce-client-core/src/main/java/org/apache/hadoop/mapreduce/task/reduce/EventFetcher.java
* /hadoop/common/branches/branch-0.23/hadoop-mapreduce-project/hadoop-mapreduce-client/hadoop-mapreduce-client-core/src/main/java/org/apache/hadoop/mapreduce/task/reduce/Fetcher.java

                
> Few reduce tasks hanging in a gridmix-run
> -----------------------------------------
>
>                 Key: MAPREDUCE-3226
>                 URL: https://issues.apache.org/jira/browse/MAPREDUCE-3226
>             Project: Hadoop Map/Reduce
>          Issue Type: Bug
>          Components: mrv2, task
>    Affects Versions: 0.23.0
>            Reporter: Vinod Kumar Vavilapalli
>            Assignee: Vinod Kumar Vavilapalli
>            Priority: Blocker
>             Fix For: 0.23.0
>
>         Attachments: MAPREDUCE-3226-20111020.txt
>
>
> In a gridmix run with ~1000 jobs, one job is getting stuck because of 2-3 hanging reducers. All of them are stuck after downloading all map outputs and have the following thread dump.
> {code}
> "EventFetcher for fetching Map Completion Events" daemon prio=10 tid=0xa325fc00 nid=0x1ca4 waiting on condition [0xa315c000]
>    java.lang.Thread.State: TIMED_WAITING (sleeping)
>         at java.lang.Thread.sleep(Native Method)
>         at org.apache.hadoop.mapreduce.task.reduce.EventFetcher.run(EventFetcher.java:71)
> "main" prio=10 tid=0x080ed400 nid=0x1c71 in Object.wait() [0xf73a2000]
>    java.lang.Thread.State: WAITING (on object monitor)
>         at java.lang.Object.wait(Native Method)
>         - waiting on <0xa94b23d8> (a org.apache.hadoop.mapreduce.task.reduce.EventFetcher)
>         at java.lang.Thread.join(Thread.java:1143)
>         - locked <0xa94b23d8> (a org.apache.hadoop.mapreduce.task.reduce.EventFetcher)
>         at java.lang.Thread.join(Thread.java:1196)
>         at org.apache.hadoop.mapreduce.task.reduce.Shuffle.run(Shuffle.java:135)
>         at org.apache.hadoop.mapred.ReduceTask.run(ReduceTask.java:367)
>         at org.apache.hadoop.mapred.YarnChild$2.run(YarnChild.java:147)
>         at java.security.AccessController.doPrivileged(Native Method)
>         at javax.security.auth.Subject.doAs(Subject.java:396)
>         at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1135)
>         at org.apache.hadoop.mapred.YarnChild.main(YarnChild.java:142)
> {code}
> Thanks to [~karams] for helping track this down.


[jira] [Commented] (MAPREDUCE-3226) Few reduce tasks hanging in a gridmix-run

Posted by "Hudson (Commented) (JIRA)" <ji...@apache.org>.
    [ https://issues.apache.org/jira/browse/MAPREDUCE-3226?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13132156#comment-13132156 ] 

Hudson commented on MAPREDUCE-3226:
-----------------------------------

Integrated in Hadoop-Hdfs-trunk-Commit #1204 (See [https://builds.apache.org/job/Hadoop-Hdfs-trunk-Commit/1204/])
    MAPREDUCE-3226. Fix shutdown of fetcher threads. Contributed by Vinod K V.

acmurthy : http://svn.apache.org/viewcvs.cgi/?root=Apache-SVN&view=rev&rev=1187116
Files : 
* /hadoop/common/trunk/hadoop-mapreduce-project/CHANGES.txt
* /hadoop/common/trunk/hadoop-mapreduce-project/hadoop-mapreduce-client/hadoop-mapreduce-client-core/src/main/java/org/apache/hadoop/mapreduce/task/reduce/EventFetcher.java
* /hadoop/common/trunk/hadoop-mapreduce-project/hadoop-mapreduce-client/hadoop-mapreduce-client-core/src/main/java/org/apache/hadoop/mapreduce/task/reduce/Fetcher.java
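
For readers following the thread dump: the main thread is blocked in an unbounded Thread.join() on the EventFetcher while that thread is asleep inside its run() loop. The patch itself is not reproduced in this thread, so what follows is only a minimal sketch of one common way to shut down a sleeping poller thread reliably, not the actual change committed for MAPREDUCE-3226. The class names PollingFetcher and FetcherShutdownDemo, the shutDown(long) method, and the timeout values are all hypothetical and chosen for illustration.

```java
// Hypothetical stand-in for a sleeping poller thread such as EventFetcher.
// This is NOT the Hadoop source or the MAPREDUCE-3226 patch.
class PollingFetcher extends Thread {
    private volatile boolean stopped = false;   // set by shutDown(), read by run()
    private final long sleepMillis;

    PollingFetcher(long sleepMillis) {
        super("PollingFetcher (illustrative)");
        this.sleepMillis = sleepMillis;
        setDaemon(true);
    }

    @Override
    public void run() {
        while (!stopped) {
            try {
                // Placeholder for "fetch new events", then wait before polling again.
                Thread.sleep(sleepMillis);
            } catch (InterruptedException e) {
                // Leave the loop instead of swallowing the interrupt and sleeping again.
                return;
            }
        }
    }

    /** Asks the thread to stop and waits up to timeoutMillis; returns true if it terminated. */
    boolean shutDown(long timeoutMillis) {
        stopped = true;
        interrupt();                 // wakes the thread if it is inside Thread.sleep()
        try {
            join(timeoutMillis);     // bounded wait, so the caller cannot hang forever
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();   // preserve the caller's interrupt status
        }
        return !isAlive();
    }
}

class FetcherShutdownDemo {
    public static void main(String[] args) {
        PollingFetcher fetcher = new PollingFetcher(60_000L);
        fetcher.start();
        System.out.println("terminated: " + fetcher.shutDown(5_000L));
    }
}
```

The sketch combines a volatile stop flag with interrupt(), so the request is seen whether the thread is sleeping or about to re-check its loop condition. The bounded join() is an extra safeguard in this sketch, so the caller does not wait indefinitely if the poller fails to exit.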

                

[jira] [Commented] (MAPREDUCE-3226) Few reduce tasks hanging in a gridmix-run

Posted by "Hudson (Commented) (JIRA)" <ji...@apache.org>.
    [ https://issues.apache.org/jira/browse/MAPREDUCE-3226?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13132204#comment-13132204 ] 

Hudson commented on MAPREDUCE-3226:
-----------------------------------

Integrated in Hadoop-Mapreduce-trunk-Commit #1142 (See [https://builds.apache.org/job/Hadoop-Mapreduce-trunk-Commit/1142/])
    MAPREDUCE-3226. Fix shutdown of fetcher threads. Contributed by Vinod K V.

acmurthy : http://svn.apache.org/viewcvs.cgi/?root=Apache-SVN&view=rev&rev=1187116
Files : 
* /hadoop/common/trunk/hadoop-mapreduce-project/CHANGES.txt
* /hadoop/common/trunk/hadoop-mapreduce-project/hadoop-mapreduce-client/hadoop-mapreduce-client-core/src/main/java/org/apache/hadoop/mapreduce/task/reduce/EventFetcher.java
* /hadoop/common/trunk/hadoop-mapreduce-project/hadoop-mapreduce-client/hadoop-mapreduce-client-core/src/main/java/org/apache/hadoop/mapreduce/task/reduce/Fetcher.java

                

[jira] [Commented] (MAPREDUCE-3226) Few reduce tasks hanging in a gridmix-run

Posted by "Hudson (Commented) (JIRA)" <ji...@apache.org>.
    [ https://issues.apache.org/jira/browse/MAPREDUCE-3226?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13132642#comment-13132642 ] 

Hudson commented on MAPREDUCE-3226:
-----------------------------------

Integrated in Hadoop-Mapreduce-trunk #867 (See [https://builds.apache.org/job/Hadoop-Mapreduce-trunk/867/])
    MAPREDUCE-3226. Fix shutdown of fetcher threads. Contributed by Vinod K V.

acmurthy : http://svn.apache.org/viewcvs.cgi/?root=Apache-SVN&view=rev&rev=1187116
Files : 
* /hadoop/common/trunk/hadoop-mapreduce-project/CHANGES.txt
* /hadoop/common/trunk/hadoop-mapreduce-project/hadoop-mapreduce-client/hadoop-mapreduce-client-core/src/main/java/org/apache/hadoop/mapreduce/task/reduce/EventFetcher.java
* /hadoop/common/trunk/hadoop-mapreduce-project/hadoop-mapreduce-client/hadoop-mapreduce-client-core/src/main/java/org/apache/hadoop/mapreduce/task/reduce/Fetcher.java

                

[jira] [Assigned] (MAPREDUCE-3226) Few reduce tasks hanging in a gridmix-run

Posted by "Vinod Kumar Vavilapalli (Assigned) (JIRA)" <ji...@apache.org>.
     [ https://issues.apache.org/jira/browse/MAPREDUCE-3226?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Vinod Kumar Vavilapalli reassigned MAPREDUCE-3226:
--------------------------------------------------

    Assignee: Vinod Kumar Vavilapalli
    

[jira] [Commented] (MAPREDUCE-3226) Few reduce tasks hanging in a gridmix-run

Posted by "Hudson (Commented) (JIRA)" <ji...@apache.org>.
    [ https://issues.apache.org/jira/browse/MAPREDUCE-3226?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13132177#comment-13132177 ] 

Hudson commented on MAPREDUCE-3226:
-----------------------------------

Integrated in Hadoop-Common-trunk-Commit #1126 (See [https://builds.apache.org/job/Hadoop-Common-trunk-Commit/1126/])
    MAPREDUCE-3226. Fix shutdown of fetcher threads. Contributed by Vinod K V.

acmurthy : http://svn.apache.org/viewcvs.cgi/?root=Apache-SVN&view=rev&rev=1187116
Files : 
* /hadoop/common/trunk/hadoop-mapreduce-project/CHANGES.txt
* /hadoop/common/trunk/hadoop-mapreduce-project/hadoop-mapreduce-client/hadoop-mapreduce-client-core/src/main/java/org/apache/hadoop/mapreduce/task/reduce/EventFetcher.java
* /hadoop/common/trunk/hadoop-mapreduce-project/hadoop-mapreduce-client/hadoop-mapreduce-client-core/src/main/java/org/apache/hadoop/mapreduce/task/reduce/Fetcher.java

                

[jira] [Commented] (MAPREDUCE-3226) Few reduce tasks hanging in a gridmix-run

Posted by "Hudson (Commented) (JIRA)" <ji...@apache.org>.
    [ https://issues.apache.org/jira/browse/MAPREDUCE-3226?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13132162#comment-13132162 ] 

Hudson commented on MAPREDUCE-3226:
-----------------------------------

Integrated in Hadoop-Hdfs-0.23-Commit #33 (See [https://builds.apache.org/job/Hadoop-Hdfs-0.23-Commit/33/])
    Merge -c 1187116 from trunk to branch-0.23 to complete fix for MAPREDUCE-3226.

acmurthy : http://svn.apache.org/viewcvs.cgi/?root=Apache-SVN&view=rev&rev=1187119
Files : 
* /hadoop/common/branches/branch-0.23/hadoop-mapreduce-project/CHANGES.txt
* /hadoop/common/branches/branch-0.23/hadoop-mapreduce-project/hadoop-mapreduce-client/hadoop-mapreduce-client-core/src/main/java/org/apache/hadoop/mapreduce/task/reduce/EventFetcher.java
* /hadoop/common/branches/branch-0.23/hadoop-mapreduce-project/hadoop-mapreduce-client/hadoop-mapreduce-client-core/src/main/java/org/apache/hadoop/mapreduce/task/reduce/Fetcher.java

                

[jira] [Commented] (MAPREDUCE-3226) Few reduce tasks hanging in a gridmix-run

Posted by "Hudson (Commented) (JIRA)" <ji...@apache.org>.
    [ https://issues.apache.org/jira/browse/MAPREDUCE-3226?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13132197#comment-13132197 ] 

Hudson commented on MAPREDUCE-3226:
-----------------------------------

Integrated in Hadoop-Mapreduce-0.23-Commit #33 (See [https://builds.apache.org/job/Hadoop-Mapreduce-0.23-Commit/33/])
    Merge -c 1187116 from trunk to branch-0.23 to complete fix for MAPREDUCE-3226.

acmurthy : http://svn.apache.org/viewcvs.cgi/?root=Apache-SVN&view=rev&rev=1187119
Files : 
* /hadoop/common/branches/branch-0.23/hadoop-mapreduce-project/CHANGES.txt
* /hadoop/common/branches/branch-0.23/hadoop-mapreduce-project/hadoop-mapreduce-client/hadoop-mapreduce-client-core/src/main/java/org/apache/hadoop/mapreduce/task/reduce/EventFetcher.java
* /hadoop/common/branches/branch-0.23/hadoop-mapreduce-project/hadoop-mapreduce-client/hadoop-mapreduce-client-core/src/main/java/org/apache/hadoop/mapreduce/task/reduce/Fetcher.java

                