Posted to mapreduce-issues@hadoop.apache.org by "Vinod Kumar Vavilapalli (Created) (JIRA)" <ji...@apache.org> on 2011/10/20 08:39:10 UTC
[jira] [Created] (MAPREDUCE-3226) Few reduce tasks hanging in a gridmix-run
Few reduce tasks hanging in a gridmix-run
-----------------------------------------
Key: MAPREDUCE-3226
URL: https://issues.apache.org/jira/browse/MAPREDUCE-3226
Project: Hadoop Map/Reduce
Issue Type: Bug
Components: mrv2, task
Affects Versions: 0.23.0
Reporter: Vinod Kumar Vavilapalli
Priority: Blocker
Fix For: 0.23.0
In a gridmix run with ~1000 jobs, one job is getting stuck because of 2-3 hanging reducers. All of them are stuck after downloading all map outputs and have the following thread dump.
{code}
"EventFetcher for fetching Map Completion Events" daemon prio=10 tid=0xa325fc00 nid=0x1ca4 waiting on condition [0xa315c000]
   java.lang.Thread.State: TIMED_WAITING (sleeping)
    at java.lang.Thread.sleep(Native Method)
    at org.apache.hadoop.mapreduce.task.reduce.EventFetcher.run(EventFetcher.java:71)

"main" prio=10 tid=0x080ed400 nid=0x1c71 in Object.wait() [0xf73a2000]
   java.lang.Thread.State: WAITING (on object monitor)
    at java.lang.Object.wait(Native Method)
    - waiting on <0xa94b23d8> (a org.apache.hadoop.mapreduce.task.reduce.EventFetcher)
    at java.lang.Thread.join(Thread.java:1143)
    - locked <0xa94b23d8> (a org.apache.hadoop.mapreduce.task.reduce.EventFetcher)
    at java.lang.Thread.join(Thread.java:1196)
    at org.apache.hadoop.mapreduce.task.reduce.Shuffle.run(Shuffle.java:135)
    at org.apache.hadoop.mapred.ReduceTask.run(ReduceTask.java:367)
    at org.apache.hadoop.mapred.YarnChild$2.run(YarnChild.java:147)
    at java.security.AccessController.doPrivileged(Native Method)
    at javax.security.auth.Subject.doAs(Subject.java:396)
    at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1135)
    at org.apache.hadoop.mapred.YarnChild.main(YarnChild.java:142)
{code}
Thanks to [~karams] for helping track this down.
--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators: https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa
For more information on JIRA, see: http://www.atlassian.com/software/jira
[jira] [Commented] (MAPREDUCE-3226) Few reduce tasks hanging in a gridmix-run
Posted by "Hadoop QA (Commented) (JIRA)" <ji...@apache.org>.
[ https://issues.apache.org/jira/browse/MAPREDUCE-3226?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13131441#comment-13131441 ]
Hadoop QA commented on MAPREDUCE-3226:
--------------------------------------
-1 overall. Here are the results of testing the latest attachment
http://issues.apache.org/jira/secure/attachment/12499823/MAPREDUCE-3226-20111020.txt
against trunk revision .
+1 @author. The patch does not contain any @author tags.
-1 tests included. The patch doesn't appear to include any new or modified tests.
Please justify why no new tests are needed for this patch.
Also please list what manual steps were performed to verify this patch.
+1 javadoc. The javadoc tool did not generate any warning messages.
+1 javac. The applied patch does not increase the total number of javac compiler warnings.
-1 findbugs. The patch appears to introduce 160 new Findbugs (version 1.3.9) warnings.
+1 release audit. The applied patch does not increase the total number of release audit warnings.
+1 core tests. The patch passed unit tests in .
+1 contrib tests. The patch passed contrib unit tests.
Test results: https://builds.apache.org/job/PreCommit-MAPREDUCE-Build/1079//testReport/
Findbugs warnings: https://builds.apache.org/job/PreCommit-MAPREDUCE-Build/1079//artifact/trunk/hadoop-mapreduce-project/patchprocess/newPatchFindbugsWarningshadoop-mapreduce-client-app.html
Findbugs warnings: https://builds.apache.org/job/PreCommit-MAPREDUCE-Build/1079//artifact/trunk/hadoop-mapreduce-project/patchprocess/newPatchFindbugsWarningshadoop-mapreduce-client-core.html
Findbugs warnings: https://builds.apache.org/job/PreCommit-MAPREDUCE-Build/1079//artifact/trunk/hadoop-mapreduce-project/patchprocess/newPatchFindbugsWarningshadoop-mapreduce-client-common.html
Console output: https://builds.apache.org/job/PreCommit-MAPREDUCE-Build/1079//console
This message is automatically generated.
> Few reduce tasks hanging in a gridmix-run
> -----------------------------------------
>
> Key: MAPREDUCE-3226
> URL: https://issues.apache.org/jira/browse/MAPREDUCE-3226
> Project: Hadoop Map/Reduce
> Issue Type: Bug
> Components: mrv2, task
> Affects Versions: 0.23.0
> Reporter: Vinod Kumar Vavilapalli
> Assignee: Vinod Kumar Vavilapalli
> Priority: Blocker
> Fix For: 0.23.0
>
> Attachments: MAPREDUCE-3226-20111020.txt
>
>
> In a gridmix run with ~1000 jobs, one job is getting stuck because of 2-3 hanging reducers. All of them are stuck after downloading all map outputs and have the following thread dump.
> {code}
> "EventFetcher for fetching Map Completion Events" daemon prio=10 tid=0xa325fc00 nid=0x1ca4 waiting on condition [0xa315c000]
> java.lang.Thread.State: TIMED_WAITING (sleeping)
> at java.lang.Thread.sleep(Native Method)
> at org.apache.hadoop.mapreduce.task.reduce.EventFetcher.run(EventFetcher.java:71)
> "main" prio=10 tid=0x080ed400 nid=0x1c71 in Object.wait() [0xf73a2000]
> java.lang.Thread.State: WAITING (on object monitor)
> at java.lang.Object.wait(Native Method)
> - waiting on <0xa94b23d8> (a org.apache.hadoop.mapreduce.task.reduce.EventFetcher)
> at java.lang.Thread.join(Thread.java:1143)
> - locked <0xa94b23d8> (a org.apache.hadoop.mapreduce.task.reduce.EventFetcher)
> at java.lang.Thread.join(Thread.java:1196)
> at org.apache.hadoop.mapreduce.task.reduce.Shuffle.run(Shuffle.java:135)
> at org.apache.hadoop.mapred.ReduceTask.run(ReduceTask.java:367)
> at org.apache.hadoop.mapred.YarnChild$2.run(YarnChild.java:147)
> at java.security.AccessController.doPrivileged(Native Method)
> at javax.security.auth.Subject.doAs(Subject.java:396)
> at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1135)
> at org.apache.hadoop.mapred.YarnChild.main(YarnChild.java:142)
> {code}
> Thanks to [~karams] for helping track this down.
[jira] [Updated] (MAPREDUCE-3226) Few reduce tasks hanging in a gridmix-run
Posted by "Vinod Kumar Vavilapalli (Updated) (JIRA)" <ji...@apache.org>.
[ https://issues.apache.org/jira/browse/MAPREDUCE-3226?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]
Vinod Kumar Vavilapalli updated MAPREDUCE-3226:
-----------------------------------------------
Status: Patch Available (was: Open)
[jira] [Commented] (MAPREDUCE-3226) Few reduce tasks hanging in a gridmix-run
Posted by "Hudson (Commented) (JIRA)" <ji...@apache.org>.
[ https://issues.apache.org/jira/browse/MAPREDUCE-3226?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13132654#comment-13132654 ]
Hudson commented on MAPREDUCE-3226:
-----------------------------------
Integrated in Hadoop-Hdfs-trunk #837 (See [https://builds.apache.org/job/Hadoop-Hdfs-trunk/837/])
MAPREDUCE-3226. Fix shutdown of fetcher threads. Contributed by Vinod K V.
acmurthy : http://svn.apache.org/viewcvs.cgi/?root=Apache-SVN&view=rev&rev=1187116
Files :
* /hadoop/common/trunk/hadoop-mapreduce-project/CHANGES.txt
* /hadoop/common/trunk/hadoop-mapreduce-project/hadoop-mapreduce-client/hadoop-mapreduce-client-core/src/main/java/org/apache/hadoop/mapreduce/task/reduce/EventFetcher.java
* /hadoop/common/trunk/hadoop-mapreduce-project/hadoop-mapreduce-client/hadoop-mapreduce-client-core/src/main/java/org/apache/hadoop/mapreduce/task/reduce/Fetcher.java
[jira] [Commented] (MAPREDUCE-3226) Few reduce tasks hanging in a gridmix-run
Posted by "Hudson (Commented) (JIRA)" <ji...@apache.org>.
[ https://issues.apache.org/jira/browse/MAPREDUCE-3226?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13132169#comment-13132169 ]
Hudson commented on MAPREDUCE-3226:
-----------------------------------
Integrated in Hadoop-Common-0.23-Commit #33 (See [https://builds.apache.org/job/Hadoop-Common-0.23-Commit/33/])
Merge -c 1187116 from trunk to branch-0.23 to complete fix for MAPREDUCE-3226.
acmurthy : http://svn.apache.org/viewcvs.cgi/?root=Apache-SVN&view=rev&rev=1187119
Files :
* /hadoop/common/branches/branch-0.23/hadoop-mapreduce-project/CHANGES.txt
* /hadoop/common/branches/branch-0.23/hadoop-mapreduce-project/hadoop-mapreduce-client/hadoop-mapreduce-client-core/src/main/java/org/apache/hadoop/mapreduce/task/reduce/EventFetcher.java
* /hadoop/common/branches/branch-0.23/hadoop-mapreduce-project/hadoop-mapreduce-client/hadoop-mapreduce-client-core/src/main/java/org/apache/hadoop/mapreduce/task/reduce/Fetcher.java
[jira] [Commented] (MAPREDUCE-3226) Few reduce tasks hanging in a gridmix-run
Posted by "Vinod Kumar Vavilapalli (Commented) (JIRA)" <ji...@apache.org>.
[ https://issues.apache.org/jira/browse/MAPREDUCE-3226?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13131427#comment-13131427 ]
Vinod Kumar Vavilapalli commented on MAPREDUCE-3226:
----------------------------------------------------
More information: the task logs show this exception:
{code}
2011-10-18 10:34:41,006 INFO org.apache.hadoop.mapred.Task: Communication exception: java.io.IOException: Failed on local exception: java.nio.channels.ClosedByInterruptException; Host Details : local host is: "host.name.com/$IP"; destination host is: ""host.name.com":48314;
    at org.apache.hadoop.net.NetUtils.wrapException(NetUtils.java:601)
    at org.apache.hadoop.ipc.Client.call(Client.java:1089)
    at org.apache.hadoop.ipc.WritableRpcEngine$Invoker.invoke(WritableRpcEngine.java:193)
    at $Proxy6.statusUpdate(Unknown Source)
    at org.apache.hadoop.mapred.Task$TaskReporter.run(Task.java:671)
    at java.lang.Thread.run(Thread.java:619)
Caused by: java.nio.channels.ClosedByInterruptException
    at java.nio.channels.spi.AbstractInterruptibleChannel.end(AbstractInterruptibleChannel.java:184)
    at sun.nio.ch.SocketChannelImpl.write(SocketChannelImpl.java:341)
    at org.apache.hadoop.net.SocketOutputStream$Writer.performIO(SocketOutputStream.java:60)
    at org.apache.hadoop.net.SocketIOWithTimeout.doIO(SocketIOWithTimeout.java:142)
    at org.apache.hadoop.net.SocketOutputStream.write(SocketOutputStream.java:151)
    at org.apache.hadoop.net.SocketOutputStream.write(SocketOutputStream.java:112)
    at org.apache.hadoop.security.SaslOutputStream.write(SaslOutputStream.java:168)
    at java.io.BufferedOutputStream.flushBuffer(BufferedOutputStream.java:65)
    at java.io.BufferedOutputStream.flush(BufferedOutputStream.java:123)
    at java.io.DataOutputStream.flush(DataOutputStream.java:106)
    at org.apache.hadoop.ipc.Client$Connection.sendParam(Client.java:796)
    at org.apache.hadoop.ipc.Client.call(Client.java:1066)
    at org.apache.hadoop.ipc.WritableRpcEngine$Invoker.invoke(WritableRpcEngine.java:193)
    at $Proxy6.getMapCompletionEvents(Unknown Source)
    at org.apache.hadoop.mapreduce.task.reduce.EventFetcher.getMapCompletionEvents(EventFetcher.java:99)
    at org.apache.hadoop.mapreduce.task.reduce.EventFetcher.run(EventFetcher.java:65)
{code}
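For readers following along: the thread dump and this exception are consistent with an interrupt being consumed somewhere below the polling loop, so that a loop which only checks the interrupt flag keeps running while the main thread waits in join(). That is an inference from the two traces, not something stated in this thread. A minimal Java sketch of that failure pattern follows. Poller, fakeRpc and stopsAfterInterrupt are hypothetical names, and none of this is the Hadoop code.

```java
import java.io.IOException;
import java.util.concurrent.CountDownLatch;

final class LostInterruptSketch {

    /** Hypothetical poller whose only exit condition is the interrupt flag. */
    static final class Poller extends Thread {
        final CountDownLatch inRpc = new CountDownLatch(1);

        @Override
        public void run() {
            while (!Thread.currentThread().isInterrupted()) {
                try {
                    fakeRpc();
                } catch (IOException e) {
                    // Handled like any transient communication error: retry.
                }
            }
        }

        /** Stand-in for a blocking remote call (assumption, not a Hadoop API). */
        private void fakeRpc() throws IOException {
            inRpc.countDown();
            try {
                Thread.sleep(60_000);
            } catch (InterruptedException e) {
                // Thread.sleep clears the interrupt flag when it throws, and this
                // layer reports the interrupt as an I/O failure without restoring it.
                throw new IOException("Failed on local exception", e);
            }
        }
    }

    /** Interrupts the poller once; returns true if it exited within waitMillis. */
    static boolean stopsAfterInterrupt(long waitMillis) {
        Poller poller = new Poller();
        poller.setDaemon(true); // lets the JVM exit even though the poller keeps running
        poller.start();
        try {
            poller.inRpc.await();    // wait until the poller has entered the call
            poller.interrupt();
            poller.join(waitMillis); // bounded; a plain join() would wait forever
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            throw new IllegalStateException(e);
        }
        return !poller.isAlive();
    }

    public static void main(String[] args) {
        System.out.println("poller stopped: " + stopsAfterInterrupt(500));
    }
}
```

In this sketch the bounded join is expected to time out with the poller still alive, which mirrors the shape of the hang in the dump: one thread back in its loop, the other parked in join().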
[jira] [Commented] (MAPREDUCE-3226) Few reduce tasks hanging in a gridmix-run
Posted by "Hudson (Commented) (JIRA)" <ji...@apache.org>.
[ https://issues.apache.org/jira/browse/MAPREDUCE-3226?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13132648#comment-13132648 ]
Hudson commented on MAPREDUCE-3226:
-----------------------------------
Integrated in Hadoop-Mapreduce-0.23-Build #58 (See [https://builds.apache.org/job/Hadoop-Mapreduce-0.23-Build/58/])
Merge -c 1187116 from trunk to branch-0.23 to complete fix for MAPREDUCE-3226.
acmurthy : http://svn.apache.org/viewcvs.cgi/?root=Apache-SVN&view=rev&rev=1187119
Files :
* /hadoop/common/branches/branch-0.23/hadoop-mapreduce-project/CHANGES.txt
* /hadoop/common/branches/branch-0.23/hadoop-mapreduce-project/hadoop-mapreduce-client/hadoop-mapreduce-client-core/src/main/java/org/apache/hadoop/mapreduce/task/reduce/EventFetcher.java
* /hadoop/common/branches/branch-0.23/hadoop-mapreduce-project/hadoop-mapreduce-client/hadoop-mapreduce-client-core/src/main/java/org/apache/hadoop/mapreduce/task/reduce/Fetcher.java
[jira] [Updated] (MAPREDUCE-3226) Few reduce tasks hanging in a gridmix-run
Posted by "Arun C Murthy (Updated) (JIRA)" <ji...@apache.org>.
[ https://issues.apache.org/jira/browse/MAPREDUCE-3226?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]
Arun C Murthy updated MAPREDUCE-3226:
-------------------------------------
Resolution: Fixed
Status: Resolved (was: Patch Available)
I just committed this. Thanks Vinod!
[jira] [Updated] (MAPREDUCE-3226) Few reduce tasks hanging in a gridmix-run
Posted by "Vinod Kumar Vavilapalli (Updated) (JIRA)" <ji...@apache.org>.
[ https://issues.apache.org/jira/browse/MAPREDUCE-3226?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]
Vinod Kumar Vavilapalli updated MAPREDUCE-3226:
-----------------------------------------------
Attachment: MAPREDUCE-3226-20111020.txt
This patch should fix it.
It also fixes the Fetcher thread, just in case.
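The patch itself is not quoted in this thread, so the following is only a sketch of one common way to make this kind of thread shutdown robust: pair the interrupt with an explicit volatile stop flag, so the loop still exits if the interrupt status is lost in a lower layer. All names here (Poller, stopped, shutDown, fakeRpc) are hypothetical and are not taken from the attachment.

```java
import java.io.IOException;
import java.util.concurrent.CountDownLatch;

final class StoppablePollerSketch {

    /** Hypothetical poller that can be stopped even if its interrupt is swallowed. */
    static final class Poller extends Thread {
        final CountDownLatch inRpc = new CountDownLatch(1);
        private volatile boolean stopped = false;

        @Override
        public void run() {
            // Exit on either signal: the explicit flag or the interrupt status.
            while (!stopped && !Thread.currentThread().isInterrupted()) {
                try {
                    fakeRpc();
                } catch (IOException e) {
                    // Retry; the loop condition re-checks the stop flag first.
                }
            }
        }

        /** Stand-in for a blocking remote call that hides interrupts (assumption). */
        private void fakeRpc() throws IOException {
            inRpc.countDown();
            try {
                Thread.sleep(60_000);
            } catch (InterruptedException e) {
                // The interrupt flag is cleared here and not restored.
                throw new IOException("Failed on local exception", e);
            }
        }

        /** Sets the flag, wakes the thread, then waits up to waitMillis for it to exit. */
        void shutDown(long waitMillis) throws InterruptedException {
            stopped = true;
            interrupt();
            join(waitMillis);
        }
    }

    /** Starts a poller, shuts it down, and returns true if it exited in time. */
    static boolean stopsAfterShutDown(long waitMillis) {
        Poller poller = new Poller();
        poller.setDaemon(true);
        poller.start();
        try {
            poller.inRpc.await(); // wait until the poller has entered the call
            poller.shutDown(waitMillis);
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            throw new IllegalStateException(e);
        }
        return !poller.isAlive();
    }

    public static void main(String[] args) {
        System.out.println("poller stopped: " + stopsAfterShutDown(2_000));
    }
}
```

The flag is set before interrupt() is called, so by the time the blocked call is woken the loop condition already sees stopped as true. The interrupt is still needed to get the thread out of the blocking call promptly.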
[jira] [Commented] (MAPREDUCE-3226) Few reduce tasks hanging in a gridmix-run
Posted by "Hudson (Commented) (JIRA)" <ji...@apache.org>.
[ https://issues.apache.org/jira/browse/MAPREDUCE-3226?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13132622#comment-13132622 ]
Hudson commented on MAPREDUCE-3226:
-----------------------------------
Integrated in Hadoop-Hdfs-0.23-Build #46 (See [https://builds.apache.org/job/Hadoop-Hdfs-0.23-Build/46/])
Merge -c 1187116 from trunk to branch-0.23 to complete fix for MAPREDUCE-3226.
acmurthy : http://svn.apache.org/viewcvs.cgi/?root=Apache-SVN&view=rev&rev=1187119
Files :
* /hadoop/common/branches/branch-0.23/hadoop-mapreduce-project/CHANGES.txt
* /hadoop/common/branches/branch-0.23/hadoop-mapreduce-project/hadoop-mapreduce-client/hadoop-mapreduce-client-core/src/main/java/org/apache/hadoop/mapreduce/task/reduce/EventFetcher.java
* /hadoop/common/branches/branch-0.23/hadoop-mapreduce-project/hadoop-mapreduce-client/hadoop-mapreduce-client-core/src/main/java/org/apache/hadoop/mapreduce/task/reduce/Fetcher.java
[jira] [Commented] (MAPREDUCE-3226) Few reduce tasks hanging in a
gridmix-run
Posted by "Hudson (Commented) (JIRA)" <ji...@apache.org>.
[ https://issues.apache.org/jira/browse/MAPREDUCE-3226?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13132156#comment-13132156 ]
Hudson commented on MAPREDUCE-3226:
-----------------------------------
Integrated in Hadoop-Hdfs-trunk-Commit #1204 (See [https://builds.apache.org/job/Hadoop-Hdfs-trunk-Commit/1204/])
MAPREDUCE-3226. Fix shutdown of fetcher threads. Contributed by Vinod K V.
acmurthy : http://svn.apache.org/viewcvs.cgi/?root=Apache-SVN&view=rev&rev=1187116
Files :
* /hadoop/common/trunk/hadoop-mapreduce-project/CHANGES.txt
* /hadoop/common/trunk/hadoop-mapreduce-project/hadoop-mapreduce-client/hadoop-mapreduce-client-core/src/main/java/org/apache/hadoop/mapreduce/task/reduce/EventFetcher.java
* /hadoop/common/trunk/hadoop-mapreduce-project/hadoop-mapreduce-client/hadoop-mapreduce-client-core/src/main/java/org/apache/hadoop/mapreduce/task/reduce/Fetcher.java
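For readers following the thread dump in the issue description: "main" is blocked in Thread.join() on the EventFetcher thread, while that thread is still in Thread.sleep() inside its polling loop. The following is only a minimal sketch of a common shutdown pattern for such a polling thread, not the committed patch. The class name Poller, the stopped flag, the one-second sleep, and the five-second join timeout are all hypothetical and chosen for illustration.

```java
public class PollerShutdownSketch {

    /** Hypothetical stand-in for a polling thread; not Hadoop's EventFetcher. */
    static class Poller extends Thread {
        // Written by the thread that requests shutdown, read by the poller.
        private volatile boolean stopped = false;

        Poller() {
            super("hypothetical-poller");
            setDaemon(true);
        }

        @Override
        public void run() {
            while (!stopped && !isInterrupted()) {
                // Placeholder for the real polling work.
                try {
                    Thread.sleep(1000);
                } catch (InterruptedException e) {
                    // Exit instead of swallowing the interrupt and looping again.
                    return;
                }
            }
        }

        /** Sets the flag, wakes a sleeping poller, then waits a bounded time. */
        void shutDown() {
            stopped = true;
            interrupt();
            try {
                join(5000); // bounded wait, so the caller cannot block forever
            } catch (InterruptedException e) {
                // Preserve the caller's interrupt status.
                Thread.currentThread().interrupt();
            }
        }
    }

    public static void main(String[] args) throws InterruptedException {
        Poller poller = new Poller();
        poller.start();
        Thread.sleep(100); // let the poller reach its sleep
        poller.shutDown();
        System.out.println("alive after shutDown: " + poller.isAlive());
    }
}
```

The point of the sketch is that the caller never does an unbounded join() on a thread that may keep sleeping: the flag and the interrupt together end the loop, and the join has a timeout as a last resort.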
[jira] [Commented] (MAPREDUCE-3226) Few reduce tasks hanging in a
gridmix-run
Posted by "Hudson (Commented) (JIRA)" <ji...@apache.org>.
[ https://issues.apache.org/jira/browse/MAPREDUCE-3226?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13132204#comment-13132204 ]
Hudson commented on MAPREDUCE-3226:
-----------------------------------
Integrated in Hadoop-Mapreduce-trunk-Commit #1142 (See [https://builds.apache.org/job/Hadoop-Mapreduce-trunk-Commit/1142/])
MAPREDUCE-3226. Fix shutdown of fetcher threads. Contributed by Vinod K V.
acmurthy : http://svn.apache.org/viewcvs.cgi/?root=Apache-SVN&view=rev&rev=1187116
Files :
* /hadoop/common/trunk/hadoop-mapreduce-project/CHANGES.txt
* /hadoop/common/trunk/hadoop-mapreduce-project/hadoop-mapreduce-client/hadoop-mapreduce-client-core/src/main/java/org/apache/hadoop/mapreduce/task/reduce/EventFetcher.java
* /hadoop/common/trunk/hadoop-mapreduce-project/hadoop-mapreduce-client/hadoop-mapreduce-client-core/src/main/java/org/apache/hadoop/mapreduce/task/reduce/Fetcher.java
[jira] [Commented] (MAPREDUCE-3226) Few reduce tasks hanging in a
gridmix-run
Posted by "Hudson (Commented) (JIRA)" <ji...@apache.org>.
[ https://issues.apache.org/jira/browse/MAPREDUCE-3226?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13132642#comment-13132642 ]
Hudson commented on MAPREDUCE-3226:
-----------------------------------
Integrated in Hadoop-Mapreduce-trunk #867 (See [https://builds.apache.org/job/Hadoop-Mapreduce-trunk/867/])
MAPREDUCE-3226. Fix shutdown of fetcher threads. Contributed by Vinod K V.
acmurthy : http://svn.apache.org/viewcvs.cgi/?root=Apache-SVN&view=rev&rev=1187116
Files :
* /hadoop/common/trunk/hadoop-mapreduce-project/CHANGES.txt
* /hadoop/common/trunk/hadoop-mapreduce-project/hadoop-mapreduce-client/hadoop-mapreduce-client-core/src/main/java/org/apache/hadoop/mapreduce/task/reduce/EventFetcher.java
* /hadoop/common/trunk/hadoop-mapreduce-project/hadoop-mapreduce-client/hadoop-mapreduce-client-core/src/main/java/org/apache/hadoop/mapreduce/task/reduce/Fetcher.java
[jira] [Assigned] (MAPREDUCE-3226) Few reduce tasks hanging in a
gridmix-run
Posted by "Vinod Kumar Vavilapalli (Assigned) (JIRA)" <ji...@apache.org>.
[ https://issues.apache.org/jira/browse/MAPREDUCE-3226?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]
Vinod Kumar Vavilapalli reassigned MAPREDUCE-3226:
--------------------------------------------------
Assignee: Vinod Kumar Vavilapalli
[jira] [Commented] (MAPREDUCE-3226) Few reduce tasks hanging in a
gridmix-run
Posted by "Hudson (Commented) (JIRA)" <ji...@apache.org>.
[ https://issues.apache.org/jira/browse/MAPREDUCE-3226?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13132177#comment-13132177 ]
Hudson commented on MAPREDUCE-3226:
-----------------------------------
Integrated in Hadoop-Common-trunk-Commit #1126 (See [https://builds.apache.org/job/Hadoop-Common-trunk-Commit/1126/])
MAPREDUCE-3226. Fix shutdown of fetcher threads. Contributed by Vinod K V.
acmurthy : http://svn.apache.org/viewcvs.cgi/?root=Apache-SVN&view=rev&rev=1187116
Files :
* /hadoop/common/trunk/hadoop-mapreduce-project/CHANGES.txt
* /hadoop/common/trunk/hadoop-mapreduce-project/hadoop-mapreduce-client/hadoop-mapreduce-client-core/src/main/java/org/apache/hadoop/mapreduce/task/reduce/EventFetcher.java
* /hadoop/common/trunk/hadoop-mapreduce-project/hadoop-mapreduce-client/hadoop-mapreduce-client-core/src/main/java/org/apache/hadoop/mapreduce/task/reduce/Fetcher.java
[jira] [Commented] (MAPREDUCE-3226) Few reduce tasks hanging in a
gridmix-run
Posted by "Hudson (Commented) (JIRA)" <ji...@apache.org>.
[ https://issues.apache.org/jira/browse/MAPREDUCE-3226?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13132162#comment-13132162 ]
Hudson commented on MAPREDUCE-3226:
-----------------------------------
Integrated in Hadoop-Hdfs-0.23-Commit #33 (See [https://builds.apache.org/job/Hadoop-Hdfs-0.23-Commit/33/])
Merge -c 1187116 from trunk to branch-0.23 to complete fix for MAPREDUCE-3226.
acmurthy : http://svn.apache.org/viewcvs.cgi/?root=Apache-SVN&view=rev&rev=1187119
Files :
* /hadoop/common/branches/branch-0.23/hadoop-mapreduce-project/CHANGES.txt
* /hadoop/common/branches/branch-0.23/hadoop-mapreduce-project/hadoop-mapreduce-client/hadoop-mapreduce-client-core/src/main/java/org/apache/hadoop/mapreduce/task/reduce/EventFetcher.java
* /hadoop/common/branches/branch-0.23/hadoop-mapreduce-project/hadoop-mapreduce-client/hadoop-mapreduce-client-core/src/main/java/org/apache/hadoop/mapreduce/task/reduce/Fetcher.java
[jira] [Commented] (MAPREDUCE-3226) Few reduce tasks hanging in a
gridmix-run
Posted by "Hudson (Commented) (JIRA)" <ji...@apache.org>.
[ https://issues.apache.org/jira/browse/MAPREDUCE-3226?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13132197#comment-13132197 ]
Hudson commented on MAPREDUCE-3226:
-----------------------------------
Integrated in Hadoop-Mapreduce-0.23-Commit #33 (See [https://builds.apache.org/job/Hadoop-Mapreduce-0.23-Commit/33/])
Merge -c 1187116 from trunk to branch-0.23 to complete fix for MAPREDUCE-3226.
acmurthy : http://svn.apache.org/viewcvs.cgi/?root=Apache-SVN&view=rev&rev=1187119
Files :
* /hadoop/common/branches/branch-0.23/hadoop-mapreduce-project/CHANGES.txt
* /hadoop/common/branches/branch-0.23/hadoop-mapreduce-project/hadoop-mapreduce-client/hadoop-mapreduce-client-core/src/main/java/org/apache/hadoop/mapreduce/task/reduce/EventFetcher.java
* /hadoop/common/branches/branch-0.23/hadoop-mapreduce-project/hadoop-mapreduce-client/hadoop-mapreduce-client-core/src/main/java/org/apache/hadoop/mapreduce/task/reduce/Fetcher.java