Posted to mapreduce-issues@hadoop.apache.org by "Vinod Kumar Vavilapalli (Created) (JIRA)" <ji...@apache.org> on 2011/11/02 15:01:33 UTC

[jira] [Created] (MAPREDUCE-3333) MR AM for sort-job going out of memory

MR AM for sort-job going out of memory
--------------------------------------

                 Key: MAPREDUCE-3333
                 URL: https://issues.apache.org/jira/browse/MAPREDUCE-3333
             Project: Hadoop Map/Reduce
          Issue Type: Bug
          Components: applicationmaster, mrv2
    Affects Versions: 0.23.0
            Reporter: Vinod Kumar Vavilapalli
            Priority: Blocker


[~Karams] just found this. The usual sort job on a 350-node cluster hung due to OutOfMemory and eventually failed after an hour, instead of finishing in the usual 20-odd minutes.
{code}
2011-11-02 11:40:36,438 ERROR [ContainerLauncher #258] org.apache.hadoop.mapreduce.v2.app.launcher.ContainerLauncherImpl: Container launch failed for container_1320233407485_0002_01_001434 : java.lang.reflect.UndeclaredThrowableException
        at org.apache.hadoop.yarn.api.impl.pb.client.ContainerManagerPBClientImpl.startContainer(ContainerManagerPBClientImpl.java:88)
        at org.apache.hadoop.mapreduce.v2.app.launcher.ContainerLauncherImpl$EventProcessor.run(ContainerLauncherImpl.java:290)
        at java.util.concurrent.ThreadPoolExecutor$Worker.runTask(ThreadPoolExecutor.java:886)
        at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:908)
        at java.lang.Thread.run(Thread.java:619)
Caused by: com.google.protobuf.ServiceException: java.io.IOException: Failed on local exception: java.io.IOException: Couldn't set up IO streams; Host Details : local host is: "gsbl91281.blue.ygrid.yahoo.com/98.137.101.189"; destination host is: ""gsbl91525.blue.ygrid.yahoo.com":45450; 
        at org.apache.hadoop.yarn.ipc.ProtoOverHadoopRpcEngine$Invoker.invoke(ProtoOverHadoopRpcEngine.java:139)
        at $Proxy20.startContainer(Unknown Source)
        at org.apache.hadoop.yarn.api.impl.pb.client.ContainerManagerPBClientImpl.startContainer(ContainerManagerPBClientImpl.java:81)
        ... 4 more
Caused by: java.io.IOException: Failed on local exception: java.io.IOException: Couldn't set up IO streams; Host Details : local host is: "gsbl91281.blue.ygrid.yahoo.com/98.137.101.189"; destination host is: ""gsbl91525.blue.ygrid.yahoo.com":45450; 
        at org.apache.hadoop.net.NetUtils.wrapException(NetUtils.java:655)
        at org.apache.hadoop.ipc.Client.call(Client.java:1089)
        at org.apache.hadoop.yarn.ipc.ProtoOverHadoopRpcEngine$Invoker.invoke(ProtoOverHadoopRpcEngine.java:136)
        ... 6 more
Caused by: java.io.IOException: Couldn't set up IO streams
        at org.apache.hadoop.ipc.Client$Connection.setupIOstreams(Client.java:621)
        at org.apache.hadoop.ipc.Client$Connection.access$2000(Client.java:205)
        at org.apache.hadoop.ipc.Client.getConnection(Client.java:1195)
        at org.apache.hadoop.ipc.Client.call(Client.java:1065)
        ... 7 more
Caused by: java.lang.OutOfMemoryError: unable to create new native thread
        at java.lang.Thread.start0(Native Method)
        at java.lang.Thread.start(Thread.java:597)
        at org.apache.hadoop.ipc.Client$Connection.setupIOstreams(Client.java:614)
        ... 10 more
{code}
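The OutOfMemoryError at the bottom of this trace is not heap exhaustion: "unable to create new native thread" means the JVM could not start another OS thread. Below is a minimal, JDK-only sketch that walks a getCause() chain shaped like the one above and flags this case. RootCauseDemo and its methods are hypothetical, not Hadoop code, and matching on the message text is an assumption: the wording is JVM-specific and may differ between vendors and versions, so the check only looks for the fragment "native thread".

```java
import java.io.IOException;

public class RootCauseDemo {

    /** Follows getCause() links to the innermost throwable (bounded, in case of odd chains). */
    public static Throwable rootCause(Throwable t) {
        Throwable current = t;
        for (int depth = 0; depth < 100 && current.getCause() != null; depth++) {
            current = current.getCause();
        }
        return current;
    }

    /** True if the root cause looks like thread exhaustion rather than a full heap. */
    public static boolean isNativeThreadExhaustion(Throwable t) {
        Throwable root = rootCause(t);
        String msg = root.getMessage();
        // The exact message is JVM-specific, so only the common fragment is matched.
        return root instanceof OutOfMemoryError && msg != null && msg.contains("native thread");
    }

    public static void main(String[] args) {
        // Rebuild a chain shaped like the trace above; the wrapper types are simplified.
        OutOfMemoryError oom = new OutOfMemoryError("unable to create new native thread");
        IOException setup = new IOException("Couldn't set up IO streams", oom);
        IOException local = new IOException("Failed on local exception: " + setup, setup);
        RuntimeException top = new RuntimeException(local);
        System.out.println(isNativeThreadExhaustion(top));                                     // true
        System.out.println(isNativeThreadExhaustion(new OutOfMemoryError("Java heap space"))); // false
    }
}
```

Classifying on the root cause rather than the top-level exception matters here because the launcher only logs an UndeclaredThrowableException at the top, which hides the actual resource limit four levels down.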

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators: https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa
For more information on JIRA, see: http://www.atlassian.com/software/jira

        

[jira] [Commented] (MAPREDUCE-3333) MR AM for sort-job going out of memory

Posted by "Hudson (Commented) (JIRA)" <ji...@apache.org>.
    [ https://issues.apache.org/jira/browse/MAPREDUCE-3333?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13147047#comment-13147047 ] 

Hudson commented on MAPREDUCE-3333:
-----------------------------------

Integrated in Hadoop-Common-trunk-Commit #1261 (See [https://builds.apache.org/job/Hadoop-Common-trunk-Commit/1261/])
    MAPREDUCE-3333. Fixed bugs in ContainerLauncher of MR AppMaster due to which per-container connections to NodeManager were lingering long enough to hit the ulimits on number of processes. (vinodkv)

vinodkv : http://svn.apache.org/viewcvs.cgi/?root=Apache-SVN&view=rev&rev=1199751
Files : 
* /hadoop/common/trunk/hadoop-mapreduce-project/CHANGES.txt
* /hadoop/common/trunk/hadoop-mapreduce-project/hadoop-mapreduce-client/hadoop-mapreduce-client-app/src/main/java/org/apache/hadoop/mapreduce/v2/app/launcher/ContainerLauncher.java
* /hadoop/common/trunk/hadoop-mapreduce-project/hadoop-mapreduce-client/hadoop-mapreduce-client-app/src/main/java/org/apache/hadoop/mapreduce/v2/app/launcher/ContainerLauncherImpl.java
* /hadoop/common/trunk/hadoop-mapreduce-project/hadoop-yarn/hadoop-yarn-common/src/main/java/org/apache/hadoop/yarn/api/impl/pb/client/ContainerManagerPBClientImpl.java
* /hadoop/common/trunk/hadoop-mapreduce-project/hadoop-yarn/hadoop-yarn-common/src/main/java/org/apache/hadoop/yarn/factories/RpcClientFactory.java
* /hadoop/common/trunk/hadoop-mapreduce-project/hadoop-yarn/hadoop-yarn-common/src/main/java/org/apache/hadoop/yarn/factories/impl/pb/RpcClientFactoryPBImpl.java
* /hadoop/common/trunk/hadoop-mapreduce-project/hadoop-yarn/hadoop-yarn-common/src/main/java/org/apache/hadoop/yarn/ipc/HadoopYarnProtoRPC.java
* /hadoop/common/trunk/hadoop-mapreduce-project/hadoop-yarn/hadoop-yarn-common/src/main/java/org/apache/hadoop/yarn/ipc/HadoopYarnRPC.java
* /hadoop/common/trunk/hadoop-mapreduce-project/hadoop-yarn/hadoop-yarn-common/src/main/java/org/apache/hadoop/yarn/ipc/ProtoOverHadoopRpcEngine.java
* /hadoop/common/trunk/hadoop-mapreduce-project/hadoop-yarn/hadoop-yarn-common/src/main/java/org/apache/hadoop/yarn/ipc/YarnRPC.java

                
> MR AM for sort-job going out of memory
> --------------------------------------
>
>                 Key: MAPREDUCE-3333
>                 URL: https://issues.apache.org/jira/browse/MAPREDUCE-3333
>             Project: Hadoop Map/Reduce
>          Issue Type: Bug
>          Components: applicationmaster, mrv2
>    Affects Versions: 0.23.0
>            Reporter: Vinod Kumar Vavilapalli
>            Assignee: Vinod Kumar Vavilapalli
>            Priority: Blocker
>             Fix For: 0.23.1
>
>         Attachments: MAPREDUCE-3333-20111102.txt, MAPREDUCE-3333-20111108.txt, MAPREDUCE-3333-20111109.1.txt, MAPREDUCE-3333-20111109.txt


        

[jira] [Commented] (MAPREDUCE-3333) MR AM for sort-job going out of memory

Posted by "Siddharth Seth (Commented) (JIRA)" <ji...@apache.org>.
    [ https://issues.apache.org/jira/browse/MAPREDUCE-3333?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13146896#comment-13146896 ] 

Siddharth Seth commented on MAPREDUCE-3333:
-------------------------------------------

Forgot to mention: nice, clean workaround for the RPC stop not working :) I thought it would be far more involved.
                


        

[jira] [Updated] (MAPREDUCE-3333) MR AM for sort-job going out of memory

Posted by "Vinod Kumar Vavilapalli (Updated) (JIRA)" <ji...@apache.org>.
     [ https://issues.apache.org/jira/browse/MAPREDUCE-3333?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Vinod Kumar Vavilapalli updated MAPREDUCE-3333:
-----------------------------------------------

    Attachment: MAPREDUCE-3333-20111108.txt

Finally tracked this down, with lots of help from Karam.

What was happening: after MAPREDUCE-3256, we create one connection to a NodeManager per container, and this per-container connection was not closed after use. Since Hadoop RPC creates a thread for each connection, the thread count soon reaches the node's ulimit on the number of processes, and Java misleadingly reports this as an out-of-memory error.
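To make the failure mode concrete, here is a small JDK-only simulation of the pattern described above, assuming (as the comment says) one long-lived I/O thread per RPC connection. FakeConnection, launch and liveThreads are hypothetical stand-ins, not Hadoop classes. The point is only that opening a connection per container without closing it leaves one live thread behind per launch, while closing it in a finally block keeps the count at zero.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.CountDownLatch;

public class PerContainerConnectionDemo {

    /** Hypothetical stand-in for an RPC connection that owns one I/O thread. */
    static final class FakeConnection implements AutoCloseable {
        private final CountDownLatch closed = new CountDownLatch(1);
        private final Thread ioThread;

        FakeConnection(String target) {
            // Models the per-connection reader thread: it parks until close() is called.
            ioThread = new Thread(() -> {
                try {
                    closed.await();
                } catch (InterruptedException e) {
                    Thread.currentThread().interrupt();
                }
            }, "fake-ipc-connection-to-" + target);
            ioThread.setDaemon(true);
            ioThread.start();
        }

        void startContainer(String containerId) {
            // A real client would send the startContainer RPC here.
        }

        boolean isThreadAlive() {
            return ioThread.isAlive();
        }

        @Override
        public void close() {
            closed.countDown();
            try {
                ioThread.join(); // wait until the I/O thread has really exited
            } catch (InterruptedException e) {
                Thread.currentThread().interrupt();
            }
        }
    }

    /** Launches n containers, each over a fresh connection, optionally closing it after use. */
    static List<FakeConnection> launch(int n, boolean closeAfterUse) {
        List<FakeConnection> opened = new ArrayList<>();
        for (int i = 0; i < n; i++) {
            FakeConnection conn = new FakeConnection("nm-" + i);
            opened.add(conn);
            try {
                conn.startContainer("container_" + i);
            } finally {
                if (closeAfterUse) {
                    conn.close();
                }
            }
        }
        return opened;
    }

    /** Number of connections whose I/O thread is still running. */
    static long liveThreads(List<FakeConnection> conns) {
        return conns.stream().filter(FakeConnection::isThreadAlive).count();
    }

    public static void main(String[] args) {
        List<FakeConnection> leaked = launch(50, false);
        List<FakeConnection> closed = launch(50, true);
        System.out.println("without close: " + liveThreads(leaked) + " live threads"); // 50
        System.out.println("with close: " + liveThreads(closed) + " live threads");    // 0
        leaked.forEach(FakeConnection::close); // tidy up the demo's own leak
    }
}
```

On a real AppMaster the "without close" count keeps growing with every container launched, which is how a long sort job eventually hits the per-user process limit.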

I added an "RPC.stopProxy(obj)" call a couple of days ago, but it didn't work because of the multiple layers of RPC in YARN. It's time somebody cleaned up that mess.
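One way to read "multiple layers of RPC" is that the object handed to the stop call was a wrapper around the real RPC proxy, so stopping it never reached the layer that owns the connection. The sketch below illustrates only that reading; InnerRpcProxy, WrappingClient, naiveStop and layeredStop are hypothetical and are not YARN or Hadoop APIs. A stop routine that recognizes only the raw proxy silently ignores the wrapper, while letting the wrapper close itself and delegate inward does stop the inner proxy.

```java
import java.io.Closeable;
import java.io.IOException;
import java.io.UncheckedIOException;

public class LayeredStopDemo {

    /** Hypothetical innermost layer: the proxy that actually owns the connection. */
    static final class InnerRpcProxy {
        private boolean stopped;

        void stop() {
            stopped = true; // a real proxy would release its connection and thread here
        }

        boolean isStopped() {
            return stopped;
        }
    }

    /** Hypothetical outer layer, e.g. a protocol-specific client wrapping the raw proxy. */
    static final class WrappingClient implements Closeable {
        private final InnerRpcProxy inner;

        WrappingClient(InnerRpcProxy inner) {
            this.inner = inner;
        }

        @Override
        public void close() {
            inner.stop(); // delegate the stop to the layer that owns the connection
        }
    }

    /** Only understands the raw proxy; a wrapper falls through and nothing is stopped. */
    static void naiveStop(Object proxy) {
        if (proxy instanceof InnerRpcProxy) {
            ((InnerRpcProxy) proxy).stop();
        }
    }

    /** Lets any Closeable layer shut itself down, falling back to the raw-proxy path. */
    static void layeredStop(Object proxy) {
        if (proxy instanceof Closeable) {
            try {
                ((Closeable) proxy).close();
            } catch (IOException e) {
                throw new UncheckedIOException(e);
            }
        } else {
            naiveStop(proxy);
        }
    }

    public static void main(String[] args) {
        InnerRpcProxy a = new InnerRpcProxy();
        naiveStop(new WrappingClient(a));
        InnerRpcProxy b = new InnerRpcProxy();
        layeredStop(new WrappingClient(b));
        System.out.println("naive: " + a.isStopped() + ", layered: " + b.isStopped()); // naive: false, layered: true
    }
}
```

The design choice is to make each layer responsible for closing what it wraps, so callers never need to know how many layers sit between them and the socket.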

The attached patch should (finally) fix this. I cannot add any automated tests; I am testing on a big cluster, the only place where this reproduces consistently.

                


        

[jira] [Commented] (MAPREDUCE-3333) MR AM for sort-job going out of memory

Posted by "Hudson (Commented) (JIRA)" <ji...@apache.org>.
    [ https://issues.apache.org/jira/browse/MAPREDUCE-3333?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13147059#comment-13147059 ] 

Hudson commented on MAPREDUCE-3333:
-----------------------------------

Integrated in Hadoop-Mapreduce-trunk-Commit #1283 (See [https://builds.apache.org/job/Hadoop-Mapreduce-trunk-Commit/1283/])
    MAPREDUCE-3333. Fixed bugs in ContainerLauncher of MR AppMaster due to which per-container connections to NodeManager were lingering long enough to hit the ulimits on number of processes. (vinodkv)

vinodkv : http://svn.apache.org/viewcvs.cgi/?root=Apache-SVN&view=rev&rev=1199751
Files : 
* /hadoop/common/trunk/hadoop-mapreduce-project/CHANGES.txt
* /hadoop/common/trunk/hadoop-mapreduce-project/hadoop-mapreduce-client/hadoop-mapreduce-client-app/src/main/java/org/apache/hadoop/mapreduce/v2/app/launcher/ContainerLauncher.java
* /hadoop/common/trunk/hadoop-mapreduce-project/hadoop-mapreduce-client/hadoop-mapreduce-client-app/src/main/java/org/apache/hadoop/mapreduce/v2/app/launcher/ContainerLauncherImpl.java
* /hadoop/common/trunk/hadoop-mapreduce-project/hadoop-yarn/hadoop-yarn-common/src/main/java/org/apache/hadoop/yarn/api/impl/pb/client/ContainerManagerPBClientImpl.java
* /hadoop/common/trunk/hadoop-mapreduce-project/hadoop-yarn/hadoop-yarn-common/src/main/java/org/apache/hadoop/yarn/factories/RpcClientFactory.java
* /hadoop/common/trunk/hadoop-mapreduce-project/hadoop-yarn/hadoop-yarn-common/src/main/java/org/apache/hadoop/yarn/factories/impl/pb/RpcClientFactoryPBImpl.java
* /hadoop/common/trunk/hadoop-mapreduce-project/hadoop-yarn/hadoop-yarn-common/src/main/java/org/apache/hadoop/yarn/ipc/HadoopYarnProtoRPC.java
* /hadoop/common/trunk/hadoop-mapreduce-project/hadoop-yarn/hadoop-yarn-common/src/main/java/org/apache/hadoop/yarn/ipc/HadoopYarnRPC.java
* /hadoop/common/trunk/hadoop-mapreduce-project/hadoop-yarn/hadoop-yarn-common/src/main/java/org/apache/hadoop/yarn/ipc/ProtoOverHadoopRpcEngine.java
* /hadoop/common/trunk/hadoop-mapreduce-project/hadoop-yarn/hadoop-yarn-common/src/main/java/org/apache/hadoop/yarn/ipc/YarnRPC.java

                


        

[jira] [Commented] (MAPREDUCE-3333) MR AM for sort-job going out of memory

Posted by "Hadoop QA (Commented) (JIRA)" <ji...@apache.org>.
    [ https://issues.apache.org/jira/browse/MAPREDUCE-3333?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13147037#comment-13147037 ] 

Hadoop QA commented on MAPREDUCE-3333:
--------------------------------------

-1 overall.  Here are the results of testing the latest attachment 
  http://issues.apache.org/jira/secure/attachment/12503073/MAPREDUCE-3333-20111109.1.txt
  against trunk revision .

    +1 @author.  The patch does not contain any @author tags.

    -1 tests included.  The patch doesn't appear to include any new or modified tests.
                        Please justify why no new tests are needed for this patch.
                        Also please list what manual steps were performed to verify this patch.

    +1 javadoc.  The javadoc tool did not generate any warning messages.

    +1 javac.  The applied patch does not increase the total number of javac compiler warnings.

    +1 findbugs.  The patch does not introduce any new Findbugs (version 1.3.9) warnings.

    +1 release audit.  The applied patch does not increase the total number of release audit warnings.

    +1 core tests.  The patch passed unit tests in .

    +1 contrib tests.  The patch passed contrib unit tests.

Test results: https://builds.apache.org/job/PreCommit-MAPREDUCE-Build/1281//testReport/
Console output: https://builds.apache.org/job/PreCommit-MAPREDUCE-Build/1281//console

This message is automatically generated.
                


        

[jira] [Updated] (MAPREDUCE-3333) MR AM for sort-job going out of memory

Posted by "Vinod Kumar Vavilapalli (Updated) (JIRA)" <ji...@apache.org>.
     [ https://issues.apache.org/jira/browse/MAPREDUCE-3333?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Vinod Kumar Vavilapalli updated MAPREDUCE-3333:
-----------------------------------------------

    Attachment: MAPREDUCE-3333-20111109.txt

Attaching a patch that should set maxIdleTime to zero for connections to NodeManagers.
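The effect of maxIdleTime can be modelled with a connection that closes itself once it has had no outstanding calls for that long. This is a simplified, hypothetical sketch with an injected clock (IdleTrackedConnection is not Hadoop's ipc.Client). Hadoop's client-wide setting for this is the ipc.client.connection.maxidletime property, in milliseconds; how the patch applies a zero value only to NodeManager connections is not shown here.

```java
import java.util.function.LongSupplier;

public class IdleTimeoutDemo {

    /** Hypothetical connection that records the time of its last call activity. */
    static final class IdleTrackedConnection {
        private final long maxIdleTimeMs;
        private final LongSupplier clockMs;
        private int outstandingCalls;
        private long lastActivityMs;
        private boolean closed;

        IdleTrackedConnection(long maxIdleTimeMs, LongSupplier clockMs) {
            this.maxIdleTimeMs = maxIdleTimeMs;
            this.clockMs = clockMs;
            this.lastActivityMs = clockMs.getAsLong();
        }

        void beginCall() {
            outstandingCalls++;
            lastActivityMs = clockMs.getAsLong();
        }

        void endCall() {
            outstandingCalls--;
            lastActivityMs = clockMs.getAsLong();
        }

        /** Checked periodically; closes the connection once it has been idle long enough. */
        boolean closeIfIdle() {
            long idleFor = clockMs.getAsLong() - lastActivityMs;
            if (!closed && outstandingCalls == 0 && idleFor >= maxIdleTimeMs) {
                closed = true; // a real client would also let its I/O thread exit here
            }
            return closed;
        }
    }

    public static void main(String[] args) {
        long[] now = {0L};                 // manual clock, in milliseconds
        LongSupplier clock = () -> now[0];

        IdleTrackedConnection lingering = new IdleTrackedConnection(10_000, clock);
        IdleTrackedConnection eager = new IdleTrackedConnection(0, clock);
        for (IdleTrackedConnection c : new IdleTrackedConnection[] {lingering, eager}) {
            c.beginCall(); // e.g. one startContainer RPC
            c.endCall();
        }
        System.out.println(lingering.closeIfIdle() + " " + eager.closeIfIdle()); // false true
        now[0] = 10_000;
        System.out.println(lingering.closeIfIdle());                             // true
    }
}
```

With a per-container connection that is used for a single call, a non-zero idle time only delays the close, and across thousands of launches those delayed closes add up to the lingering threads described in this issue.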

Still no automated tests, but I did some manual testing by hacking a test and putting in sleeps wherever needed to take thread dumps of the MRAppMaster. Without the patch, and with a huge maxIdleTime, the connections linger forever. With maxIdleTime explicitly set to zero, the connections go away immediately.
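The manual check described above (thread dumps of the MRAppMaster) can be approximated in-process by counting live threads by name. This sketch assumes Hadoop IPC client connection threads carry an "IPC Client" name prefix, which should be confirmed against a real jstack dump; ThreadCountProbe and startParkedThreads are hypothetical helpers, and the demo starts its own parked threads so it runs without Hadoop.

```java
import java.util.concurrent.CountDownLatch;

public class ThreadCountProbe {

    /** Counts live threads whose name starts with prefix, like grepping a jstack dump. */
    public static long countThreadsWithPrefix(String prefix) {
        return Thread.getAllStackTraces().keySet().stream()
                .filter(t -> t.isAlive() && t.getName().startsWith(prefix))
                .count();
    }

    /** Hypothetical test helper: starts n parked threads named prefix + i, returns once all run. */
    static CountDownLatch startParkedThreads(String prefix, int n) {
        CountDownLatch started = new CountDownLatch(n);
        CountDownLatch release = new CountDownLatch(1);
        for (int i = 0; i < n; i++) {
            Thread t = new Thread(() -> {
                started.countDown();
                try {
                    release.await();
                } catch (InterruptedException e) {
                    Thread.currentThread().interrupt();
                }
            }, prefix + i);
            t.setDaemon(true);
            t.start();
        }
        try {
            started.await();
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
        }
        return release; // count this down to let the parked threads exit
    }

    public static void main(String[] args) {
        // Simulate three lingering connection threads using the assumed naming scheme.
        CountDownLatch release = startParkedThreads("IPC Client (demo) connection to nm-", 3);
        System.out.println("IPC client threads: " + countThreadsWithPrefix("IPC Client"));
        release.countDown();
    }
}
```

Sampling this count before and after a batch of container launches gives the same signal as comparing thread dumps: with lingering connections it grows with every launch, and with maxIdleTime at zero it should fall back once the launches complete.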
                
>         ... 10 more
> {code}

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators: https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa
For more information on JIRA, see: http://www.atlassian.com/software/jira

        

[jira] [Commented] (MAPREDUCE-3333) MR AM for sort-job going out of memory

Posted by "Hadoop QA (Commented) (JIRA)" <ji...@apache.org>.
    [ https://issues.apache.org/jira/browse/MAPREDUCE-3333?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13146214#comment-13146214 ] 

Hadoop QA commented on MAPREDUCE-3333:
--------------------------------------

-1 overall.  Here are the results of testing the latest attachment 
  http://issues.apache.org/jira/secure/attachment/12502903/MAPREDUCE-3333-20111108.txt
  against trunk revision .

    +1 @author.  The patch does not contain any @author tags.

    -1 tests included.  The patch doesn't appear to include any new or modified tests.
                        Please justify why no new tests are needed for this patch.
                        Also please list what manual steps were performed to verify this patch.

    +1 javadoc.  The javadoc tool did not generate any warning messages.

    +1 javac.  The applied patch does not increase the total number of javac compiler warnings.

    +1 findbugs.  The patch does not introduce any new Findbugs (version 1.3.9) warnings.

    +1 release audit.  The applied patch does not increase the total number of release audit warnings.

    +1 core tests.  The patch passed unit tests in .

    +1 contrib tests.  The patch passed contrib unit tests.

Test results: https://builds.apache.org/job/PreCommit-MAPREDUCE-Build/1270//testReport/
Console output: https://builds.apache.org/job/PreCommit-MAPREDUCE-Build/1270//console

This message is automatically generated.
                
> MR AM for sort-job going out of memory
> --------------------------------------
>
>                 Key: MAPREDUCE-3333
>                 URL: https://issues.apache.org/jira/browse/MAPREDUCE-3333
>             Project: Hadoop Map/Reduce
>          Issue Type: Bug
>          Components: applicationmaster, mrv2
>    Affects Versions: 0.23.0
>            Reporter: Vinod Kumar Vavilapalli
>            Assignee: Vinod Kumar Vavilapalli
>            Priority: Blocker
>             Fix For: 0.23.1
>
>         Attachments: MAPREDUCE-3333-20111102.txt, MAPREDUCE-3333-20111108.txt
>
>

        

[jira] [Commented] (MAPREDUCE-3333) MR AM for sort-job going out of memory

Posted by "Vinod Kumar Vavilapalli (Commented) (JIRA)" <ji...@apache.org>.
    [ https://issues.apache.org/jira/browse/MAPREDUCE-3333?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13147008#comment-13147008 ] 

Vinod Kumar Vavilapalli commented on MAPREDUCE-3333:
----------------------------------------------------

bq. The close call shouldn't really be required with the idle time set to 0.
My idea was to actually remove the maxIdleTime setting once the root issue, HADOOP-7317, is fixed. I'll let it be.
bq. Should RpcClientFactoryPBImpl call RPC.stopProxy instead of putting it in all the service client impls? It's a PB-specific factory, so putting it there should be OK.
No, that isn't possible: we need access to the proxy object in each impl. That's the bane of the multiple layers in this part of the code.
bq. Otherwise, the exception in stopClient() should not be ignored.
Sure, I'll throw an exception so that it is clear when somebody calls stopClient() for a protocol that doesn't implement it.
bq. The client cache (removed by the patch) in ContainerLauncherImpl would still be useful in non-secure mode. This works for both though, so it isn't high priority. Maybe a separate JIRA.
Sure, but it helps to have the same implementation for both modes. A separate JIRA if someone needs it.
bq. Forgot to mention: nice, clean workaround to the RPC stop not working. Thought it'd be way more involved.
Yeah, I've been running with this workaround for nearly a week but didn't put it in the patch in the hope of fixing the root cause. Turns out it is the only short-term solution, alas.
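The review settles on two points: stopClient() should fail loudly instead of swallowing an exception, and each per-container proxy is stopped explicitly after use rather than kept in a client cache. The plain-Java sketch below models both. Everything in it is hypothetical: ContainerManagerProxy, ClosableProxy, OpaqueProxy and launch() are invented stand-ins, and stopClient(Object) here is not the signature of the method the patch adds to the YARN RPC client factory.

```java
import java.io.Closeable;
import java.io.IOException;
import java.io.UncheckedIOException;

public class StopClientSketch {
    /** Hypothetical stand-in for a per-NodeManager RPC proxy. */
    interface ContainerManagerProxy {
        String startContainer(String containerId);
    }

    /** Proxy that owns a connection and can therefore be stopped. */
    static class ClosableProxy implements ContainerManagerProxy, Closeable {
        boolean closed = false;

        @Override
        public String startContainer(String containerId) {
            if (closed) {
                throw new IllegalStateException("proxy already stopped");
            }
            return "started " + containerId;
        }

        @Override
        public void close() {
            closed = true; // in a real client this would release the connection and its thread
        }
    }

    /** Proxy for a protocol that offers no way to be stopped. */
    static class OpaqueProxy implements ContainerManagerProxy {
        @Override
        public String startContainer(String containerId) {
            return "started " + containerId;
        }
    }

    /** Stops a proxy, throwing instead of silently ignoring one that cannot be stopped. */
    static void stopClient(Object proxy) {
        if (!(proxy instanceof Closeable)) {
            throw new IllegalArgumentException(
                "stopClient() called on " + proxy.getClass().getName() + ", which cannot be stopped");
        }
        try {
            ((Closeable) proxy).close();
        } catch (IOException e) {
            throw new UncheckedIOException("Failed to stop client", e);
        }
    }

    /** One container launch: use a fresh proxy, then always stop it (no cache). */
    static String launch(ContainerManagerProxy proxy, String containerId) {
        try {
            return proxy.startContainer(containerId);
        } finally {
            stopClient(proxy);
        }
    }

    public static void main(String[] args) {
        ClosableProxy proxy = new ClosableProxy();
        System.out.println(launch(proxy, "container_01")); // started container_01
        System.out.println(proxy.closed);                  // true
        try {
            stopClient(new OpaqueProxy());
        } catch (IllegalArgumentException e) {
            System.out.println("rejected: " + e.getMessage());
        }
    }
}
```

The try/finally releases the connection even when startContainer fails. The trade-off, as noted in the review, is that connections are no longer reused across launches in non-secure mode.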

                
> MR AM for sort-job going out of memory
> --------------------------------------
>
>                 Key: MAPREDUCE-3333
>                 URL: https://issues.apache.org/jira/browse/MAPREDUCE-3333
>             Project: Hadoop Map/Reduce
>          Issue Type: Bug
>          Components: applicationmaster, mrv2
>    Affects Versions: 0.23.0
>            Reporter: Vinod Kumar Vavilapalli
>            Assignee: Vinod Kumar Vavilapalli
>            Priority: Blocker
>             Fix For: 0.23.1
>
>         Attachments: MAPREDUCE-3333-20111102.txt, MAPREDUCE-3333-20111108.txt, MAPREDUCE-3333-20111109.txt
>
>

        

[jira] [Commented] (MAPREDUCE-3333) MR AM for sort-job going out of memory

Posted by "Hudson (Commented) (JIRA)" <ji...@apache.org>.
    [ https://issues.apache.org/jira/browse/MAPREDUCE-3333?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13147046#comment-13147046 ] 

Hudson commented on MAPREDUCE-3333:
-----------------------------------

Integrated in Hadoop-Hdfs-trunk-Commit #1335 (See [https://builds.apache.org/job/Hadoop-Hdfs-trunk-Commit/1335/])
    MAPREDUCE-3333. Fixed bugs in ContainerLauncher of MR AppMaster due to which per-container connections to NodeManager were lingering long enough to hit the ulimits on number of processes. (vinodkv)

vinodkv : http://svn.apache.org/viewcvs.cgi/?root=Apache-SVN&view=rev&rev=1199751
Files : 
* /hadoop/common/trunk/hadoop-mapreduce-project/CHANGES.txt
* /hadoop/common/trunk/hadoop-mapreduce-project/hadoop-mapreduce-client/hadoop-mapreduce-client-app/src/main/java/org/apache/hadoop/mapreduce/v2/app/launcher/ContainerLauncher.java
* /hadoop/common/trunk/hadoop-mapreduce-project/hadoop-mapreduce-client/hadoop-mapreduce-client-app/src/main/java/org/apache/hadoop/mapreduce/v2/app/launcher/ContainerLauncherImpl.java
* /hadoop/common/trunk/hadoop-mapreduce-project/hadoop-yarn/hadoop-yarn-common/src/main/java/org/apache/hadoop/yarn/api/impl/pb/client/ContainerManagerPBClientImpl.java
* /hadoop/common/trunk/hadoop-mapreduce-project/hadoop-yarn/hadoop-yarn-common/src/main/java/org/apache/hadoop/yarn/factories/RpcClientFactory.java
* /hadoop/common/trunk/hadoop-mapreduce-project/hadoop-yarn/hadoop-yarn-common/src/main/java/org/apache/hadoop/yarn/factories/impl/pb/RpcClientFactoryPBImpl.java
* /hadoop/common/trunk/hadoop-mapreduce-project/hadoop-yarn/hadoop-yarn-common/src/main/java/org/apache/hadoop/yarn/ipc/HadoopYarnProtoRPC.java
* /hadoop/common/trunk/hadoop-mapreduce-project/hadoop-yarn/hadoop-yarn-common/src/main/java/org/apache/hadoop/yarn/ipc/HadoopYarnRPC.java
* /hadoop/common/trunk/hadoop-mapreduce-project/hadoop-yarn/hadoop-yarn-common/src/main/java/org/apache/hadoop/yarn/ipc/ProtoOverHadoopRpcEngine.java
* /hadoop/common/trunk/hadoop-mapreduce-project/hadoop-yarn/hadoop-yarn-common/src/main/java/org/apache/hadoop/yarn/ipc/YarnRPC.java

                
> MR AM for sort-job going out of memory
> --------------------------------------
>
>                 Key: MAPREDUCE-3333
>                 URL: https://issues.apache.org/jira/browse/MAPREDUCE-3333
>             Project: Hadoop Map/Reduce
>          Issue Type: Bug
>          Components: applicationmaster, mrv2
>    Affects Versions: 0.23.0
>            Reporter: Vinod Kumar Vavilapalli
>            Assignee: Vinod Kumar Vavilapalli
>            Priority: Blocker
>             Fix For: 0.23.1
>
>         Attachments: MAPREDUCE-3333-20111102.txt, MAPREDUCE-3333-20111108.txt, MAPREDUCE-3333-20111109.1.txt, MAPREDUCE-3333-20111109.txt
>
>

        

[jira] [Commented] (MAPREDUCE-3333) MR AM for sort-job going out of memory

Posted by "Hadoop QA (Commented) (JIRA)" <ji...@apache.org>.
    [ https://issues.apache.org/jira/browse/MAPREDUCE-3333?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13142234#comment-13142234 ] 

Hadoop QA commented on MAPREDUCE-3333:
--------------------------------------

-1 overall.  Here are the results of testing the latest attachment 
  http://issues.apache.org/jira/secure/attachment/12501972/MAPREDUCE-3333-20111102.txt
  against trunk revision .

    +1 @author.  The patch does not contain any @author tags.

    -1 tests included.  The patch doesn't appear to include any new or modified tests.
                        Please justify why no new tests are needed for this patch.
                        Also please list what manual steps were performed to verify this patch.

    +1 javadoc.  The javadoc tool did not generate any warning messages.

    +1 javac.  The applied patch does not increase the total number of javac compiler warnings.

    +1 findbugs.  The patch does not introduce any new Findbugs (version 1.3.9) warnings.

    +1 release audit.  The applied patch does not increase the total number of release audit warnings.

    +1 core tests.  The patch passed unit tests in .

    +1 contrib tests.  The patch passed contrib unit tests.

Test results: https://builds.apache.org/job/PreCommit-MAPREDUCE-Build/1240//testReport/
Console output: https://builds.apache.org/job/PreCommit-MAPREDUCE-Build/1240//console

This message is automatically generated.
                
> MR AM for sort-job going out of memory
> --------------------------------------
>
>                 Key: MAPREDUCE-3333
>                 URL: https://issues.apache.org/jira/browse/MAPREDUCE-3333
>             Project: Hadoop Map/Reduce
>          Issue Type: Bug
>          Components: applicationmaster, mrv2
>    Affects Versions: 0.23.0
>            Reporter: Vinod Kumar Vavilapalli
>            Assignee: Vinod Kumar Vavilapalli
>            Priority: Blocker
>         Attachments: MAPREDUCE-3333-20111102.txt
>
>

        

[jira] [Updated] (MAPREDUCE-3333) MR AM for sort-job going out of memory

Posted by "Vinod Kumar Vavilapalli (Updated) (JIRA)" <ji...@apache.org>.
     [ https://issues.apache.org/jira/browse/MAPREDUCE-3333?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Vinod Kumar Vavilapalli updated MAPREDUCE-3333:
-----------------------------------------------

    Hadoop Flags: Reviewed
          Status: Patch Available  (was: Open)
    
> MR AM for sort-job going out of memory
> --------------------------------------
>
>                 Key: MAPREDUCE-3333
>                 URL: https://issues.apache.org/jira/browse/MAPREDUCE-3333
>             Project: Hadoop Map/Reduce
>          Issue Type: Bug
>          Components: applicationmaster, mrv2
>    Affects Versions: 0.23.0
>            Reporter: Vinod Kumar Vavilapalli
>            Assignee: Vinod Kumar Vavilapalli
>            Priority: Blocker
>             Fix For: 0.23.1
>
>         Attachments: MAPREDUCE-3333-20111102.txt, MAPREDUCE-3333-20111108.txt, MAPREDUCE-3333-20111109.1.txt, MAPREDUCE-3333-20111109.txt
>
>

        

[jira] [Updated] (MAPREDUCE-3333) MR AM for sort-job going out of memory

Posted by "Vinod Kumar Vavilapalli (Updated) (JIRA)" <ji...@apache.org>.
     [ https://issues.apache.org/jira/browse/MAPREDUCE-3333?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Vinod Kumar Vavilapalli updated MAPREDUCE-3333:
-----------------------------------------------

    Attachment: MAPREDUCE-3333-20111109.1.txt

Updated patch to address Sid's comment about the exception.

Karam pointed out a typo in the default-nm-timeout constant. Fixed that too.
                
> MR AM for sort-job going out of memory
> --------------------------------------
>
>                 Key: MAPREDUCE-3333
>                 URL: https://issues.apache.org/jira/browse/MAPREDUCE-3333
>             Project: Hadoop Map/Reduce
>          Issue Type: Bug
>          Components: applicationmaster, mrv2
>    Affects Versions: 0.23.0
>            Reporter: Vinod Kumar Vavilapalli
>            Assignee: Vinod Kumar Vavilapalli
>            Priority: Blocker
>             Fix For: 0.23.1
>
>         Attachments: MAPREDUCE-3333-20111102.txt, MAPREDUCE-3333-20111108.txt, MAPREDUCE-3333-20111109.1.txt, MAPREDUCE-3333-20111109.txt
>
>

        

[jira] [Updated] (MAPREDUCE-3333) MR AM for sort-job going out of memory

Posted by "Vinod Kumar Vavilapalli (Updated) (JIRA)" <ji...@apache.org>.
     [ https://issues.apache.org/jira/browse/MAPREDUCE-3333?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Vinod Kumar Vavilapalli updated MAPREDUCE-3333:
-----------------------------------------------

    Fix Version/s: 0.23.1
           Status: Patch Available  (was: Open)

Running it through His Highness Jenkins.
                
> MR AM for sort-job going out of memory
> --------------------------------------
>
>                 Key: MAPREDUCE-3333
>                 URL: https://issues.apache.org/jira/browse/MAPREDUCE-3333
>             Project: Hadoop Map/Reduce
>          Issue Type: Bug
>          Components: applicationmaster, mrv2
>    Affects Versions: 0.23.0
>            Reporter: Vinod Kumar Vavilapalli
>            Assignee: Vinod Kumar Vavilapalli
>            Priority: Blocker
>             Fix For: 0.23.1
>
>         Attachments: MAPREDUCE-3333-20111102.txt, MAPREDUCE-3333-20111108.txt

[jira] [Commented] (MAPREDUCE-3333) MR AM for sort-job going out of memory

Posted by "Hudson (Commented) (JIRA)" <ji...@apache.org>.
    [ https://issues.apache.org/jira/browse/MAPREDUCE-3333?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13147673#comment-13147673 ] 

Hudson commented on MAPREDUCE-3333:
-----------------------------------

Integrated in Hadoop-Mapreduce-trunk #893 (See [https://builds.apache.org/job/Hadoop-Mapreduce-trunk/893/])
    MAPREDUCE-3333. Fixed bugs in ContainerLauncher of MR AppMaster due to which per-container connections to NodeManager were lingering long enough to hit the ulimits on number of processes. (vinodkv)

vinodkv : http://svn.apache.org/viewcvs.cgi/?root=Apache-SVN&view=rev&rev=1199751
Files : 
* /hadoop/common/trunk/hadoop-mapreduce-project/CHANGES.txt
* /hadoop/common/trunk/hadoop-mapreduce-project/hadoop-mapreduce-client/hadoop-mapreduce-client-app/src/main/java/org/apache/hadoop/mapreduce/v2/app/launcher/ContainerLauncher.java
* /hadoop/common/trunk/hadoop-mapreduce-project/hadoop-mapreduce-client/hadoop-mapreduce-client-app/src/main/java/org/apache/hadoop/mapreduce/v2/app/launcher/ContainerLauncherImpl.java
* /hadoop/common/trunk/hadoop-mapreduce-project/hadoop-yarn/hadoop-yarn-common/src/main/java/org/apache/hadoop/yarn/api/impl/pb/client/ContainerManagerPBClientImpl.java
* /hadoop/common/trunk/hadoop-mapreduce-project/hadoop-yarn/hadoop-yarn-common/src/main/java/org/apache/hadoop/yarn/factories/RpcClientFactory.java
* /hadoop/common/trunk/hadoop-mapreduce-project/hadoop-yarn/hadoop-yarn-common/src/main/java/org/apache/hadoop/yarn/factories/impl/pb/RpcClientFactoryPBImpl.java
* /hadoop/common/trunk/hadoop-mapreduce-project/hadoop-yarn/hadoop-yarn-common/src/main/java/org/apache/hadoop/yarn/ipc/HadoopYarnProtoRPC.java
* /hadoop/common/trunk/hadoop-mapreduce-project/hadoop-yarn/hadoop-yarn-common/src/main/java/org/apache/hadoop/yarn/ipc/HadoopYarnRPC.java
* /hadoop/common/trunk/hadoop-mapreduce-project/hadoop-yarn/hadoop-yarn-common/src/main/java/org/apache/hadoop/yarn/ipc/ProtoOverHadoopRpcEngine.java
* /hadoop/common/trunk/hadoop-mapreduce-project/hadoop-yarn/hadoop-yarn-common/src/main/java/org/apache/hadoop/yarn/ipc/YarnRPC.java

                
> MR AM for sort-job going out of memory
> --------------------------------------
>
>                 Key: MAPREDUCE-3333
>                 URL: https://issues.apache.org/jira/browse/MAPREDUCE-3333
>             Project: Hadoop Map/Reduce
>          Issue Type: Bug
>          Components: applicationmaster, mrv2
>    Affects Versions: 0.23.0
>            Reporter: Vinod Kumar Vavilapalli
>            Assignee: Vinod Kumar Vavilapalli
>            Priority: Blocker
>             Fix For: 0.23.1
>
>         Attachments: MAPREDUCE-3333-20111102.txt, MAPREDUCE-3333-20111108.txt, MAPREDUCE-3333-20111109.1.txt, MAPREDUCE-3333-20111109.txt

[jira] [Commented] (MAPREDUCE-3333) MR AM for sort-job going out of memory

Posted by "Karam Singh (Commented) (JIRA)" <ji...@apache.org>.
    [ https://issues.apache.org/jira/browse/MAPREDUCE-3333?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13146998#comment-13146998 ] 

Karam Singh commented on MAPREDUCE-3333:
----------------------------------------

After applying the latest patch, I ran Sort twice and did not observe this issue anymore.

                
> MR AM for sort-job going out of memory
> --------------------------------------
>
>                 Key: MAPREDUCE-3333
>                 URL: https://issues.apache.org/jira/browse/MAPREDUCE-3333
>             Project: Hadoop Map/Reduce
>          Issue Type: Bug
>          Components: applicationmaster, mrv2
>    Affects Versions: 0.23.0
>            Reporter: Vinod Kumar Vavilapalli
>            Assignee: Vinod Kumar Vavilapalli
>            Priority: Blocker
>             Fix For: 0.23.1
>
>         Attachments: MAPREDUCE-3333-20111102.txt, MAPREDUCE-3333-20111108.txt, MAPREDUCE-3333-20111109.txt

[jira] [Commented] (MAPREDUCE-3333) MR AM for sort-job going out of memory

Posted by "Hudson (Commented) (JIRA)" <ji...@apache.org>.
    [ https://issues.apache.org/jira/browse/MAPREDUCE-3333?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13147322#comment-13147322 ] 

Hudson commented on MAPREDUCE-3333:
-----------------------------------

Integrated in Hadoop-Mapreduce-0.23-Build #87 (See [https://builds.apache.org/job/Hadoop-Mapreduce-0.23-Build/87/])
    MAPREDUCE-3333. Fixed bugs in ContainerLauncher of MR AppMaster due to which per-container connections to NodeManager were lingering long enough to hit the ulimits on number of processes. (vinodkv)  
svn merge -c r1199751 --ignore-ancestry ../../trunk/

vinodkv : http://svn.apache.org/viewcvs.cgi/?root=Apache-SVN&view=rev&rev=1199757
Files : 
* /hadoop/common/branches/branch-0.23/hadoop-mapreduce-project/CHANGES.txt
* /hadoop/common/branches/branch-0.23/hadoop-mapreduce-project/hadoop-mapreduce-client/hadoop-mapreduce-client-app/src/main/java/org/apache/hadoop/mapreduce/v2/app/launcher/ContainerLauncher.java
* /hadoop/common/branches/branch-0.23/hadoop-mapreduce-project/hadoop-mapreduce-client/hadoop-mapreduce-client-app/src/main/java/org/apache/hadoop/mapreduce/v2/app/launcher/ContainerLauncherImpl.java
* /hadoop/common/branches/branch-0.23/hadoop-mapreduce-project/hadoop-yarn/hadoop-yarn-common/src/main/java/org/apache/hadoop/yarn/api/impl/pb/client/ContainerManagerPBClientImpl.java
* /hadoop/common/branches/branch-0.23/hadoop-mapreduce-project/hadoop-yarn/hadoop-yarn-common/src/main/java/org/apache/hadoop/yarn/factories/RpcClientFactory.java
* /hadoop/common/branches/branch-0.23/hadoop-mapreduce-project/hadoop-yarn/hadoop-yarn-common/src/main/java/org/apache/hadoop/yarn/factories/impl/pb/RpcClientFactoryPBImpl.java
* /hadoop/common/branches/branch-0.23/hadoop-mapreduce-project/hadoop-yarn/hadoop-yarn-common/src/main/java/org/apache/hadoop/yarn/ipc/HadoopYarnProtoRPC.java
* /hadoop/common/branches/branch-0.23/hadoop-mapreduce-project/hadoop-yarn/hadoop-yarn-common/src/main/java/org/apache/hadoop/yarn/ipc/HadoopYarnRPC.java
* /hadoop/common/branches/branch-0.23/hadoop-mapreduce-project/hadoop-yarn/hadoop-yarn-common/src/main/java/org/apache/hadoop/yarn/ipc/ProtoOverHadoopRpcEngine.java
* /hadoop/common/branches/branch-0.23/hadoop-mapreduce-project/hadoop-yarn/hadoop-yarn-common/src/main/java/org/apache/hadoop/yarn/ipc/YarnRPC.java

                
> MR AM for sort-job going out of memory
> --------------------------------------
>
>                 Key: MAPREDUCE-3333
>                 URL: https://issues.apache.org/jira/browse/MAPREDUCE-3333
>             Project: Hadoop Map/Reduce
>          Issue Type: Bug
>          Components: applicationmaster, mrv2
>    Affects Versions: 0.23.0
>            Reporter: Vinod Kumar Vavilapalli
>            Assignee: Vinod Kumar Vavilapalli
>            Priority: Blocker
>             Fix For: 0.23.1
>
>         Attachments: MAPREDUCE-3333-20111102.txt, MAPREDUCE-3333-20111108.txt, MAPREDUCE-3333-20111109.1.txt, MAPREDUCE-3333-20111109.txt

[jira] [Updated] (MAPREDUCE-3333) MR AM for sort-job going out of memory

Posted by "Vinod Kumar Vavilapalli (Updated) (JIRA)" <ji...@apache.org>.
     [ https://issues.apache.org/jira/browse/MAPREDUCE-3333?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Vinod Kumar Vavilapalli updated MAPREDUCE-3333:
-----------------------------------------------

    Status: Patch Available  (was: Open)
    
> MR AM for sort-job going out of memory
> --------------------------------------
>
>                 Key: MAPREDUCE-3333
>                 URL: https://issues.apache.org/jira/browse/MAPREDUCE-3333
>             Project: Hadoop Map/Reduce
>          Issue Type: Bug
>          Components: applicationmaster, mrv2
>    Affects Versions: 0.23.0
>            Reporter: Vinod Kumar Vavilapalli
>            Assignee: Vinod Kumar Vavilapalli
>            Priority: Blocker
>             Fix For: 0.23.1
>
>         Attachments: MAPREDUCE-3333-20111102.txt, MAPREDUCE-3333-20111108.txt, MAPREDUCE-3333-20111109.txt

[jira] [Updated] (MAPREDUCE-3333) MR AM for sort-job going out of memory

Posted by "Vinod Kumar Vavilapalli (Updated) (JIRA)" <ji...@apache.org>.
     [ https://issues.apache.org/jira/browse/MAPREDUCE-3333?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Vinod Kumar Vavilapalli updated MAPREDUCE-3333:
-----------------------------------------------

      Resolution: Fixed
    Release Note: Fixed bugs in ContainerLauncher of MR AppMaster due to which per-container connections to NodeManager were lingering long enough to hit the ulimits on number of processes.
          Status: Resolved  (was: Patch Available)

I just committed this to trunk and branch-0.23.

Thanks to Sid for the review, and a ton of thanks to Karam, who helped me test this on the cluster.

That was such a long-drawn affair!
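The release note attributes the failure to per-container connections to the NodeManager lingering long enough to hit the ulimit on the number of processes. On Linux, threads count against that limit. The trace shows why threads are the scarce resource: `Client$Connection.setupIOstreams` calls `Thread.start`, so each open IPC connection holds a native thread.

The Java sketch below illustrates the general remedy under stated assumptions. `NodeConnection`, `startContainer` and `launchAll` are hypothetical stand-ins. They are not Hadoop classes, and this is not the committed patch. Each launch opens its connection inside try-with-resources and closes it as soon as the request is sent. The number of live connection threads is therefore bounded by the launcher pool size and does not grow with the number of containers.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.TimeUnit;
import java.util.concurrent.atomic.AtomicInteger;

public class ContainerLaunchSketch {

    // Hypothetical stand-in for a per-container RPC connection. Like an IPC
    // client connection, it owns a background thread for as long as it is open.
    static final class NodeConnection implements AutoCloseable {
        static final AtomicInteger OPEN = new AtomicInteger();
        private final Thread reader;

        NodeConnection(String host) {
            reader = new Thread(() -> {
                try {
                    Thread.sleep(Long.MAX_VALUE); // pretend to wait for responses
                } catch (InterruptedException e) {
                    // interrupted by close(): let the thread exit
                }
            }, "reader-" + host);
            reader.setDaemon(true);
            reader.start();
            OPEN.incrementAndGet();
        }

        void startContainer(String containerId) {
            // A real client would send the start-container request here.
        }

        @Override
        public void close() throws InterruptedException {
            reader.interrupt(); // stop the connection's thread...
            reader.join();      // ...and wait until it has actually exited
            OPEN.decrementAndGet();
        }
    }

    // Launches every container through a fixed-size pool and closes each
    // connection right after use. Returns the peak number of open connections.
    static int launchAll(List<String> containerIds, int poolSize) throws InterruptedException {
        ExecutorService pool = Executors.newFixedThreadPool(poolSize);
        AtomicInteger peak = new AtomicInteger();
        for (String id : containerIds) {
            pool.execute(() -> {
                try (NodeConnection conn = new NodeConnection("nm-" + id)) {
                    peak.accumulateAndGet(NodeConnection.OPEN.get(), Math::max);
                    conn.startContainer(id);
                } catch (InterruptedException e) {
                    Thread.currentThread().interrupt();
                }
            });
        }
        pool.shutdown();
        pool.awaitTermination(1, TimeUnit.MINUTES);
        return peak.get();
    }

    public static void main(String[] args) throws InterruptedException {
        List<String> ids = new ArrayList<>();
        for (int i = 0; i < 1000; i++) {
            ids.add("container_" + i);
        }
        int peak = launchAll(ids, 10);
        System.out.println("peak open connections: " + peak);
        System.out.println("open after launch: " + NodeConnection.OPEN.get());
    }
}
```

With 1,000 containers and a pool of 10, the peak never exceeds 10 open connections, and none remain open once `launchAll` returns. If the `close()` call were dropped, every reader thread would stay alive and the thread count would grow with each launch. That is the lingering-connection pattern described in the release note.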
                
> MR AM for sort-job going out of memory
> --------------------------------------
>
>                 Key: MAPREDUCE-3333
>                 URL: https://issues.apache.org/jira/browse/MAPREDUCE-3333
>             Project: Hadoop Map/Reduce
>          Issue Type: Bug
>          Components: applicationmaster, mrv2
>    Affects Versions: 0.23.0
>            Reporter: Vinod Kumar Vavilapalli
>            Assignee: Vinod Kumar Vavilapalli
>            Priority: Blocker
>             Fix For: 0.23.1
>
>         Attachments: MAPREDUCE-3333-20111102.txt, MAPREDUCE-3333-20111108.txt, MAPREDUCE-3333-20111109.1.txt, MAPREDUCE-3333-20111109.txt

[jira] [Commented] (MAPREDUCE-3333) MR AM for sort-job going out of memory

Posted by "Hudson (Commented) (JIRA)" <ji...@apache.org>.
    [ https://issues.apache.org/jira/browse/MAPREDUCE-3333?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13147651#comment-13147651 ] 

Hudson commented on MAPREDUCE-3333:
-----------------------------------

Integrated in Hadoop-Hdfs-trunk #859 (See [https://builds.apache.org/job/Hadoop-Hdfs-trunk/859/])
    MAPREDUCE-3333. Fixed bugs in ContainerLauncher of MR AppMaster due to which per-container connections to NodeManager were lingering long enough to hit the ulimits on number of processes. (vinodkv)

vinodkv : http://svn.apache.org/viewcvs.cgi/?root=Apache-SVN&view=rev&rev=1199751
Files : 
* /hadoop/common/trunk/hadoop-mapreduce-project/CHANGES.txt
* /hadoop/common/trunk/hadoop-mapreduce-project/hadoop-mapreduce-client/hadoop-mapreduce-client-app/src/main/java/org/apache/hadoop/mapreduce/v2/app/launcher/ContainerLauncher.java
* /hadoop/common/trunk/hadoop-mapreduce-project/hadoop-mapreduce-client/hadoop-mapreduce-client-app/src/main/java/org/apache/hadoop/mapreduce/v2/app/launcher/ContainerLauncherImpl.java
* /hadoop/common/trunk/hadoop-mapreduce-project/hadoop-yarn/hadoop-yarn-common/src/main/java/org/apache/hadoop/yarn/api/impl/pb/client/ContainerManagerPBClientImpl.java
* /hadoop/common/trunk/hadoop-mapreduce-project/hadoop-yarn/hadoop-yarn-common/src/main/java/org/apache/hadoop/yarn/factories/RpcClientFactory.java
* /hadoop/common/trunk/hadoop-mapreduce-project/hadoop-yarn/hadoop-yarn-common/src/main/java/org/apache/hadoop/yarn/factories/impl/pb/RpcClientFactoryPBImpl.java
* /hadoop/common/trunk/hadoop-mapreduce-project/hadoop-yarn/hadoop-yarn-common/src/main/java/org/apache/hadoop/yarn/ipc/HadoopYarnProtoRPC.java
* /hadoop/common/trunk/hadoop-mapreduce-project/hadoop-yarn/hadoop-yarn-common/src/main/java/org/apache/hadoop/yarn/ipc/HadoopYarnRPC.java
* /hadoop/common/trunk/hadoop-mapreduce-project/hadoop-yarn/hadoop-yarn-common/src/main/java/org/apache/hadoop/yarn/ipc/ProtoOverHadoopRpcEngine.java
* /hadoop/common/trunk/hadoop-mapreduce-project/hadoop-yarn/hadoop-yarn-common/src/main/java/org/apache/hadoop/yarn/ipc/YarnRPC.java

                
> MR AM for sort-job going out of memory
> --------------------------------------
>
>                 Key: MAPREDUCE-3333
>                 URL: https://issues.apache.org/jira/browse/MAPREDUCE-3333
>             Project: Hadoop Map/Reduce
>          Issue Type: Bug
>          Components: applicationmaster, mrv2
>    Affects Versions: 0.23.0
>            Reporter: Vinod Kumar Vavilapalli
>            Assignee: Vinod Kumar Vavilapalli
>            Priority: Blocker
>             Fix For: 0.23.1
>
>         Attachments: MAPREDUCE-3333-20111102.txt, MAPREDUCE-3333-20111108.txt, MAPREDUCE-3333-20111109.1.txt, MAPREDUCE-3333-20111109.txt
>
>
> [~Karams] just found this. The usual sort job on a 350-node cluster hung due to an OutOfMemoryError and eventually failed after an hour, instead of finishing in the usual 20-odd minutes.
> {code}
> 2011-11-02 11:40:36,438 ERROR [ContainerLauncher #258] org.apache.hadoop.mapreduce.v2.app.launcher.ContainerLauncherImpl: Container launch failed for container_1320233407485_0002
> _01_001434 : java.lang.reflect.UndeclaredThrowableException
>         at org.apache.hadoop.yarn.api.impl.pb.client.ContainerManagerPBClientImpl.startContainer(ContainerManagerPBClientImpl.java:88)
>         at org.apache.hadoop.mapreduce.v2.app.launcher.ContainerLauncherImpl$EventProcessor.run(ContainerLauncherImpl.java:290)
>         at java.util.concurrent.ThreadPoolExecutor$Worker.runTask(ThreadPoolExecutor.java:886)
>         at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:908)
>         at java.lang.Thread.run(Thread.java:619)
> Caused by: com.google.protobuf.ServiceException: java.io.IOException: Failed on local exception: java.io.IOException: Couldn't set up IO streams; Host Details : local host is: "gsbl91281.blue.ygrid.yahoo.com/98.137.101.189"; destination host is: ""gsbl91525.blue.ygrid.yahoo.com":45450; 
>         at org.apache.hadoop.yarn.ipc.ProtoOverHadoopRpcEngine$Invoker.invoke(ProtoOverHadoopRpcEngine.java:139)
>         at $Proxy20.startContainer(Unknown Source)
>         at org.apache.hadoop.yarn.api.impl.pb.client.ContainerManagerPBClientImpl.startContainer(ContainerManagerPBClientImpl.java:81)
>         ... 4 more
> Caused by: java.io.IOException: Failed on local exception: java.io.IOException: Couldn't set up IO streams; Host Details : local host is: "gsbl91281.blue.ygrid.yahoo.com/98.137.101.189"; destination host is: ""gsbl91525.blue.ygrid.yahoo.com":45450; 
>         at org.apache.hadoop.net.NetUtils.wrapException(NetUtils.java:655)
>         at org.apache.hadoop.ipc.Client.call(Client.java:1089)
>         at org.apache.hadoop.yarn.ipc.ProtoOverHadoopRpcEngine$Invoker.invoke(ProtoOverHadoopRpcEngine.java:136)
>         ... 6 more
> Caused by: java.io.IOException: Couldn't set up IO streams
>         at org.apache.hadoop.ipc.Client$Connection.setupIOstreams(Client.java:621)
>         at org.apache.hadoop.ipc.Client$Connection.access$2000(Client.java:205)
>         at org.apache.hadoop.ipc.Client.getConnection(Client.java:1195)
>         at org.apache.hadoop.ipc.Client.call(Client.java:1065)
>         ... 7 more
> Caused by: java.lang.OutOfMemoryError: unable to create new native thread
>         at java.lang.Thread.start0(Native Method)
>         at java.lang.Thread.start(Thread.java:597)
>         at org.apache.hadoop.ipc.Client$Connection.setupIOstreams(Client.java:614)
>         ... 10 more
> {code}

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators: https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa
For more information on JIRA, see: http://www.atlassian.com/software/jira

        

[jira] [Updated] (MAPREDUCE-3333) MR AM for sort-job going out of memory

Posted by "Vinod Kumar Vavilapalli (Updated) (JIRA)" <ji...@apache.org>.
     [ https://issues.apache.org/jira/browse/MAPREDUCE-3333?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Vinod Kumar Vavilapalli updated MAPREDUCE-3333:
-----------------------------------------------

    Status: Patch Available  (was: Open)
    
> MR AM for sort-job going out of memory
> --------------------------------------
>
>                 Key: MAPREDUCE-3333
>                 URL: https://issues.apache.org/jira/browse/MAPREDUCE-3333
>             Project: Hadoop Map/Reduce
>          Issue Type: Bug
>          Components: applicationmaster, mrv2
>    Affects Versions: 0.23.0
>            Reporter: Vinod Kumar Vavilapalli
>            Assignee: Vinod Kumar Vavilapalli
>            Priority: Blocker
>         Attachments: MAPREDUCE-3333-20111102.txt

        

[jira] [Commented] (MAPREDUCE-3333) MR AM for sort-job going out of memory

Posted by "Vinod Kumar Vavilapalli (Commented) (JIRA)" <ji...@apache.org>.
    [ https://issues.apache.org/jira/browse/MAPREDUCE-3333?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13146836#comment-13146836 ] 

Vinod Kumar Vavilapalli commented on MAPREDUCE-3333:
----------------------------------------------------

After more digging through the RPC layer, I figured out that I am running into HADOOP-7317 or the related HDFS-1965. We can use the same RPC client to connect to different servers and reuse connections to the same server, but we cannot terminate connections individually.

We have two options:
 - modify ProtoOverHadoopRpcEngine to avoid caching clients altogether, depending on a configuration, or
 - set the idle time for connections to zero.

Either is manageable effort-wise; I am doing the latter as it is simpler.
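The second option can be illustrated with a toy model. This is not Hadoop's ipc.Client. It assumes, as the comment above describes, that cached connections cannot be closed one by one and are torn down only by an idle check. `IdleReaperModel`, `call`, `reap` and the logical clock are hypothetical names. In real Hadoop the idle time is, we believe, governed by the `ipc.client.connection.maxidletime` setting, but the patch's exact mechanism is not shown here.

```java
import java.util.HashMap;
import java.util.Map;

/**
 * Hypothetical toy model (not Hadoop's ipc.Client): one cached connection per
 * server address, closed only by an idle reaper and never individually.
 */
public class IdleReaperModel {
    private final long maxIdleMillis;
    // server address -> logical time of the last completed call
    private final Map<String, Long> lastUsed = new HashMap<>();

    public IdleReaperModel(long maxIdleMillis) {
        this.maxIdleMillis = maxIdleMillis;
    }

    /** A call at logical time 'now' opens a connection or reuses the cached one. */
    public void call(String server, long now) {
        lastUsed.put(server, now);
    }

    /** Reaper pass: close every connection idle for at least maxIdleMillis. */
    public void reap(long now) {
        lastUsed.values().removeIf(t -> now - t >= maxIdleMillis);
    }

    public int openConnections() {
        return lastUsed.size();
    }

    public static void main(String[] args) {
        IdleReaperModel lingering = new IdleReaperModel(10_000); // long idle time
        IdleReaperModel eager = new IdleReaperModel(0);          // idle time of zero
        for (int i = 0; i < 350; i++) {
            lingering.call("nm-" + i + ":45450", 0);
            eager.call("nm-" + i + ":45450", 0);
        }
        lingering.reap(1);
        eager.reap(1);
        System.out.println(lingering.openConnections()); // 350
        System.out.println(eager.openConnections());     // 0
    }
}
```

With a long idle time, connections linger between launches. With an idle time of zero, a connection is closed as soon as it is idle, at the cost of reconnecting on the next call.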
                
> MR AM for sort-job going out of memory
> --------------------------------------
>
>                 Key: MAPREDUCE-3333
>                 URL: https://issues.apache.org/jira/browse/MAPREDUCE-3333
>             Project: Hadoop Map/Reduce
>          Issue Type: Bug
>          Components: applicationmaster, mrv2
>    Affects Versions: 0.23.0
>            Reporter: Vinod Kumar Vavilapalli
>            Assignee: Vinod Kumar Vavilapalli
>            Priority: Blocker
>             Fix For: 0.23.1
>
>         Attachments: MAPREDUCE-3333-20111102.txt, MAPREDUCE-3333-20111108.txt

        

[jira] [Commented] (MAPREDUCE-3333) MR AM for sort-job going out of memory

Posted by "Hudson (Commented) (JIRA)" <ji...@apache.org>.
    [ https://issues.apache.org/jira/browse/MAPREDUCE-3333?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13147646#comment-13147646 ] 

Hudson commented on MAPREDUCE-3333:
-----------------------------------

Integrated in Hadoop-Hdfs-0.23-Build #72 (See [https://builds.apache.org/job/Hadoop-Hdfs-0.23-Build/72/])
    MAPREDUCE-3333. Fixed bugs in ContainerLauncher of MR AppMaster due to which per-container connections to NodeManager were lingering long enough to hit the ulimits on number of processes. (vinodkv)  
svn merge -c r1199751 --ignore-ancestry ../../trunk/

vinodkv : http://svn.apache.org/viewcvs.cgi/?root=Apache-SVN&view=rev&rev=1199757
Files : 
* /hadoop/common/branches/branch-0.23/hadoop-mapreduce-project/CHANGES.txt
* /hadoop/common/branches/branch-0.23/hadoop-mapreduce-project/hadoop-mapreduce-client/hadoop-mapreduce-client-app/src/main/java/org/apache/hadoop/mapreduce/v2/app/launcher/ContainerLauncher.java
* /hadoop/common/branches/branch-0.23/hadoop-mapreduce-project/hadoop-mapreduce-client/hadoop-mapreduce-client-app/src/main/java/org/apache/hadoop/mapreduce/v2/app/launcher/ContainerLauncherImpl.java
* /hadoop/common/branches/branch-0.23/hadoop-mapreduce-project/hadoop-yarn/hadoop-yarn-common/src/main/java/org/apache/hadoop/yarn/api/impl/pb/client/ContainerManagerPBClientImpl.java
* /hadoop/common/branches/branch-0.23/hadoop-mapreduce-project/hadoop-yarn/hadoop-yarn-common/src/main/java/org/apache/hadoop/yarn/factories/RpcClientFactory.java
* /hadoop/common/branches/branch-0.23/hadoop-mapreduce-project/hadoop-yarn/hadoop-yarn-common/src/main/java/org/apache/hadoop/yarn/factories/impl/pb/RpcClientFactoryPBImpl.java
* /hadoop/common/branches/branch-0.23/hadoop-mapreduce-project/hadoop-yarn/hadoop-yarn-common/src/main/java/org/apache/hadoop/yarn/ipc/HadoopYarnProtoRPC.java
* /hadoop/common/branches/branch-0.23/hadoop-mapreduce-project/hadoop-yarn/hadoop-yarn-common/src/main/java/org/apache/hadoop/yarn/ipc/HadoopYarnRPC.java
* /hadoop/common/branches/branch-0.23/hadoop-mapreduce-project/hadoop-yarn/hadoop-yarn-common/src/main/java/org/apache/hadoop/yarn/ipc/ProtoOverHadoopRpcEngine.java
* /hadoop/common/branches/branch-0.23/hadoop-mapreduce-project/hadoop-yarn/hadoop-yarn-common/src/main/java/org/apache/hadoop/yarn/ipc/YarnRPC.java

                
> MR AM for sort-job going out of memory
> --------------------------------------
>
>                 Key: MAPREDUCE-3333
>                 URL: https://issues.apache.org/jira/browse/MAPREDUCE-3333
>             Project: Hadoop Map/Reduce
>          Issue Type: Bug
>          Components: applicationmaster, mrv2
>    Affects Versions: 0.23.0
>            Reporter: Vinod Kumar Vavilapalli
>            Assignee: Vinod Kumar Vavilapalli
>            Priority: Blocker
>             Fix For: 0.23.1
>
>         Attachments: MAPREDUCE-3333-20111102.txt, MAPREDUCE-3333-20111108.txt, MAPREDUCE-3333-20111109.1.txt, MAPREDUCE-3333-20111109.txt

        

[jira] [Commented] (MAPREDUCE-3333) MR AM for sort-job going out of memory

Posted by "Siddharth Seth (Commented) (JIRA)" <ji...@apache.org>.
    [ https://issues.apache.org/jira/browse/MAPREDUCE-3333?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13146890#comment-13146890 ] 

Siddharth Seth commented on MAPREDUCE-3333:
-------------------------------------------

Went through the latest patch. Mostly looks good.
- The close call shouldn't really be required with the idle time set to 0.
- Should RpcClientFactoryPBImpl call RPC.stopProxy() instead of every service client impl doing it? It's a PB-specific factory, so putting it there should be OK. Otherwise, the exception in stopClient() should not be ignored.
- The client cache (removed by the patch) in ContainerLauncherImpl would still be useful in non-secure mode. The patch works for both modes though, so this isn't high priority. Maybe a separate JIRA.
I haven't tried the latest patch. I had tried the previous one on a single node, and jobs were running fine. The close call was reaching the ClientCache but doing nothing because of the refcount checks.
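The following sketch shows how a close call can reach a shared client cache and still do nothing. In this hypothetical model (class and method names are illustrative, not Hadoop's API), `stopClient` only decrements a reference count. The client is actually stopped only when that count reaches zero. We assume that many proxies in the AM share one cached client, so closing a single proxy leaves the client, and its connections, running.

```java
import java.util.HashMap;
import java.util.Map;

/** Hypothetical model of a reference-counted client cache (not Hadoop's ClientCache). */
public class RefCountedClientCache {
    /** Stand-in for an RPC client that owns connection threads. */
    static final class Client {
        int refCount;
        boolean stopped;
    }

    private final Map<String, Client> clients = new HashMap<>();

    /** Returns the shared client for this key and bumps its reference count. */
    public synchronized Client getClient(String key) {
        Client c = clients.computeIfAbsent(key, k -> new Client());
        c.refCount++;
        return c;
    }

    /** Drops one reference; the client is really stopped only when none remain. */
    public synchronized void stopClient(String key) {
        Client c = clients.get(key);
        if (c == null) {
            return;
        }
        c.refCount--;
        if (c.refCount == 0) {
            c.stopped = true;
            clients.remove(key);
        }
    }

    public synchronized int size() {
        return clients.size();
    }

    public static void main(String[] args) {
        RefCountedClientCache cache = new RefCountedClientCache();
        Client shared = cache.getClient("shared-key");
        cache.getClient("shared-key");           // a second proxy shares the client
        cache.stopClient("shared-key");
        System.out.println(shared.stopped);      // false: one reference remains
        cache.stopClient("shared-key");
        System.out.println(shared.stopped);      // true
    }
}
```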

                
> MR AM for sort-job going out of memory
> --------------------------------------
>
>                 Key: MAPREDUCE-3333
>                 URL: https://issues.apache.org/jira/browse/MAPREDUCE-3333
>             Project: Hadoop Map/Reduce
>          Issue Type: Bug
>          Components: applicationmaster, mrv2
>    Affects Versions: 0.23.0
>            Reporter: Vinod Kumar Vavilapalli
>            Assignee: Vinod Kumar Vavilapalli
>            Priority: Blocker
>             Fix For: 0.23.1
>
>         Attachments: MAPREDUCE-3333-20111102.txt, MAPREDUCE-3333-20111108.txt, MAPREDUCE-3333-20111109.txt

        

[jira] [Updated] (MAPREDUCE-3333) MR AM for sort-job going out of memory

Posted by "Vinod Kumar Vavilapalli (Updated) (JIRA)" <ji...@apache.org>.
     [ https://issues.apache.org/jira/browse/MAPREDUCE-3333?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Vinod Kumar Vavilapalli updated MAPREDUCE-3333:
-----------------------------------------------

    Status: Open  (was: Patch Available)

Still debugging.
                
> MR AM for sort-job going out of memory
> --------------------------------------
>
>                 Key: MAPREDUCE-3333
>                 URL: https://issues.apache.org/jira/browse/MAPREDUCE-3333
>             Project: Hadoop Map/Reduce
>          Issue Type: Bug
>          Components: applicationmaster, mrv2
>    Affects Versions: 0.23.0
>            Reporter: Vinod Kumar Vavilapalli
>            Assignee: Vinod Kumar Vavilapalli
>            Priority: Blocker
>         Attachments: MAPREDUCE-3333-20111102.txt

        

[jira] [Updated] (MAPREDUCE-3333) MR AM for sort-job going out of memory

Posted by "Vinod Kumar Vavilapalli (Updated) (JIRA)" <ji...@apache.org>.
     [ https://issues.apache.org/jira/browse/MAPREDUCE-3333?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Vinod Kumar Vavilapalli updated MAPREDUCE-3333:
-----------------------------------------------

    Attachment: MAPREDUCE-3333-20111102.txt

The exception trace gave it away. It is not the pool of threads but the RPC layer itself. For each client, the RPC layer creates a thread for connection handling and communication. With MAPREDUCE-3256, we need one client per container because of the per-container token. So the number of RPC-level threads blows up, and you know the rest of the story.

Attaching a patch. Karam is helping with testing.
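Here is a back-of-the-envelope model of the blow-up described above. It makes two simplifying assumptions: each distinct (server address, credential) pair gets its own connection and receiver thread, and nothing is closed while the job runs. The node count comes from the report. The figure of 10 containers per node and the names `threads` and `perContainerToken` are hypothetical.

```java
import java.util.HashSet;
import java.util.List;
import java.util.Set;

/** Hypothetical model: one connection, and hence one thread, per distinct connection id. */
public class ConnectionThreadCount {
    /**
     * Counts distinct (address, credential) pairs for nodes * containersPerNode
     * launches, i.e. the threads held open if no connection is ever closed.
     */
    static int threads(int nodes, int containersPerNode, boolean perContainerToken) {
        Set<List<String>> connectionIds = new HashSet<>();
        for (int n = 0; n < nodes; n++) {
            String address = "node-" + n + ":45450";
            for (int c = 0; c < containersPerNode; c++) {
                // A per-container token makes every launch a distinct connection id.
                String credential = perContainerToken ? "container-" + n + "-" + c : "shared";
                connectionIds.add(List.of(address, credential));
            }
        }
        return connectionIds.size();
    }

    public static void main(String[] args) {
        System.out.println(threads(350, 10, false)); // 350
        System.out.println(threads(350, 10, true));  // 3500
    }
}
```

Under these assumptions, the thread count grows with the number of containers rather than the number of nodes. That is how a long job can exhaust the per-user process limit and hit "unable to create new native thread".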
                
> MR AM for sort-job going out of memory
> --------------------------------------
>
>                 Key: MAPREDUCE-3333
>                 URL: https://issues.apache.org/jira/browse/MAPREDUCE-3333
>             Project: Hadoop Map/Reduce
>          Issue Type: Bug
>          Components: applicationmaster, mrv2
>    Affects Versions: 0.23.0
>            Reporter: Vinod Kumar Vavilapalli
>            Assignee: Vinod Kumar Vavilapalli
>            Priority: Blocker
>         Attachments: MAPREDUCE-3333-20111102.txt

        

[jira] [Assigned] (MAPREDUCE-3333) MR AM for sort-job going out of memory

Posted by "Vinod Kumar Vavilapalli (Assigned) (JIRA)" <ji...@apache.org>.
     [ https://issues.apache.org/jira/browse/MAPREDUCE-3333?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Vinod Kumar Vavilapalli reassigned MAPREDUCE-3333:
--------------------------------------------------

    Assignee: Vinod Kumar Vavilapalli
    
> MR AM for sort-job going out of memory
> --------------------------------------
>
>                 Key: MAPREDUCE-3333
>                 URL: https://issues.apache.org/jira/browse/MAPREDUCE-3333
>             Project: Hadoop Map/Reduce
>          Issue Type: Bug
>          Components: applicationmaster, mrv2
>    Affects Versions: 0.23.0
>            Reporter: Vinod Kumar Vavilapalli
>            Assignee: Vinod Kumar Vavilapalli
>            Priority: Blocker
>


        

[jira] [Commented] (MAPREDUCE-3333) MR AM for sort-job going out of memory

Posted by "Vinod Kumar Vavilapalli (Commented) (JIRA)" <ji...@apache.org>.
    [ https://issues.apache.org/jira/browse/MAPREDUCE-3333?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13142162#comment-13142162 ] 

Vinod Kumar Vavilapalli commented on MAPREDUCE-3333:
----------------------------------------------------

It wasn't hard to track this down: one of my earlier patches, MAPREDUCE-3256, caused this.

My mistake. The AM now tries to create one thread per container instead of the earlier, correct behaviour of one thread per node.
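A minimal sketch of the difference between the two behaviours. It assumes nothing about the real ContainerLauncherImpl internals, and the class `PerNodeLaunchQueue` and its methods are hypothetical. Each launch request goes to a single-threaded executor for its node, so the number of launcher threads grows with the number of nodes, not the number of containers.

```java
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.TimeUnit;
import java.util.concurrent.atomic.AtomicInteger;

// Hypothetical illustration, not the real ContainerLauncherImpl:
// one single-threaded executor per node, shared by all of that node's containers.
class PerNodeLaunchQueue {
    private final Map<String, ExecutorService> perNode = new ConcurrentHashMap<>();

    // Queue a container launch on its node's executor. The executor and its
    // single worker thread are created the first time the node is seen.
    void launch(String node, Runnable launchTask) {
        perNode.computeIfAbsent(node, n -> Executors.newSingleThreadExecutor())
               .submit(launchTask);
    }

    int executorCount() {
        return perNode.size();
    }

    // Stop every per-node executor and wait for queued launches to finish.
    void shutdown() {
        perNode.values().forEach(ExecutorService::shutdown);
        try {
            for (ExecutorService e : perNode.values()) {
                e.awaitTermination(10, TimeUnit.SECONDS);
            }
        } catch (InterruptedException ie) {
            Thread.currentThread().interrupt();
        }
    }

    public static void main(String[] args) {
        PerNodeLaunchQueue q = new PerNodeLaunchQueue();
        AtomicInteger launched = new AtomicInteger();
        // 1000 containers spread over 3 nodes still use only 3 launcher threads.
        for (int i = 0; i < 1000; i++) {
            q.launch("node-" + (i % 3), launched::incrementAndGet);
        }
        q.shutdown();
        System.out.println(q.executorCount() + " " + launched.get());
    }
}
```

Running `main` prints `3 1000`: three executors with one thread each, and all 1000 launches completed. A per-container design would instead need a thread, and possibly a connection, for each of the 1000 launches.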
                
> MR AM for sort-job going out of memory
> --------------------------------------
>
>                 Key: MAPREDUCE-3333
>                 URL: https://issues.apache.org/jira/browse/MAPREDUCE-3333
>             Project: Hadoop Map/Reduce
>          Issue Type: Bug
>          Components: applicationmaster, mrv2
>    Affects Versions: 0.23.0
>            Reporter: Vinod Kumar Vavilapalli
>            Priority: Blocker
>


        

[jira] [Commented] (MAPREDUCE-3333) MR AM for sort-job going out of memory

Posted by "Hudson (Commented) (JIRA)" <ji...@apache.org>.
    [ https://issues.apache.org/jira/browse/MAPREDUCE-3333?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13147052#comment-13147052 ] 

Hudson commented on MAPREDUCE-3333:
-----------------------------------

Integrated in Hadoop-Common-0.23-Commit #161 (See [https://builds.apache.org/job/Hadoop-Common-0.23-Commit/161/])
    MAPREDUCE-3333. Fixed bugs in ContainerLauncher of MR AppMaster due to which per-container connections to NodeManager were lingering long enough to hit the ulimits on number of processes. (vinodkv)  
svn merge -c r1199751 --ignore-ancestry ../../trunk/

vinodkv : http://svn.apache.org/viewcvs.cgi/?root=Apache-SVN&view=rev&rev=1199757
Files : 
* /hadoop/common/branches/branch-0.23/hadoop-mapreduce-project/CHANGES.txt
* /hadoop/common/branches/branch-0.23/hadoop-mapreduce-project/hadoop-mapreduce-client/hadoop-mapreduce-client-app/src/main/java/org/apache/hadoop/mapreduce/v2/app/launcher/ContainerLauncher.java
* /hadoop/common/branches/branch-0.23/hadoop-mapreduce-project/hadoop-mapreduce-client/hadoop-mapreduce-client-app/src/main/java/org/apache/hadoop/mapreduce/v2/app/launcher/ContainerLauncherImpl.java
* /hadoop/common/branches/branch-0.23/hadoop-mapreduce-project/hadoop-yarn/hadoop-yarn-common/src/main/java/org/apache/hadoop/yarn/api/impl/pb/client/ContainerManagerPBClientImpl.java
* /hadoop/common/branches/branch-0.23/hadoop-mapreduce-project/hadoop-yarn/hadoop-yarn-common/src/main/java/org/apache/hadoop/yarn/factories/RpcClientFactory.java
* /hadoop/common/branches/branch-0.23/hadoop-mapreduce-project/hadoop-yarn/hadoop-yarn-common/src/main/java/org/apache/hadoop/yarn/factories/impl/pb/RpcClientFactoryPBImpl.java
* /hadoop/common/branches/branch-0.23/hadoop-mapreduce-project/hadoop-yarn/hadoop-yarn-common/src/main/java/org/apache/hadoop/yarn/ipc/HadoopYarnProtoRPC.java
* /hadoop/common/branches/branch-0.23/hadoop-mapreduce-project/hadoop-yarn/hadoop-yarn-common/src/main/java/org/apache/hadoop/yarn/ipc/HadoopYarnRPC.java
* /hadoop/common/branches/branch-0.23/hadoop-mapreduce-project/hadoop-yarn/hadoop-yarn-common/src/main/java/org/apache/hadoop/yarn/ipc/ProtoOverHadoopRpcEngine.java
* /hadoop/common/branches/branch-0.23/hadoop-mapreduce-project/hadoop-yarn/hadoop-yarn-common/src/main/java/org/apache/hadoop/yarn/ipc/YarnRPC.java
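The stack trace shows `Client$Connection.setupIOstreams` calling `Thread.start`, so each IPC connection runs on its own thread. The following is a minimal sketch of the pattern the commit message describes. It uses a hypothetical `NodeConnection` that owns one background thread. Closing each per-container connection as soon as its launch finishes (here with try-with-resources) releases that thread immediately instead of leaving it to linger. This is only an illustration and is not the Hadoop IPC client or the actual patch.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical stand-in for a per-container connection to a NodeManager.
// Like an IPC client connection, it owns one background thread.
class NodeConnection implements AutoCloseable {
    private final Thread worker;

    NodeConnection(String node) {
        worker = new Thread(() -> {
            try {
                // Simulates a connection thread waiting for responses.
                Thread.sleep(Long.MAX_VALUE);
            } catch (InterruptedException e) {
                // close() interrupted us: fall through so the thread exits.
            }
        }, "conn-to-" + node);
        worker.setDaemon(true);
        worker.start();
    }

    void startContainer(String containerId) {
        // A real implementation would issue the startContainer RPC here.
    }

    boolean isAlive() {
        return worker.isAlive();
    }

    // Stop the background thread and wait until it has actually exited.
    @Override
    public void close() {
        worker.interrupt();
        try {
            worker.join();
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
        }
    }
}

class ConnectionPerLaunch {
    // Opens one connection per container and closes it right after the launch,
    // so this loop never has more than one connection thread alive.
    static List<NodeConnection> launchAll(String node, List<String> containerIds) {
        List<NodeConnection> used = new ArrayList<>();
        for (String id : containerIds) {
            try (NodeConnection conn = new NodeConnection(node)) {
                used.add(conn);
                conn.startContainer(id);
            }
        }
        return used;
    }
}
```

Every connection returned by `launchAll` reports `isAlive() == false`, because `close()` interrupts and joins its thread before the next launch starts.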

                
> MR AM for sort-job going out of memory
> --------------------------------------
>
>                 Key: MAPREDUCE-3333
>                 URL: https://issues.apache.org/jira/browse/MAPREDUCE-3333
>             Project: Hadoop Map/Reduce
>          Issue Type: Bug
>          Components: applicationmaster, mrv2
>    Affects Versions: 0.23.0
>            Reporter: Vinod Kumar Vavilapalli
>            Assignee: Vinod Kumar Vavilapalli
>            Priority: Blocker
>             Fix For: 0.23.1
>
>         Attachments: MAPREDUCE-3333-20111102.txt, MAPREDUCE-3333-20111108.txt, MAPREDUCE-3333-20111109.1.txt, MAPREDUCE-3333-20111109.txt
>
>


        

[jira] [Commented] (MAPREDUCE-3333) MR AM for sort-job going out of memory

Posted by "Hudson (Commented) (JIRA)" <ji...@apache.org>.
    [ https://issues.apache.org/jira/browse/MAPREDUCE-3333?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13147061#comment-13147061 ] 

Hudson commented on MAPREDUCE-3333:
-----------------------------------

Integrated in Hadoop-Mapreduce-0.23-Commit #172 (See [https://builds.apache.org/job/Hadoop-Mapreduce-0.23-Commit/172/])
    MAPREDUCE-3333. Fixed bugs in ContainerLauncher of MR AppMaster due to which per-container connections to NodeManager were lingering long enough to hit the ulimits on number of processes. (vinodkv)  
svn merge -c r1199751 --ignore-ancestry ../../trunk/

vinodkv : http://svn.apache.org/viewcvs.cgi/?root=Apache-SVN&view=rev&rev=1199757
Files : 
* /hadoop/common/branches/branch-0.23/hadoop-mapreduce-project/CHANGES.txt
* /hadoop/common/branches/branch-0.23/hadoop-mapreduce-project/hadoop-mapreduce-client/hadoop-mapreduce-client-app/src/main/java/org/apache/hadoop/mapreduce/v2/app/launcher/ContainerLauncher.java
* /hadoop/common/branches/branch-0.23/hadoop-mapreduce-project/hadoop-mapreduce-client/hadoop-mapreduce-client-app/src/main/java/org/apache/hadoop/mapreduce/v2/app/launcher/ContainerLauncherImpl.java
* /hadoop/common/branches/branch-0.23/hadoop-mapreduce-project/hadoop-yarn/hadoop-yarn-common/src/main/java/org/apache/hadoop/yarn/api/impl/pb/client/ContainerManagerPBClientImpl.java
* /hadoop/common/branches/branch-0.23/hadoop-mapreduce-project/hadoop-yarn/hadoop-yarn-common/src/main/java/org/apache/hadoop/yarn/factories/RpcClientFactory.java
* /hadoop/common/branches/branch-0.23/hadoop-mapreduce-project/hadoop-yarn/hadoop-yarn-common/src/main/java/org/apache/hadoop/yarn/factories/impl/pb/RpcClientFactoryPBImpl.java
* /hadoop/common/branches/branch-0.23/hadoop-mapreduce-project/hadoop-yarn/hadoop-yarn-common/src/main/java/org/apache/hadoop/yarn/ipc/HadoopYarnProtoRPC.java
* /hadoop/common/branches/branch-0.23/hadoop-mapreduce-project/hadoop-yarn/hadoop-yarn-common/src/main/java/org/apache/hadoop/yarn/ipc/HadoopYarnRPC.java
* /hadoop/common/branches/branch-0.23/hadoop-mapreduce-project/hadoop-yarn/hadoop-yarn-common/src/main/java/org/apache/hadoop/yarn/ipc/ProtoOverHadoopRpcEngine.java
* /hadoop/common/branches/branch-0.23/hadoop-mapreduce-project/hadoop-yarn/hadoop-yarn-common/src/main/java/org/apache/hadoop/yarn/ipc/YarnRPC.java

                
> MR AM for sort-job going out of memory
> --------------------------------------
>
>                 Key: MAPREDUCE-3333
>                 URL: https://issues.apache.org/jira/browse/MAPREDUCE-3333
>             Project: Hadoop Map/Reduce
>          Issue Type: Bug
>          Components: applicationmaster, mrv2
>    Affects Versions: 0.23.0
>            Reporter: Vinod Kumar Vavilapalli
>            Assignee: Vinod Kumar Vavilapalli
>            Priority: Blocker
>             Fix For: 0.23.1
>
>         Attachments: MAPREDUCE-3333-20111102.txt, MAPREDUCE-3333-20111108.txt, MAPREDUCE-3333-20111109.1.txt, MAPREDUCE-3333-20111109.txt
>
>


        

[jira] [Updated] (MAPREDUCE-3333) MR AM for sort-job going out of memory

Posted by "Vinod Kumar Vavilapalli (Updated) (JIRA)" <ji...@apache.org>.
     [ https://issues.apache.org/jira/browse/MAPREDUCE-3333?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Vinod Kumar Vavilapalli updated MAPREDUCE-3333:
-----------------------------------------------

    Status: Open  (was: Patch Available)

Duh, there are still problems at the RPC layer: it isn't stopping the client thread, even though I am making the correct calls all the way down.

Status: Still debugging.
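If a client thread will not stop, you can confirm this from inside the JVM by counting live threads by name. The following is a minimal sketch that assumes the leaked threads share a recognizable name prefix. The `"IPC Client"` prefix used in `main` is an assumption, so check the actual thread names in a `jstack` dump of the AM first. `ThreadCensus` is a hypothetical helper built only on `Thread.getAllStackTraces()`.

```java
// Hypothetical helper: counts live JVM threads whose names start with a prefix.
class ThreadCensus {
    static long countByPrefix(String prefix) {
        // getAllStackTraces() returns a snapshot map keyed by every live thread.
        return Thread.getAllStackTraces().keySet().stream()
                .filter(t -> t.isAlive() && t.getName().startsWith(prefix))
                .count();
    }

    public static void main(String[] args) {
        // Assumed prefix for IPC connection threads; verify it with jstack.
        System.out.println("IPC client threads: " + countByPrefix("IPC Client"));
    }
}
```

Sampling the count before and after stopping a proxy shows whether the stop call actually released its thread.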
                
> MR AM for sort-job going out of memory
> --------------------------------------
>
>                 Key: MAPREDUCE-3333
>                 URL: https://issues.apache.org/jira/browse/MAPREDUCE-3333
>             Project: Hadoop Map/Reduce
>          Issue Type: Bug
>          Components: applicationmaster, mrv2
>    Affects Versions: 0.23.0
>            Reporter: Vinod Kumar Vavilapalli
>            Assignee: Vinod Kumar Vavilapalli
>            Priority: Blocker
>             Fix For: 0.23.1
>
>         Attachments: MAPREDUCE-3333-20111102.txt, MAPREDUCE-3333-20111108.txt
>
>


        

[jira] [Commented] (MAPREDUCE-3333) MR AM for sort-job going out of memory

Posted by "Hudson (Commented) (JIRA)" <ji...@apache.org>.
    [ https://issues.apache.org/jira/browse/MAPREDUCE-3333?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13147050#comment-13147050 ] 

Hudson commented on MAPREDUCE-3333:
-----------------------------------

Integrated in Hadoop-Hdfs-0.23-Commit #160 (See [https://builds.apache.org/job/Hadoop-Hdfs-0.23-Commit/160/])
    MAPREDUCE-3333. Fixed bugs in ContainerLauncher of MR AppMaster due to which per-container connections to NodeManager were lingering long enough to hit the ulimits on number of processes. (vinodkv)  
svn merge -c r1199751 --ignore-ancestry ../../trunk/

vinodkv : http://svn.apache.org/viewcvs.cgi/?root=Apache-SVN&view=rev&rev=1199757
Files : 
* /hadoop/common/branches/branch-0.23/hadoop-mapreduce-project/CHANGES.txt
* /hadoop/common/branches/branch-0.23/hadoop-mapreduce-project/hadoop-mapreduce-client/hadoop-mapreduce-client-app/src/main/java/org/apache/hadoop/mapreduce/v2/app/launcher/ContainerLauncher.java
* /hadoop/common/branches/branch-0.23/hadoop-mapreduce-project/hadoop-mapreduce-client/hadoop-mapreduce-client-app/src/main/java/org/apache/hadoop/mapreduce/v2/app/launcher/ContainerLauncherImpl.java
* /hadoop/common/branches/branch-0.23/hadoop-mapreduce-project/hadoop-yarn/hadoop-yarn-common/src/main/java/org/apache/hadoop/yarn/api/impl/pb/client/ContainerManagerPBClientImpl.java
* /hadoop/common/branches/branch-0.23/hadoop-mapreduce-project/hadoop-yarn/hadoop-yarn-common/src/main/java/org/apache/hadoop/yarn/factories/RpcClientFactory.java
* /hadoop/common/branches/branch-0.23/hadoop-mapreduce-project/hadoop-yarn/hadoop-yarn-common/src/main/java/org/apache/hadoop/yarn/factories/impl/pb/RpcClientFactoryPBImpl.java
* /hadoop/common/branches/branch-0.23/hadoop-mapreduce-project/hadoop-yarn/hadoop-yarn-common/src/main/java/org/apache/hadoop/yarn/ipc/HadoopYarnProtoRPC.java
* /hadoop/common/branches/branch-0.23/hadoop-mapreduce-project/hadoop-yarn/hadoop-yarn-common/src/main/java/org/apache/hadoop/yarn/ipc/HadoopYarnRPC.java
* /hadoop/common/branches/branch-0.23/hadoop-mapreduce-project/hadoop-yarn/hadoop-yarn-common/src/main/java/org/apache/hadoop/yarn/ipc/ProtoOverHadoopRpcEngine.java
* /hadoop/common/branches/branch-0.23/hadoop-mapreduce-project/hadoop-yarn/hadoop-yarn-common/src/main/java/org/apache/hadoop/yarn/ipc/YarnRPC.java

                
> MR AM for sort-job going out of memory
> --------------------------------------
>
>                 Key: MAPREDUCE-3333
>                 URL: https://issues.apache.org/jira/browse/MAPREDUCE-3333
>             Project: Hadoop Map/Reduce
>          Issue Type: Bug
>          Components: applicationmaster, mrv2
>    Affects Versions: 0.23.0
>            Reporter: Vinod Kumar Vavilapalli
>            Assignee: Vinod Kumar Vavilapalli
>            Priority: Blocker
>             Fix For: 0.23.1
>
>         Attachments: MAPREDUCE-3333-20111102.txt, MAPREDUCE-3333-20111108.txt, MAPREDUCE-3333-20111109.1.txt, MAPREDUCE-3333-20111109.txt
>
>


        

[jira] [Commented] (MAPREDUCE-3333) MR AM for sort-job going out of memory

Posted by "Vinod Kumar Vavilapalli (Commented) (JIRA)" <ji...@apache.org>.
    [ https://issues.apache.org/jira/browse/MAPREDUCE-3333?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13142170#comment-13142170 ] 

Vinod Kumar Vavilapalli commented on MAPREDUCE-3333:
----------------------------------------------------

Actually, the code does look right; it creates only one thread per node. This goes deeper than I first suspected. Still debugging.
                
> MR AM for sort-job going out of memory
> --------------------------------------
>
>                 Key: MAPREDUCE-3333
>                 URL: https://issues.apache.org/jira/browse/MAPREDUCE-3333
>             Project: Hadoop Map/Reduce
>          Issue Type: Bug
>          Components: applicationmaster, mrv2
>    Affects Versions: 0.23.0
>            Reporter: Vinod Kumar Vavilapalli
>            Assignee: Vinod Kumar Vavilapalli
>            Priority: Blocker
>
> [~Karams] just found this. The usual sort job on a 350 node cluster hung due to OutOfMemory and eventually failed after an hour instead of the usual odd 20 minutes.
> {code}
> 2011-11-02 11:40:36,438 ERROR [ContainerLauncher #258] org.apache.hadoop.mapreduce.v2.app.launcher.ContainerLauncherImpl: Container launch failed for container_1320233407485_0002
> _01_001434 : java.lang.reflect.UndeclaredThrowableException
>         at org.apache.hadoop.yarn.api.impl.pb.client.ContainerManagerPBClientImpl.startContainer(ContainerManagerPBClientImpl.java:88)
>         at org.apache.hadoop.mapreduce.v2.app.launcher.ContainerLauncherImpl$EventProcessor.run(ContainerLauncherImpl.java:290)
>         at java.util.concurrent.ThreadPoolExecutor$Worker.runTask(ThreadPoolExecutor.java:886)
>         at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:908)
>         at java.lang.Thread.run(Thread.java:619)
> Caused by: com.google.protobuf.ServiceException: java.io.IOException: Failed on local exception: java.io.IOException: Couldn't set up IO streams; Host Details : local host is: "gsbl91281.blue.ygrid.yahoo.com/98.137.101.189"; destination host is: ""gsbl91525.blue.ygrid.yahoo.com":45450; 
>         at org.apache.hadoop.yarn.ipc.ProtoOverHadoopRpcEngine$Invoker.invoke(ProtoOverHadoopRpcEngine.java:139)
>         at $Proxy20.startContainer(Unknown Source)
>         at org.apache.hadoop.yarn.api.impl.pb.client.ContainerManagerPBClientImpl.startContainer(ContainerManagerPBClientImpl.java:81)
>         ... 4 more
> Caused by: java.io.IOException: Failed on local exception: java.io.IOException: Couldn't set up IO streams; Host Details : local host is: "gsbl91281.blue.ygrid.yahoo.com/98.137.101.189"; destination host is: ""gsbl91525.blue.ygrid.yahoo.com":45450; 
>         at org.apache.hadoop.net.NetUtils.wrapException(NetUtils.java:655)
>         at org.apache.hadoop.ipc.Client.call(Client.java:1089)
>         at org.apache.hadoop.yarn.ipc.ProtoOverHadoopRpcEngine$Invoker.invoke(ProtoOverHadoopRpcEngine.java:136)
>         ... 6 more
> Caused by: java.io.IOException: Couldn't set up IO streams
>         at org.apache.hadoop.ipc.Client$Connection.setupIOstreams(Client.java:621)
>         at org.apache.hadoop.ipc.Client$Connection.access$2000(Client.java:205)
>         at org.apache.hadoop.ipc.Client.getConnection(Client.java:1195)
>         at org.apache.hadoop.ipc.Client.call(Client.java:1065)
>         ... 7 more
> Caused by: java.lang.OutOfMemoryError: unable to create new native thread
>         at java.lang.Thread.start0(Native Method)
>         at java.lang.Thread.start(Thread.java:597)
>         at org.apache.hadoop.ipc.Client$Connection.setupIOstreams(Client.java:614)
>         ... 10 more
> {code}


[jira] [Updated] (MAPREDUCE-3333) MR AM for sort-job going out of memory

Posted by "Vinod Kumar Vavilapalli (Updated) (JIRA)" <ji...@apache.org>.
     [ https://issues.apache.org/jira/browse/MAPREDUCE-3333?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Vinod Kumar Vavilapalli updated MAPREDUCE-3333:
-----------------------------------------------

    Status: Open  (was: Patch Available)
    
> MR AM for sort-job going out of memory
> --------------------------------------
>
>                 Key: MAPREDUCE-3333
>                 URL: https://issues.apache.org/jira/browse/MAPREDUCE-3333
>             Project: Hadoop Map/Reduce
>          Issue Type: Bug
>          Components: applicationmaster, mrv2
>    Affects Versions: 0.23.0
>            Reporter: Vinod Kumar Vavilapalli
>            Assignee: Vinod Kumar Vavilapalli
>            Priority: Blocker
>             Fix For: 0.23.1
>
>         Attachments: MAPREDUCE-3333-20111102.txt, MAPREDUCE-3333-20111108.txt, MAPREDUCE-3333-20111109.1.txt, MAPREDUCE-3333-20111109.txt
