You are viewing a plain text version of this content. The canonical link for it is here.
Posted to mapreduce-issues@hadoop.apache.org by "Vinod Kumar Vavilapalli (Created) (JIRA)" <ji...@apache.org> on 2012/02/27 23:31:49 UTC

[jira] [Created] (MAPREDUCE-3931) MR tasks failing due to changing timestamps on Resources to download

MR tasks failing due to changing timestamps on Resources to download
--------------------------------------------------------------------

                 Key: MAPREDUCE-3931
                 URL: https://issues.apache.org/jira/browse/MAPREDUCE-3931
             Project: Hadoop Map/Reduce
          Issue Type: Bug
          Components: mrv1
    Affects Versions: 1.0.0
            Reporter: Vinod Kumar Vavilapalli
             Fix For: 1.0.1


[~karams] reported this offline. Seems that tasks are randomly failing during gridmix runs:
{code}
2012-02-24 21:03:34,912 INFO [AsyncDispatcher event handler] org.apache.hadoop.mapreduce.v2.app.job.impl.TaskAttemptImpl: Diagnostics report from attempt_1330116323296_0140_m_003868_0: RemoteTrace:
java.io.IOException: Resource hdfs://hostname.com:8020/user/hadoop15/.staging/job_1330116323296_0140/job.jar changed on src filesystem (expected 2971811411, was 1330116705875
       at org.apache.hadoop.yarn.util.FSDownload.copy(FSDownload.java:90)
       at org.apache.hadoop.yarn.util.FSDownload.access$000(FSDownload.java:49)
       at org.apache.hadoop.yarn.util.FSDownload$1.run(FSDownload.java:157)
       at org.apache.hadoop.yarn.util.FSDownload$1.run(FSDownload.java:155)
       at java.security.AccessController.doPrivileged(Native Method)
       at javax.security.auth.Subject.doAs(Subject.java:396)
       at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1177)
       at org.apache.hadoop.yarn.util.FSDownload.call(FSDownload.java:153)
       at org.apache.hadoop.yarn.util.FSDownload.call(FSDownload.java:49)
       at java.util.concurrent.FutureTask$Sync.innerRun(FutureTask.java:303)
       at java.util.concurrent.FutureTask.run(FutureTask.java:138)
       at java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:441)
       at java.util.concurrent.FutureTask$Sync.innerRun(FutureTask.java:303)
       at java.util.concurrent.FutureTask.run(FutureTask.java:138)
       at java.util.concurrent.ThreadPoolExecutor$Worker.runTask(ThreadPoolExecutor.java:886)
       at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:908)
       at java.lang.Thread.run(Thread.java:619)
 at LocalTrace:
       org.apache.hadoop.yarn.exceptions.impl.pb.YarnRemoteExceptionPBImpl: Resource hdfs://hostname.com:8020/user/hadoop15/.staging/job_1330116323296_0140/job.jar changed on src filesystem (expected 2971811411, was 1330116705875
       at org.apache.hadoop.yarn.server.nodemanager.api.protocolrecords.impl.pb.LocalResourceStatusPBImpl.convertFromProtoFormat(LocalResourceStatusPBImpl.java:217)
       at org.apache.hadoop.yarn.server.nodemanager.api.protocolrecords.impl.pb.LocalResourceStatusPBImpl.getException(LocalResourceStatusPBImpl.java:147)
       at org.apache.hadoop.yarn.server.nodemanager.containermanager.localizer.ResourceLocalizationService$LocalizerRunner.update(ResourceLocalizationService.java:827)
       at org.apache.hadoop.yarn.server.nodemanager.containermanager.localizer.ResourceLocalizationService$LocalizerTracker.processHeartbeat(ResourceLocalizationService.java:497)
       at org.apache.hadoop.yarn.server.nodemanager.containermanager.localizer.ResourceLocalizationService.heartbeat(ResourceLocalizationService.java:222)
       at org.apache.hadoop.yarn.server.nodemanager.api.impl.pb.service.LocalizationProtocolPBServiceImpl.heartbeat(LocalizationProtocolPBServiceImpl.java:46)
       at org.apache.hadoop.yarn.proto.LocalizationProtocol$LocalizationProtocolService$2.callBlockingMethod(LocalizationProtocol.java:57)
       at org.apache.hadoop.yarn.ipc.ProtoOverHadoopRpcEngine$Server.call(ProtoOverHadoopRpcEngine.java:342)
       at org.apache.hadoop.ipc.Server$Handler$1.run(Server.java:1493)
       at org.apache.hadoop.ipc.Server$Handler$1.run(Server.java:1489)
       at java.security.AccessController.doPrivileged(Native Method)
       at javax.security.auth.Subject.doAs(Subject.java:396)
       at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1177)
       at org.apache.hadoop.ipc.Server$Handler.run(Server.java:1487)
{code}

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators: https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa
For more information on JIRA, see: http://www.atlassian.com/software/jira

        

[jira] [Commented] (MAPREDUCE-3931) MR tasks failing due to changing timestamps on Resources to download

Posted by "Hudson (Commented) (JIRA)" <ji...@apache.org>.
    [ https://issues.apache.org/jira/browse/MAPREDUCE-3931?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13218917#comment-13218917 ] 

Hudson commented on MAPREDUCE-3931:
-----------------------------------

Integrated in Hadoop-Hdfs-trunk-Commit #1876 (See [https://builds.apache.org/job/Hadoop-Hdfs-trunk-Commit/1876/])
    MAPREDUCE-3931. Changed PB implementation of LocalResource to take locks so that race conditions don't fail tasks by inadvertantly changing the timestamps. Contributed by Siddarth Seth. (Revision 1294970)

     Result = SUCCESS
vinodkv : http://svn.apache.org/viewcvs.cgi/?root=Apache-SVN&view=rev&rev=1294970
Files : 
* /hadoop/common/trunk/hadoop-mapreduce-project/CHANGES.txt
* /hadoop/common/trunk/hadoop-mapreduce-project/hadoop-yarn/hadoop-yarn-api/src/main/java/org/apache/hadoop/yarn/api/records/impl/pb/LocalResourcePBImpl.java

                
> MR tasks failing due to changing timestamps on Resources to download
> --------------------------------------------------------------------
>
>                 Key: MAPREDUCE-3931
>                 URL: https://issues.apache.org/jira/browse/MAPREDUCE-3931
>             Project: Hadoop Map/Reduce
>          Issue Type: Bug
>          Components: mrv2
>    Affects Versions: 0.23.0
>            Reporter: Vinod Kumar Vavilapalli
>            Assignee: Siddharth Seth
>             Fix For: 0.23.2
>
>         Attachments: MR3931.txt
>
>
> [~karams] reported this offline. Seems that tasks are randomly failing during gridmix runs:
> {code}
> 2012-02-24 21:03:34,912 INFO [AsyncDispatcher event handler] org.apache.hadoop.mapreduce.v2.app.job.impl.TaskAttemptImpl: Diagnostics report from attempt_1330116323296_0140_m_003868_0: RemoteTrace:
> java.io.IOException: Resource hdfs://hostname.com:8020/user/hadoop15/.staging/job_1330116323296_0140/job.jar changed on src filesystem (expected 2971811411, was 1330116705875
>        at org.apache.hadoop.yarn.util.FSDownload.copy(FSDownload.java:90)
>        at org.apache.hadoop.yarn.util.FSDownload.access$000(FSDownload.java:49)
>        at org.apache.hadoop.yarn.util.FSDownload$1.run(FSDownload.java:157)
>        at org.apache.hadoop.yarn.util.FSDownload$1.run(FSDownload.java:155)
>        at java.security.AccessController.doPrivileged(Native Method)
>        at javax.security.auth.Subject.doAs(Subject.java:396)
>        at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1177)
>        at org.apache.hadoop.yarn.util.FSDownload.call(FSDownload.java:153)
>        at org.apache.hadoop.yarn.util.FSDownload.call(FSDownload.java:49)
>        at java.util.concurrent.FutureTask$Sync.innerRun(FutureTask.java:303)
>        at java.util.concurrent.FutureTask.run(FutureTask.java:138)
>        at java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:441)
>        at java.util.concurrent.FutureTask$Sync.innerRun(FutureTask.java:303)
>        at java.util.concurrent.FutureTask.run(FutureTask.java:138)
>        at java.util.concurrent.ThreadPoolExecutor$Worker.runTask(ThreadPoolExecutor.java:886)
>        at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:908)
>        at java.lang.Thread.run(Thread.java:619)
>  at LocalTrace:
>        org.apache.hadoop.yarn.exceptions.impl.pb.YarnRemoteExceptionPBImpl: Resource hdfs://hostname.com:8020/user/hadoop15/.staging/job_1330116323296_0140/job.jar changed on src filesystem (expected 2971811411, was 1330116705875
>        at org.apache.hadoop.yarn.server.nodemanager.api.protocolrecords.impl.pb.LocalResourceStatusPBImpl.convertFromProtoFormat(LocalResourceStatusPBImpl.java:217)
>        at org.apache.hadoop.yarn.server.nodemanager.api.protocolrecords.impl.pb.LocalResourceStatusPBImpl.getException(LocalResourceStatusPBImpl.java:147)
>        at org.apache.hadoop.yarn.server.nodemanager.containermanager.localizer.ResourceLocalizationService$LocalizerRunner.update(ResourceLocalizationService.java:827)
>        at org.apache.hadoop.yarn.server.nodemanager.containermanager.localizer.ResourceLocalizationService$LocalizerTracker.processHeartbeat(ResourceLocalizationService.java:497)
>        at org.apache.hadoop.yarn.server.nodemanager.containermanager.localizer.ResourceLocalizationService.heartbeat(ResourceLocalizationService.java:222)
>        at org.apache.hadoop.yarn.server.nodemanager.api.impl.pb.service.LocalizationProtocolPBServiceImpl.heartbeat(LocalizationProtocolPBServiceImpl.java:46)
>        at org.apache.hadoop.yarn.proto.LocalizationProtocol$LocalizationProtocolService$2.callBlockingMethod(LocalizationProtocol.java:57)
>        at org.apache.hadoop.yarn.ipc.ProtoOverHadoopRpcEngine$Server.call(ProtoOverHadoopRpcEngine.java:342)
>        at org.apache.hadoop.ipc.Server$Handler$1.run(Server.java:1493)
>        at org.apache.hadoop.ipc.Server$Handler$1.run(Server.java:1489)
>        at java.security.AccessController.doPrivileged(Native Method)
>        at javax.security.auth.Subject.doAs(Subject.java:396)
>        at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1177)
>        at org.apache.hadoop.ipc.Server$Handler.run(Server.java:1487)
> {code}

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators: https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa
For more information on JIRA, see: http://www.atlassian.com/software/jira

        

[jira] [Commented] (MAPREDUCE-3931) MR tasks failing due to changing timestamps on Resources to download

Posted by "Vinod Kumar Vavilapalli (Commented) (JIRA)" <ji...@apache.org>.
    [ https://issues.apache.org/jira/browse/MAPREDUCE-3931?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13217784#comment-13217784 ] 

Vinod Kumar Vavilapalli commented on MAPREDUCE-3931:
----------------------------------------------------

bq. Turns out, assigning to a long in Java may not be atomic.
wow.. Just wow.. great debugging!
                
> MR tasks failing due to changing timestamps on Resources to download
> --------------------------------------------------------------------
>
>                 Key: MAPREDUCE-3931
>                 URL: https://issues.apache.org/jira/browse/MAPREDUCE-3931
>             Project: Hadoop Map/Reduce
>          Issue Type: Bug
>          Components: mrv2
>    Affects Versions: 0.23.0
>            Reporter: Vinod Kumar Vavilapalli
>            Assignee: Siddharth Seth
>             Fix For: 0.23.2
>
>
> [~karams] reported this offline. Seems that tasks are randomly failing during gridmix runs:
> {code}
> 2012-02-24 21:03:34,912 INFO [AsyncDispatcher event handler] org.apache.hadoop.mapreduce.v2.app.job.impl.TaskAttemptImpl: Diagnostics report from attempt_1330116323296_0140_m_003868_0: RemoteTrace:
> java.io.IOException: Resource hdfs://hostname.com:8020/user/hadoop15/.staging/job_1330116323296_0140/job.jar changed on src filesystem (expected 2971811411, was 1330116705875
>        at org.apache.hadoop.yarn.util.FSDownload.copy(FSDownload.java:90)
>        at org.apache.hadoop.yarn.util.FSDownload.access$000(FSDownload.java:49)
>        at org.apache.hadoop.yarn.util.FSDownload$1.run(FSDownload.java:157)
>        at org.apache.hadoop.yarn.util.FSDownload$1.run(FSDownload.java:155)
>        at java.security.AccessController.doPrivileged(Native Method)
>        at javax.security.auth.Subject.doAs(Subject.java:396)
>        at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1177)
>        at org.apache.hadoop.yarn.util.FSDownload.call(FSDownload.java:153)
>        at org.apache.hadoop.yarn.util.FSDownload.call(FSDownload.java:49)
>        at java.util.concurrent.FutureTask$Sync.innerRun(FutureTask.java:303)
>        at java.util.concurrent.FutureTask.run(FutureTask.java:138)
>        at java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:441)
>        at java.util.concurrent.FutureTask$Sync.innerRun(FutureTask.java:303)
>        at java.util.concurrent.FutureTask.run(FutureTask.java:138)
>        at java.util.concurrent.ThreadPoolExecutor$Worker.runTask(ThreadPoolExecutor.java:886)
>        at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:908)
>        at java.lang.Thread.run(Thread.java:619)
>  at LocalTrace:
>        org.apache.hadoop.yarn.exceptions.impl.pb.YarnRemoteExceptionPBImpl: Resource hdfs://hostname.com:8020/user/hadoop15/.staging/job_1330116323296_0140/job.jar changed on src filesystem (expected 2971811411, was 1330116705875
>        at org.apache.hadoop.yarn.server.nodemanager.api.protocolrecords.impl.pb.LocalResourceStatusPBImpl.convertFromProtoFormat(LocalResourceStatusPBImpl.java:217)
>        at org.apache.hadoop.yarn.server.nodemanager.api.protocolrecords.impl.pb.LocalResourceStatusPBImpl.getException(LocalResourceStatusPBImpl.java:147)
>        at org.apache.hadoop.yarn.server.nodemanager.containermanager.localizer.ResourceLocalizationService$LocalizerRunner.update(ResourceLocalizationService.java:827)
>        at org.apache.hadoop.yarn.server.nodemanager.containermanager.localizer.ResourceLocalizationService$LocalizerTracker.processHeartbeat(ResourceLocalizationService.java:497)
>        at org.apache.hadoop.yarn.server.nodemanager.containermanager.localizer.ResourceLocalizationService.heartbeat(ResourceLocalizationService.java:222)
>        at org.apache.hadoop.yarn.server.nodemanager.api.impl.pb.service.LocalizationProtocolPBServiceImpl.heartbeat(LocalizationProtocolPBServiceImpl.java:46)
>        at org.apache.hadoop.yarn.proto.LocalizationProtocol$LocalizationProtocolService$2.callBlockingMethod(LocalizationProtocol.java:57)
>        at org.apache.hadoop.yarn.ipc.ProtoOverHadoopRpcEngine$Server.call(ProtoOverHadoopRpcEngine.java:342)
>        at org.apache.hadoop.ipc.Server$Handler$1.run(Server.java:1493)
>        at org.apache.hadoop.ipc.Server$Handler$1.run(Server.java:1489)
>        at java.security.AccessController.doPrivileged(Native Method)
>        at javax.security.auth.Subject.doAs(Subject.java:396)
>        at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1177)
>        at org.apache.hadoop.ipc.Server$Handler.run(Server.java:1487)
> {code}

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators: https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa
For more information on JIRA, see: http://www.atlassian.com/software/jira

        

[jira] [Commented] (MAPREDUCE-3931) MR tasks failing due to changing timestamps on Resources to download

Posted by "Hudson (Commented) (JIRA)" <ji...@apache.org>.
    [ https://issues.apache.org/jira/browse/MAPREDUCE-3931?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13218931#comment-13218931 ] 

Hudson commented on MAPREDUCE-3931:
-----------------------------------

Integrated in Hadoop-Mapreduce-trunk-Commit #1810 (See [https://builds.apache.org/job/Hadoop-Mapreduce-trunk-Commit/1810/])
    MAPREDUCE-3931. Changed PB implementation of LocalResource to take locks so that race conditions don't fail tasks by inadvertantly changing the timestamps. Contributed by Siddarth Seth. (Revision 1294970)

     Result = ABORTED
vinodkv : http://svn.apache.org/viewcvs.cgi/?root=Apache-SVN&view=rev&rev=1294970
Files : 
* /hadoop/common/trunk/hadoop-mapreduce-project/CHANGES.txt
* /hadoop/common/trunk/hadoop-mapreduce-project/hadoop-yarn/hadoop-yarn-api/src/main/java/org/apache/hadoop/yarn/api/records/impl/pb/LocalResourcePBImpl.java

                
> MR tasks failing due to changing timestamps on Resources to download
> --------------------------------------------------------------------
>
>                 Key: MAPREDUCE-3931
>                 URL: https://issues.apache.org/jira/browse/MAPREDUCE-3931
>             Project: Hadoop Map/Reduce
>          Issue Type: Bug
>          Components: mrv2
>    Affects Versions: 0.23.0
>            Reporter: Vinod Kumar Vavilapalli
>            Assignee: Siddharth Seth
>             Fix For: 0.23.2
>
>         Attachments: MR3931.txt
>
>
> [~karams] reported this offline. Seems that tasks are randomly failing during gridmix runs:
> {code}
> 2012-02-24 21:03:34,912 INFO [AsyncDispatcher event handler] org.apache.hadoop.mapreduce.v2.app.job.impl.TaskAttemptImpl: Diagnostics report from attempt_1330116323296_0140_m_003868_0: RemoteTrace:
> java.io.IOException: Resource hdfs://hostname.com:8020/user/hadoop15/.staging/job_1330116323296_0140/job.jar changed on src filesystem (expected 2971811411, was 1330116705875
>        at org.apache.hadoop.yarn.util.FSDownload.copy(FSDownload.java:90)
>        at org.apache.hadoop.yarn.util.FSDownload.access$000(FSDownload.java:49)
>        at org.apache.hadoop.yarn.util.FSDownload$1.run(FSDownload.java:157)
>        at org.apache.hadoop.yarn.util.FSDownload$1.run(FSDownload.java:155)
>        at java.security.AccessController.doPrivileged(Native Method)
>        at javax.security.auth.Subject.doAs(Subject.java:396)
>        at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1177)
>        at org.apache.hadoop.yarn.util.FSDownload.call(FSDownload.java:153)
>        at org.apache.hadoop.yarn.util.FSDownload.call(FSDownload.java:49)
>        at java.util.concurrent.FutureTask$Sync.innerRun(FutureTask.java:303)
>        at java.util.concurrent.FutureTask.run(FutureTask.java:138)
>        at java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:441)
>        at java.util.concurrent.FutureTask$Sync.innerRun(FutureTask.java:303)
>        at java.util.concurrent.FutureTask.run(FutureTask.java:138)
>        at java.util.concurrent.ThreadPoolExecutor$Worker.runTask(ThreadPoolExecutor.java:886)
>        at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:908)
>        at java.lang.Thread.run(Thread.java:619)
>  at LocalTrace:
>        org.apache.hadoop.yarn.exceptions.impl.pb.YarnRemoteExceptionPBImpl: Resource hdfs://hostname.com:8020/user/hadoop15/.staging/job_1330116323296_0140/job.jar changed on src filesystem (expected 2971811411, was 1330116705875
>        at org.apache.hadoop.yarn.server.nodemanager.api.protocolrecords.impl.pb.LocalResourceStatusPBImpl.convertFromProtoFormat(LocalResourceStatusPBImpl.java:217)
>        at org.apache.hadoop.yarn.server.nodemanager.api.protocolrecords.impl.pb.LocalResourceStatusPBImpl.getException(LocalResourceStatusPBImpl.java:147)
>        at org.apache.hadoop.yarn.server.nodemanager.containermanager.localizer.ResourceLocalizationService$LocalizerRunner.update(ResourceLocalizationService.java:827)
>        at org.apache.hadoop.yarn.server.nodemanager.containermanager.localizer.ResourceLocalizationService$LocalizerTracker.processHeartbeat(ResourceLocalizationService.java:497)
>        at org.apache.hadoop.yarn.server.nodemanager.containermanager.localizer.ResourceLocalizationService.heartbeat(ResourceLocalizationService.java:222)
>        at org.apache.hadoop.yarn.server.nodemanager.api.impl.pb.service.LocalizationProtocolPBServiceImpl.heartbeat(LocalizationProtocolPBServiceImpl.java:46)
>        at org.apache.hadoop.yarn.proto.LocalizationProtocol$LocalizationProtocolService$2.callBlockingMethod(LocalizationProtocol.java:57)
>        at org.apache.hadoop.yarn.ipc.ProtoOverHadoopRpcEngine$Server.call(ProtoOverHadoopRpcEngine.java:342)
>        at org.apache.hadoop.ipc.Server$Handler$1.run(Server.java:1493)
>        at org.apache.hadoop.ipc.Server$Handler$1.run(Server.java:1489)
>        at java.security.AccessController.doPrivileged(Native Method)
>        at javax.security.auth.Subject.doAs(Subject.java:396)
>        at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1177)
>        at org.apache.hadoop.ipc.Server$Handler.run(Server.java:1487)
> {code}

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators: https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa
For more information on JIRA, see: http://www.atlassian.com/software/jira

        

[jira] [Commented] (MAPREDUCE-3931) MR tasks failing due to changing timestamps on Resources to download

Posted by "Hudson (Commented) (JIRA)" <ji...@apache.org>.
    [ https://issues.apache.org/jira/browse/MAPREDUCE-3931?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13218918#comment-13218918 ] 

Hudson commented on MAPREDUCE-3931:
-----------------------------------

Integrated in Hadoop-Common-trunk-Commit #1802 (See [https://builds.apache.org/job/Hadoop-Common-trunk-Commit/1802/])
    MAPREDUCE-3931. Changed PB implementation of LocalResource to take locks so that race conditions don't fail tasks by inadvertantly changing the timestamps. Contributed by Siddarth Seth. (Revision 1294970)

     Result = SUCCESS
vinodkv : http://svn.apache.org/viewcvs.cgi/?root=Apache-SVN&view=rev&rev=1294970
Files : 
* /hadoop/common/trunk/hadoop-mapreduce-project/CHANGES.txt
* /hadoop/common/trunk/hadoop-mapreduce-project/hadoop-yarn/hadoop-yarn-api/src/main/java/org/apache/hadoop/yarn/api/records/impl/pb/LocalResourcePBImpl.java

                
> MR tasks failing due to changing timestamps on Resources to download
> --------------------------------------------------------------------
>
>                 Key: MAPREDUCE-3931
>                 URL: https://issues.apache.org/jira/browse/MAPREDUCE-3931
>             Project: Hadoop Map/Reduce
>          Issue Type: Bug
>          Components: mrv2
>    Affects Versions: 0.23.0
>            Reporter: Vinod Kumar Vavilapalli
>            Assignee: Siddharth Seth
>             Fix For: 0.23.2
>
>         Attachments: MR3931.txt
>
>
> [~karams] reported this offline. Seems that tasks are randomly failing during gridmix runs:
> {code}
> 2012-02-24 21:03:34,912 INFO [AsyncDispatcher event handler] org.apache.hadoop.mapreduce.v2.app.job.impl.TaskAttemptImpl: Diagnostics report from attempt_1330116323296_0140_m_003868_0: RemoteTrace:
> java.io.IOException: Resource hdfs://hostname.com:8020/user/hadoop15/.staging/job_1330116323296_0140/job.jar changed on src filesystem (expected 2971811411, was 1330116705875
>        at org.apache.hadoop.yarn.util.FSDownload.copy(FSDownload.java:90)
>        at org.apache.hadoop.yarn.util.FSDownload.access$000(FSDownload.java:49)
>        at org.apache.hadoop.yarn.util.FSDownload$1.run(FSDownload.java:157)
>        at org.apache.hadoop.yarn.util.FSDownload$1.run(FSDownload.java:155)
>        at java.security.AccessController.doPrivileged(Native Method)
>        at javax.security.auth.Subject.doAs(Subject.java:396)
>        at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1177)
>        at org.apache.hadoop.yarn.util.FSDownload.call(FSDownload.java:153)
>        at org.apache.hadoop.yarn.util.FSDownload.call(FSDownload.java:49)
>        at java.util.concurrent.FutureTask$Sync.innerRun(FutureTask.java:303)
>        at java.util.concurrent.FutureTask.run(FutureTask.java:138)
>        at java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:441)
>        at java.util.concurrent.FutureTask$Sync.innerRun(FutureTask.java:303)
>        at java.util.concurrent.FutureTask.run(FutureTask.java:138)
>        at java.util.concurrent.ThreadPoolExecutor$Worker.runTask(ThreadPoolExecutor.java:886)
>        at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:908)
>        at java.lang.Thread.run(Thread.java:619)
>  at LocalTrace:
>        org.apache.hadoop.yarn.exceptions.impl.pb.YarnRemoteExceptionPBImpl: Resource hdfs://hostname.com:8020/user/hadoop15/.staging/job_1330116323296_0140/job.jar changed on src filesystem (expected 2971811411, was 1330116705875
>        at org.apache.hadoop.yarn.server.nodemanager.api.protocolrecords.impl.pb.LocalResourceStatusPBImpl.convertFromProtoFormat(LocalResourceStatusPBImpl.java:217)
>        at org.apache.hadoop.yarn.server.nodemanager.api.protocolrecords.impl.pb.LocalResourceStatusPBImpl.getException(LocalResourceStatusPBImpl.java:147)
>        at org.apache.hadoop.yarn.server.nodemanager.containermanager.localizer.ResourceLocalizationService$LocalizerRunner.update(ResourceLocalizationService.java:827)
>        at org.apache.hadoop.yarn.server.nodemanager.containermanager.localizer.ResourceLocalizationService$LocalizerTracker.processHeartbeat(ResourceLocalizationService.java:497)
>        at org.apache.hadoop.yarn.server.nodemanager.containermanager.localizer.ResourceLocalizationService.heartbeat(ResourceLocalizationService.java:222)
>        at org.apache.hadoop.yarn.server.nodemanager.api.impl.pb.service.LocalizationProtocolPBServiceImpl.heartbeat(LocalizationProtocolPBServiceImpl.java:46)
>        at org.apache.hadoop.yarn.proto.LocalizationProtocol$LocalizationProtocolService$2.callBlockingMethod(LocalizationProtocol.java:57)
>        at org.apache.hadoop.yarn.ipc.ProtoOverHadoopRpcEngine$Server.call(ProtoOverHadoopRpcEngine.java:342)
>        at org.apache.hadoop.ipc.Server$Handler$1.run(Server.java:1493)
>        at org.apache.hadoop.ipc.Server$Handler$1.run(Server.java:1489)
>        at java.security.AccessController.doPrivileged(Native Method)
>        at javax.security.auth.Subject.doAs(Subject.java:396)
>        at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1177)
>        at org.apache.hadoop.ipc.Server$Handler.run(Server.java:1487)
> {code}

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators: https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa
For more information on JIRA, see: http://www.atlassian.com/software/jira

        

[jira] [Commented] (MAPREDUCE-3931) MR tasks failing due to changing timestamps on Resources to download

Posted by "Hudson (Commented) (JIRA)" <ji...@apache.org>.
    [ https://issues.apache.org/jira/browse/MAPREDUCE-3931?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13219165#comment-13219165 ] 

Hudson commented on MAPREDUCE-3931:
-----------------------------------

Integrated in Hadoop-Hdfs-0.23-Build #183 (See [https://builds.apache.org/job/Hadoop-Hdfs-0.23-Build/183/])
    MAPREDUCE-3931. Changed PB implementation of LocalResource to take locks so that race conditions don't fail tasks by inadvertantly changing the timestamps. Contributed by Siddarth Seth.
svn merge --ignore-ancestry -c 1294970 ../../trunk (Revision 1294973)

     Result = FAILURE
vinodkv : http://svn.apache.org/viewcvs.cgi/?root=Apache-SVN&view=rev&rev=1294973
Files : 
* /hadoop/common/branches/branch-0.23/hadoop-mapreduce-project/CHANGES.txt
* /hadoop/common/branches/branch-0.23/hadoop-mapreduce-project/hadoop-yarn/hadoop-yarn-api/src/main/java/org/apache/hadoop/yarn/api/records/impl/pb/LocalResourcePBImpl.java

                
> MR tasks failing due to changing timestamps on Resources to download
> --------------------------------------------------------------------
>
>                 Key: MAPREDUCE-3931
>                 URL: https://issues.apache.org/jira/browse/MAPREDUCE-3931
>             Project: Hadoop Map/Reduce
>          Issue Type: Bug
>          Components: mrv2
>    Affects Versions: 0.23.0
>            Reporter: Vinod Kumar Vavilapalli
>            Assignee: Siddharth Seth
>             Fix For: 0.23.2
>
>         Attachments: MR3931.txt
>
>
> [~karams] reported this offline. Seems that tasks are randomly failing during gridmix runs:
> {code}
> 2012-02-24 21:03:34,912 INFO [AsyncDispatcher event handler] org.apache.hadoop.mapreduce.v2.app.job.impl.TaskAttemptImpl: Diagnostics report from attempt_1330116323296_0140_m_003868_0: RemoteTrace:
> java.io.IOException: Resource hdfs://hostname.com:8020/user/hadoop15/.staging/job_1330116323296_0140/job.jar changed on src filesystem (expected 2971811411, was 1330116705875
>        at org.apache.hadoop.yarn.util.FSDownload.copy(FSDownload.java:90)
>        at org.apache.hadoop.yarn.util.FSDownload.access$000(FSDownload.java:49)
>        at org.apache.hadoop.yarn.util.FSDownload$1.run(FSDownload.java:157)
>        at org.apache.hadoop.yarn.util.FSDownload$1.run(FSDownload.java:155)
>        at java.security.AccessController.doPrivileged(Native Method)
>        at javax.security.auth.Subject.doAs(Subject.java:396)
>        at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1177)
>        at org.apache.hadoop.yarn.util.FSDownload.call(FSDownload.java:153)
>        at org.apache.hadoop.yarn.util.FSDownload.call(FSDownload.java:49)
>        at java.util.concurrent.FutureTask$Sync.innerRun(FutureTask.java:303)
>        at java.util.concurrent.FutureTask.run(FutureTask.java:138)
>        at java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:441)
>        at java.util.concurrent.FutureTask$Sync.innerRun(FutureTask.java:303)
>        at java.util.concurrent.FutureTask.run(FutureTask.java:138)
>        at java.util.concurrent.ThreadPoolExecutor$Worker.runTask(ThreadPoolExecutor.java:886)
>        at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:908)
>        at java.lang.Thread.run(Thread.java:619)
>  at LocalTrace:
>        org.apache.hadoop.yarn.exceptions.impl.pb.YarnRemoteExceptionPBImpl: Resource hdfs://hostname.com:8020/user/hadoop15/.staging/job_1330116323296_0140/job.jar changed on src filesystem (expected 2971811411, was 1330116705875
>        at org.apache.hadoop.yarn.server.nodemanager.api.protocolrecords.impl.pb.LocalResourceStatusPBImpl.convertFromProtoFormat(LocalResourceStatusPBImpl.java:217)
>        at org.apache.hadoop.yarn.server.nodemanager.api.protocolrecords.impl.pb.LocalResourceStatusPBImpl.getException(LocalResourceStatusPBImpl.java:147)
>        at org.apache.hadoop.yarn.server.nodemanager.containermanager.localizer.ResourceLocalizationService$LocalizerRunner.update(ResourceLocalizationService.java:827)
>        at org.apache.hadoop.yarn.server.nodemanager.containermanager.localizer.ResourceLocalizationService$LocalizerTracker.processHeartbeat(ResourceLocalizationService.java:497)
>        at org.apache.hadoop.yarn.server.nodemanager.containermanager.localizer.ResourceLocalizationService.heartbeat(ResourceLocalizationService.java:222)
>        at org.apache.hadoop.yarn.server.nodemanager.api.impl.pb.service.LocalizationProtocolPBServiceImpl.heartbeat(LocalizationProtocolPBServiceImpl.java:46)
>        at org.apache.hadoop.yarn.proto.LocalizationProtocol$LocalizationProtocolService$2.callBlockingMethod(LocalizationProtocol.java:57)
>        at org.apache.hadoop.yarn.ipc.ProtoOverHadoopRpcEngine$Server.call(ProtoOverHadoopRpcEngine.java:342)
>        at org.apache.hadoop.ipc.Server$Handler$1.run(Server.java:1493)
>        at org.apache.hadoop.ipc.Server$Handler$1.run(Server.java:1489)
>        at java.security.AccessController.doPrivileged(Native Method)
>        at javax.security.auth.Subject.doAs(Subject.java:396)
>        at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1177)
>        at org.apache.hadoop.ipc.Server$Handler.run(Server.java:1487)
> {code}

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators: https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa
For more information on JIRA, see: http://www.atlassian.com/software/jira

        

[jira] [Commented] (MAPREDUCE-3931) MR tasks failing due to changing timestamps on Resources to download

Posted by "Hudson (Commented) (JIRA)" <ji...@apache.org>.
    [ https://issues.apache.org/jira/browse/MAPREDUCE-3931?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13218915#comment-13218915 ] 

Hudson commented on MAPREDUCE-3931:
-----------------------------------

Integrated in Hadoop-Common-0.23-Commit #613 (See [https://builds.apache.org/job/Hadoop-Common-0.23-Commit/613/])
    MAPREDUCE-3931. Changed PB implementation of LocalResource to take locks so that race conditions don't fail tasks by inadvertantly changing the timestamps. Contributed by Siddarth Seth.
svn merge --ignore-ancestry -c 1294970 ../../trunk (Revision 1294973)

     Result = SUCCESS
vinodkv : http://svn.apache.org/viewcvs.cgi/?root=Apache-SVN&view=rev&rev=1294973
Files : 
* /hadoop/common/branches/branch-0.23/hadoop-mapreduce-project/CHANGES.txt
* /hadoop/common/branches/branch-0.23/hadoop-mapreduce-project/hadoop-yarn/hadoop-yarn-api/src/main/java/org/apache/hadoop/yarn/api/records/impl/pb/LocalResourcePBImpl.java

                
> MR tasks failing due to changing timestamps on Resources to download
> --------------------------------------------------------------------
>
>                 Key: MAPREDUCE-3931
>                 URL: https://issues.apache.org/jira/browse/MAPREDUCE-3931
>             Project: Hadoop Map/Reduce
>          Issue Type: Bug
>          Components: mrv2
>    Affects Versions: 0.23.0
>            Reporter: Vinod Kumar Vavilapalli
>            Assignee: Siddharth Seth
>             Fix For: 0.23.2
>
>         Attachments: MR3931.txt
>
>
> [~karams] reported this offline. Seems that tasks are randomly failing during gridmix runs:
> {code}
> 2012-02-24 21:03:34,912 INFO [AsyncDispatcher event handler] org.apache.hadoop.mapreduce.v2.app.job.impl.TaskAttemptImpl: Diagnostics report from attempt_1330116323296_0140_m_003868_0: RemoteTrace:
> java.io.IOException: Resource hdfs://hostname.com:8020/user/hadoop15/.staging/job_1330116323296_0140/job.jar changed on src filesystem (expected 2971811411, was 1330116705875
>        at org.apache.hadoop.yarn.util.FSDownload.copy(FSDownload.java:90)
>        at org.apache.hadoop.yarn.util.FSDownload.access$000(FSDownload.java:49)
>        at org.apache.hadoop.yarn.util.FSDownload$1.run(FSDownload.java:157)
>        at org.apache.hadoop.yarn.util.FSDownload$1.run(FSDownload.java:155)
>        at java.security.AccessController.doPrivileged(Native Method)
>        at javax.security.auth.Subject.doAs(Subject.java:396)
>        at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1177)
>        at org.apache.hadoop.yarn.util.FSDownload.call(FSDownload.java:153)
>        at org.apache.hadoop.yarn.util.FSDownload.call(FSDownload.java:49)
>        at java.util.concurrent.FutureTask$Sync.innerRun(FutureTask.java:303)
>        at java.util.concurrent.FutureTask.run(FutureTask.java:138)
>        at java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:441)
>        at java.util.concurrent.FutureTask$Sync.innerRun(FutureTask.java:303)
>        at java.util.concurrent.FutureTask.run(FutureTask.java:138)
>        at java.util.concurrent.ThreadPoolExecutor$Worker.runTask(ThreadPoolExecutor.java:886)
>        at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:908)
>        at java.lang.Thread.run(Thread.java:619)
>  at LocalTrace:
>        org.apache.hadoop.yarn.exceptions.impl.pb.YarnRemoteExceptionPBImpl: Resource hdfs://hostname.com:8020/user/hadoop15/.staging/job_1330116323296_0140/job.jar changed on src filesystem (expected 2971811411, was 1330116705875
>        at org.apache.hadoop.yarn.server.nodemanager.api.protocolrecords.impl.pb.LocalResourceStatusPBImpl.convertFromProtoFormat(LocalResourceStatusPBImpl.java:217)
>        at org.apache.hadoop.yarn.server.nodemanager.api.protocolrecords.impl.pb.LocalResourceStatusPBImpl.getException(LocalResourceStatusPBImpl.java:147)
>        at org.apache.hadoop.yarn.server.nodemanager.containermanager.localizer.ResourceLocalizationService$LocalizerRunner.update(ResourceLocalizationService.java:827)
>        at org.apache.hadoop.yarn.server.nodemanager.containermanager.localizer.ResourceLocalizationService$LocalizerTracker.processHeartbeat(ResourceLocalizationService.java:497)
>        at org.apache.hadoop.yarn.server.nodemanager.containermanager.localizer.ResourceLocalizationService.heartbeat(ResourceLocalizationService.java:222)
>        at org.apache.hadoop.yarn.server.nodemanager.api.impl.pb.service.LocalizationProtocolPBServiceImpl.heartbeat(LocalizationProtocolPBServiceImpl.java:46)
>        at org.apache.hadoop.yarn.proto.LocalizationProtocol$LocalizationProtocolService$2.callBlockingMethod(LocalizationProtocol.java:57)
>        at org.apache.hadoop.yarn.ipc.ProtoOverHadoopRpcEngine$Server.call(ProtoOverHadoopRpcEngine.java:342)
>        at org.apache.hadoop.ipc.Server$Handler$1.run(Server.java:1493)
>        at org.apache.hadoop.ipc.Server$Handler$1.run(Server.java:1489)
>        at java.security.AccessController.doPrivileged(Native Method)
>        at javax.security.auth.Subject.doAs(Subject.java:396)
>        at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1177)
>        at org.apache.hadoop.ipc.Server$Handler.run(Server.java:1487)
> {code}

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators: https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa
For more information on JIRA, see: http://www.atlassian.com/software/jira

        

[jira] [Updated] (MAPREDUCE-3931) MR tasks failing due to changing timestamps on Resources to download

Posted by "Vinod Kumar Vavilapalli (Updated) (JIRA)" <ji...@apache.org>.
     [ https://issues.apache.org/jira/browse/MAPREDUCE-3931?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Vinod Kumar Vavilapalli updated MAPREDUCE-3931:
-----------------------------------------------

      Resolution: Fixed
    Release Note: Changed PB implementation of LocalResource to take locks so that race conditions don't fail tasks by inadvertantly changing the timestamps.
    Hadoop Flags: Reviewed
          Status: Resolved  (was: Patch Available)

I just committed this to trunk, branch-0.23 and branch-0.23.2. Thanks Sid!
                
> MR tasks failing due to changing timestamps on Resources to download
> --------------------------------------------------------------------
>
>                 Key: MAPREDUCE-3931
>                 URL: https://issues.apache.org/jira/browse/MAPREDUCE-3931
>             Project: Hadoop Map/Reduce
>          Issue Type: Bug
>          Components: mrv2
>    Affects Versions: 0.23.0
>            Reporter: Vinod Kumar Vavilapalli
>            Assignee: Siddharth Seth
>             Fix For: 0.23.2
>
>         Attachments: MR3931.txt
>
>
> [~karams] reported this offline. Seems that tasks are randomly failing during gridmix runs:
> {code}
> 2012-02-24 21:03:34,912 INFO [AsyncDispatcher event handler] org.apache.hadoop.mapreduce.v2.app.job.impl.TaskAttemptImpl: Diagnostics report from attempt_1330116323296_0140_m_003868_0: RemoteTrace:
> java.io.IOException: Resource hdfs://hostname.com:8020/user/hadoop15/.staging/job_1330116323296_0140/job.jar changed on src filesystem (expected 2971811411, was 1330116705875
>        at org.apache.hadoop.yarn.util.FSDownload.copy(FSDownload.java:90)
>        at org.apache.hadoop.yarn.util.FSDownload.access$000(FSDownload.java:49)
>        at org.apache.hadoop.yarn.util.FSDownload$1.run(FSDownload.java:157)
>        at org.apache.hadoop.yarn.util.FSDownload$1.run(FSDownload.java:155)
>        at java.security.AccessController.doPrivileged(Native Method)
>        at javax.security.auth.Subject.doAs(Subject.java:396)
>        at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1177)
>        at org.apache.hadoop.yarn.util.FSDownload.call(FSDownload.java:153)
>        at org.apache.hadoop.yarn.util.FSDownload.call(FSDownload.java:49)
>        at java.util.concurrent.FutureTask$Sync.innerRun(FutureTask.java:303)
>        at java.util.concurrent.FutureTask.run(FutureTask.java:138)
>        at java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:441)
>        at java.util.concurrent.FutureTask$Sync.innerRun(FutureTask.java:303)
>        at java.util.concurrent.FutureTask.run(FutureTask.java:138)
>        at java.util.concurrent.ThreadPoolExecutor$Worker.runTask(ThreadPoolExecutor.java:886)
>        at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:908)
>        at java.lang.Thread.run(Thread.java:619)
>  at LocalTrace:
>        org.apache.hadoop.yarn.exceptions.impl.pb.YarnRemoteExceptionPBImpl: Resource hdfs://hostname.com:8020/user/hadoop15/.staging/job_1330116323296_0140/job.jar changed on src filesystem (expected 2971811411, was 1330116705875
>        at org.apache.hadoop.yarn.server.nodemanager.api.protocolrecords.impl.pb.LocalResourceStatusPBImpl.convertFromProtoFormat(LocalResourceStatusPBImpl.java:217)
>        at org.apache.hadoop.yarn.server.nodemanager.api.protocolrecords.impl.pb.LocalResourceStatusPBImpl.getException(LocalResourceStatusPBImpl.java:147)
>        at org.apache.hadoop.yarn.server.nodemanager.containermanager.localizer.ResourceLocalizationService$LocalizerRunner.update(ResourceLocalizationService.java:827)
>        at org.apache.hadoop.yarn.server.nodemanager.containermanager.localizer.ResourceLocalizationService$LocalizerTracker.processHeartbeat(ResourceLocalizationService.java:497)
>        at org.apache.hadoop.yarn.server.nodemanager.containermanager.localizer.ResourceLocalizationService.heartbeat(ResourceLocalizationService.java:222)
>        at org.apache.hadoop.yarn.server.nodemanager.api.impl.pb.service.LocalizationProtocolPBServiceImpl.heartbeat(LocalizationProtocolPBServiceImpl.java:46)
>        at org.apache.hadoop.yarn.proto.LocalizationProtocol$LocalizationProtocolService$2.callBlockingMethod(LocalizationProtocol.java:57)
>        at org.apache.hadoop.yarn.ipc.ProtoOverHadoopRpcEngine$Server.call(ProtoOverHadoopRpcEngine.java:342)
>        at org.apache.hadoop.ipc.Server$Handler$1.run(Server.java:1493)
>        at org.apache.hadoop.ipc.Server$Handler$1.run(Server.java:1489)
>        at java.security.AccessController.doPrivileged(Native Method)
>        at javax.security.auth.Subject.doAs(Subject.java:396)
>        at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1177)
>        at org.apache.hadoop.ipc.Server$Handler.run(Server.java:1487)
> {code}

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators: https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa
For more information on JIRA, see: http://www.atlassian.com/software/jira

        

[jira] [Commented] (MAPREDUCE-3931) MR tasks failing due to changing timestamps on Resources to download

Posted by "Hudson (Commented) (JIRA)" <ji...@apache.org>.
    [ https://issues.apache.org/jira/browse/MAPREDUCE-3931?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13219185#comment-13219185 ] 

Hudson commented on MAPREDUCE-3931:
-----------------------------------

Integrated in Hadoop-Mapreduce-0.23-Build #211 (See [https://builds.apache.org/job/Hadoop-Mapreduce-0.23-Build/211/])
    MAPREDUCE-3931. Changed PB implementation of LocalResource to take locks so that race conditions don't fail tasks by inadvertantly changing the timestamps. Contributed by Siddarth Seth.
svn merge --ignore-ancestry -c 1294970 ../../trunk (Revision 1294973)

     Result = FAILURE
vinodkv : http://svn.apache.org/viewcvs.cgi/?root=Apache-SVN&view=rev&rev=1294973
Files : 
* /hadoop/common/branches/branch-0.23/hadoop-mapreduce-project/CHANGES.txt
* /hadoop/common/branches/branch-0.23/hadoop-mapreduce-project/hadoop-yarn/hadoop-yarn-api/src/main/java/org/apache/hadoop/yarn/api/records/impl/pb/LocalResourcePBImpl.java

                
> MR tasks failing due to changing timestamps on Resources to download
> --------------------------------------------------------------------
>
>                 Key: MAPREDUCE-3931
>                 URL: https://issues.apache.org/jira/browse/MAPREDUCE-3931
>             Project: Hadoop Map/Reduce
>          Issue Type: Bug
>          Components: mrv2
>    Affects Versions: 0.23.0
>            Reporter: Vinod Kumar Vavilapalli
>            Assignee: Siddharth Seth
>             Fix For: 0.23.2
>
>         Attachments: MR3931.txt
>
>
> [~karams] reported this offline. Seems that tasks are randomly failing during gridmix runs:
> {code}
> 2012-02-24 21:03:34,912 INFO [AsyncDispatcher event handler] org.apache.hadoop.mapreduce.v2.app.job.impl.TaskAttemptImpl: Diagnostics report from attempt_1330116323296_0140_m_003868_0: RemoteTrace:
> java.io.IOException: Resource hdfs://hostname.com:8020/user/hadoop15/.staging/job_1330116323296_0140/job.jar changed on src filesystem (expected 2971811411, was 1330116705875
>        at org.apache.hadoop.yarn.util.FSDownload.copy(FSDownload.java:90)
>        at org.apache.hadoop.yarn.util.FSDownload.access$000(FSDownload.java:49)
>        at org.apache.hadoop.yarn.util.FSDownload$1.run(FSDownload.java:157)
>        at org.apache.hadoop.yarn.util.FSDownload$1.run(FSDownload.java:155)
>        at java.security.AccessController.doPrivileged(Native Method)
>        at javax.security.auth.Subject.doAs(Subject.java:396)
>        at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1177)
>        at org.apache.hadoop.yarn.util.FSDownload.call(FSDownload.java:153)
>        at org.apache.hadoop.yarn.util.FSDownload.call(FSDownload.java:49)
>        at java.util.concurrent.FutureTask$Sync.innerRun(FutureTask.java:303)
>        at java.util.concurrent.FutureTask.run(FutureTask.java:138)
>        at java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:441)
>        at java.util.concurrent.FutureTask$Sync.innerRun(FutureTask.java:303)
>        at java.util.concurrent.FutureTask.run(FutureTask.java:138)
>        at java.util.concurrent.ThreadPoolExecutor$Worker.runTask(ThreadPoolExecutor.java:886)
>        at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:908)
>        at java.lang.Thread.run(Thread.java:619)
>  at LocalTrace:
>        org.apache.hadoop.yarn.exceptions.impl.pb.YarnRemoteExceptionPBImpl: Resource hdfs://hostname.com:8020/user/hadoop15/.staging/job_1330116323296_0140/job.jar changed on src filesystem (expected 2971811411, was 1330116705875
>        at org.apache.hadoop.yarn.server.nodemanager.api.protocolrecords.impl.pb.LocalResourceStatusPBImpl.convertFromProtoFormat(LocalResourceStatusPBImpl.java:217)
>        at org.apache.hadoop.yarn.server.nodemanager.api.protocolrecords.impl.pb.LocalResourceStatusPBImpl.getException(LocalResourceStatusPBImpl.java:147)
>        at org.apache.hadoop.yarn.server.nodemanager.containermanager.localizer.ResourceLocalizationService$LocalizerRunner.update(ResourceLocalizationService.java:827)
>        at org.apache.hadoop.yarn.server.nodemanager.containermanager.localizer.ResourceLocalizationService$LocalizerTracker.processHeartbeat(ResourceLocalizationService.java:497)
>        at org.apache.hadoop.yarn.server.nodemanager.containermanager.localizer.ResourceLocalizationService.heartbeat(ResourceLocalizationService.java:222)
>        at org.apache.hadoop.yarn.server.nodemanager.api.impl.pb.service.LocalizationProtocolPBServiceImpl.heartbeat(LocalizationProtocolPBServiceImpl.java:46)
>        at org.apache.hadoop.yarn.proto.LocalizationProtocol$LocalizationProtocolService$2.callBlockingMethod(LocalizationProtocol.java:57)
>        at org.apache.hadoop.yarn.ipc.ProtoOverHadoopRpcEngine$Server.call(ProtoOverHadoopRpcEngine.java:342)
>        at org.apache.hadoop.ipc.Server$Handler$1.run(Server.java:1493)
>        at org.apache.hadoop.ipc.Server$Handler$1.run(Server.java:1489)
>        at java.security.AccessController.doPrivileged(Native Method)
>        at javax.security.auth.Subject.doAs(Subject.java:396)
>        at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1177)
>        at org.apache.hadoop.ipc.Server$Handler.run(Server.java:1487)
> {code}

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators: https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa
For more information on JIRA, see: http://www.atlassian.com/software/jira

        

[jira] [Updated] (MAPREDUCE-3931) MR tasks failing due to changing timestamps on Resources to download

Posted by "Siddharth Seth (Updated) (JIRA)" <ji...@apache.org>.
     [ https://issues.apache.org/jira/browse/MAPREDUCE-3931?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Siddharth Seth updated MAPREDUCE-3931:
--------------------------------------

    Status: Patch Available  (was: Open)
    
> MR tasks failing due to changing timestamps on Resources to download
> --------------------------------------------------------------------
>
>                 Key: MAPREDUCE-3931
>                 URL: https://issues.apache.org/jira/browse/MAPREDUCE-3931
>             Project: Hadoop Map/Reduce
>          Issue Type: Bug
>          Components: mrv2
>    Affects Versions: 0.23.0
>            Reporter: Vinod Kumar Vavilapalli
>            Assignee: Siddharth Seth
>             Fix For: 0.23.2
>
>         Attachments: MR3931.txt
>
>
> [~karams] reported this offline. Seems that tasks are randomly failing during gridmix runs:
> {code}
> 2012-02-24 21:03:34,912 INFO [AsyncDispatcher event handler] org.apache.hadoop.mapreduce.v2.app.job.impl.TaskAttemptImpl: Diagnostics report from attempt_1330116323296_0140_m_003868_0: RemoteTrace:
> java.io.IOException: Resource hdfs://hostname.com:8020/user/hadoop15/.staging/job_1330116323296_0140/job.jar changed on src filesystem (expected 2971811411, was 1330116705875
>        at org.apache.hadoop.yarn.util.FSDownload.copy(FSDownload.java:90)
>        at org.apache.hadoop.yarn.util.FSDownload.access$000(FSDownload.java:49)
>        at org.apache.hadoop.yarn.util.FSDownload$1.run(FSDownload.java:157)
>        at org.apache.hadoop.yarn.util.FSDownload$1.run(FSDownload.java:155)
>        at java.security.AccessController.doPrivileged(Native Method)
>        at javax.security.auth.Subject.doAs(Subject.java:396)
>        at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1177)
>        at org.apache.hadoop.yarn.util.FSDownload.call(FSDownload.java:153)
>        at org.apache.hadoop.yarn.util.FSDownload.call(FSDownload.java:49)
>        at java.util.concurrent.FutureTask$Sync.innerRun(FutureTask.java:303)
>        at java.util.concurrent.FutureTask.run(FutureTask.java:138)
>        at java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:441)
>        at java.util.concurrent.FutureTask$Sync.innerRun(FutureTask.java:303)
>        at java.util.concurrent.FutureTask.run(FutureTask.java:138)
>        at java.util.concurrent.ThreadPoolExecutor$Worker.runTask(ThreadPoolExecutor.java:886)
>        at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:908)
>        at java.lang.Thread.run(Thread.java:619)
>  at LocalTrace:
>        org.apache.hadoop.yarn.exceptions.impl.pb.YarnRemoteExceptionPBImpl: Resource hdfs://hostname.com:8020/user/hadoop15/.staging/job_1330116323296_0140/job.jar changed on src filesystem (expected 2971811411, was 1330116705875
>        at org.apache.hadoop.yarn.server.nodemanager.api.protocolrecords.impl.pb.LocalResourceStatusPBImpl.convertFromProtoFormat(LocalResourceStatusPBImpl.java:217)
>        at org.apache.hadoop.yarn.server.nodemanager.api.protocolrecords.impl.pb.LocalResourceStatusPBImpl.getException(LocalResourceStatusPBImpl.java:147)
>        at org.apache.hadoop.yarn.server.nodemanager.containermanager.localizer.ResourceLocalizationService$LocalizerRunner.update(ResourceLocalizationService.java:827)
>        at org.apache.hadoop.yarn.server.nodemanager.containermanager.localizer.ResourceLocalizationService$LocalizerTracker.processHeartbeat(ResourceLocalizationService.java:497)
>        at org.apache.hadoop.yarn.server.nodemanager.containermanager.localizer.ResourceLocalizationService.heartbeat(ResourceLocalizationService.java:222)
>        at org.apache.hadoop.yarn.server.nodemanager.api.impl.pb.service.LocalizationProtocolPBServiceImpl.heartbeat(LocalizationProtocolPBServiceImpl.java:46)
>        at org.apache.hadoop.yarn.proto.LocalizationProtocol$LocalizationProtocolService$2.callBlockingMethod(LocalizationProtocol.java:57)
>        at org.apache.hadoop.yarn.ipc.ProtoOverHadoopRpcEngine$Server.call(ProtoOverHadoopRpcEngine.java:342)
>        at org.apache.hadoop.ipc.Server$Handler$1.run(Server.java:1493)
>        at org.apache.hadoop.ipc.Server$Handler$1.run(Server.java:1489)
>        at java.security.AccessController.doPrivileged(Native Method)
>        at javax.security.auth.Subject.doAs(Subject.java:396)
>        at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1177)
>        at org.apache.hadoop.ipc.Server$Handler.run(Server.java:1487)
> {code}

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators: https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa
For more information on JIRA, see: http://www.atlassian.com/software/jira

        

[jira] [Updated] (MAPREDUCE-3931) MR tasks failing due to changing timestamps on Resources to download

Posted by "Vinod Kumar Vavilapalli (Updated) (JIRA)" <ji...@apache.org>.
     [ https://issues.apache.org/jira/browse/MAPREDUCE-3931?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Vinod Kumar Vavilapalli updated MAPREDUCE-3931:
-----------------------------------------------

          Component/s:     (was: mrv1)
                       mrv2
    Affects Version/s:     (was: 1.0.0)
                       0.23.0
        Fix Version/s:     (was: 1.0.1)
                       0.23.2
             Assignee: Siddharth Seth

If my memory serves right, we've been observing this for ever but never could glean enough information to crack it.

[~sseth] was debugging this, assigning it to him.
                
> MR tasks failing due to changing timestamps on Resources to download
> --------------------------------------------------------------------
>
>                 Key: MAPREDUCE-3931
>                 URL: https://issues.apache.org/jira/browse/MAPREDUCE-3931
>             Project: Hadoop Map/Reduce
>          Issue Type: Bug
>          Components: mrv2
>    Affects Versions: 0.23.0
>            Reporter: Vinod Kumar Vavilapalli
>            Assignee: Siddharth Seth
>             Fix For: 0.23.2
>
>
> [~karams] reported this offline. Seems that tasks are randomly failing during gridmix runs:
> {code}
> 2012-02-24 21:03:34,912 INFO [AsyncDispatcher event handler] org.apache.hadoop.mapreduce.v2.app.job.impl.TaskAttemptImpl: Diagnostics report from attempt_1330116323296_0140_m_003868_0: RemoteTrace:
> java.io.IOException: Resource hdfs://hostname.com:8020/user/hadoop15/.staging/job_1330116323296_0140/job.jar changed on src filesystem (expected 2971811411, was 1330116705875
>        at org.apache.hadoop.yarn.util.FSDownload.copy(FSDownload.java:90)
>        at org.apache.hadoop.yarn.util.FSDownload.access$000(FSDownload.java:49)
>        at org.apache.hadoop.yarn.util.FSDownload$1.run(FSDownload.java:157)
>        at org.apache.hadoop.yarn.util.FSDownload$1.run(FSDownload.java:155)
>        at java.security.AccessController.doPrivileged(Native Method)
>        at javax.security.auth.Subject.doAs(Subject.java:396)
>        at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1177)
>        at org.apache.hadoop.yarn.util.FSDownload.call(FSDownload.java:153)
>        at org.apache.hadoop.yarn.util.FSDownload.call(FSDownload.java:49)
>        at java.util.concurrent.FutureTask$Sync.innerRun(FutureTask.java:303)
>        at java.util.concurrent.FutureTask.run(FutureTask.java:138)
>        at java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:441)
>        at java.util.concurrent.FutureTask$Sync.innerRun(FutureTask.java:303)
>        at java.util.concurrent.FutureTask.run(FutureTask.java:138)
>        at java.util.concurrent.ThreadPoolExecutor$Worker.runTask(ThreadPoolExecutor.java:886)
>        at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:908)
>        at java.lang.Thread.run(Thread.java:619)
>  at LocalTrace:
>        org.apache.hadoop.yarn.exceptions.impl.pb.YarnRemoteExceptionPBImpl: Resource hdfs://hostname.com:8020/user/hadoop15/.staging/job_1330116323296_0140/job.jar changed on src filesystem (expected 2971811411, was 1330116705875
>        at org.apache.hadoop.yarn.server.nodemanager.api.protocolrecords.impl.pb.LocalResourceStatusPBImpl.convertFromProtoFormat(LocalResourceStatusPBImpl.java:217)
>        at org.apache.hadoop.yarn.server.nodemanager.api.protocolrecords.impl.pb.LocalResourceStatusPBImpl.getException(LocalResourceStatusPBImpl.java:147)
>        at org.apache.hadoop.yarn.server.nodemanager.containermanager.localizer.ResourceLocalizationService$LocalizerRunner.update(ResourceLocalizationService.java:827)
>        at org.apache.hadoop.yarn.server.nodemanager.containermanager.localizer.ResourceLocalizationService$LocalizerTracker.processHeartbeat(ResourceLocalizationService.java:497)
>        at org.apache.hadoop.yarn.server.nodemanager.containermanager.localizer.ResourceLocalizationService.heartbeat(ResourceLocalizationService.java:222)
>        at org.apache.hadoop.yarn.server.nodemanager.api.impl.pb.service.LocalizationProtocolPBServiceImpl.heartbeat(LocalizationProtocolPBServiceImpl.java:46)
>        at org.apache.hadoop.yarn.proto.LocalizationProtocol$LocalizationProtocolService$2.callBlockingMethod(LocalizationProtocol.java:57)
>        at org.apache.hadoop.yarn.ipc.ProtoOverHadoopRpcEngine$Server.call(ProtoOverHadoopRpcEngine.java:342)
>        at org.apache.hadoop.ipc.Server$Handler$1.run(Server.java:1493)
>        at org.apache.hadoop.ipc.Server$Handler$1.run(Server.java:1489)
>        at java.security.AccessController.doPrivileged(Native Method)
>        at javax.security.auth.Subject.doAs(Subject.java:396)
>        at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1177)
>        at org.apache.hadoop.ipc.Server$Handler.run(Server.java:1487)
> {code}

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators: https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa
For more information on JIRA, see: http://www.atlassian.com/software/jira

        

[jira] [Updated] (MAPREDUCE-3931) MR tasks failing due to changing timestamps on Resources to download

Posted by "Siddharth Seth (Updated) (JIRA)" <ji...@apache.org>.
     [ https://issues.apache.org/jira/browse/MAPREDUCE-3931?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Siddharth Seth updated MAPREDUCE-3931:
--------------------------------------

    Attachment: MR3931.txt

Simple patch to synchronize access to LocalResourcePBImpl. Haven't seen the issue in several gridmix runs with the patch.
Also, went through other objects used by ContainerLaunchContext - LocalResource appears to be the only one which would run into such a race.
                
> MR tasks failing due to changing timestamps on Resources to download
> --------------------------------------------------------------------
>
>                 Key: MAPREDUCE-3931
>                 URL: https://issues.apache.org/jira/browse/MAPREDUCE-3931
>             Project: Hadoop Map/Reduce
>          Issue Type: Bug
>          Components: mrv2
>    Affects Versions: 0.23.0
>            Reporter: Vinod Kumar Vavilapalli
>            Assignee: Siddharth Seth
>             Fix For: 0.23.2
>
>         Attachments: MR3931.txt
>
>
> [~karams] reported this offline. Seems that tasks are randomly failing during gridmix runs:
> {code}
> 2012-02-24 21:03:34,912 INFO [AsyncDispatcher event handler] org.apache.hadoop.mapreduce.v2.app.job.impl.TaskAttemptImpl: Diagnostics report from attempt_1330116323296_0140_m_003868_0: RemoteTrace:
> java.io.IOException: Resource hdfs://hostname.com:8020/user/hadoop15/.staging/job_1330116323296_0140/job.jar changed on src filesystem (expected 2971811411, was 1330116705875
>        at org.apache.hadoop.yarn.util.FSDownload.copy(FSDownload.java:90)
>        at org.apache.hadoop.yarn.util.FSDownload.access$000(FSDownload.java:49)
>        at org.apache.hadoop.yarn.util.FSDownload$1.run(FSDownload.java:157)
>        at org.apache.hadoop.yarn.util.FSDownload$1.run(FSDownload.java:155)
>        at java.security.AccessController.doPrivileged(Native Method)
>        at javax.security.auth.Subject.doAs(Subject.java:396)
>        at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1177)
>        at org.apache.hadoop.yarn.util.FSDownload.call(FSDownload.java:153)
>        at org.apache.hadoop.yarn.util.FSDownload.call(FSDownload.java:49)
>        at java.util.concurrent.FutureTask$Sync.innerRun(FutureTask.java:303)
>        at java.util.concurrent.FutureTask.run(FutureTask.java:138)
>        at java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:441)
>        at java.util.concurrent.FutureTask$Sync.innerRun(FutureTask.java:303)
>        at java.util.concurrent.FutureTask.run(FutureTask.java:138)
>        at java.util.concurrent.ThreadPoolExecutor$Worker.runTask(ThreadPoolExecutor.java:886)
>        at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:908)
>        at java.lang.Thread.run(Thread.java:619)
>  at LocalTrace:
>        org.apache.hadoop.yarn.exceptions.impl.pb.YarnRemoteExceptionPBImpl: Resource hdfs://hostname.com:8020/user/hadoop15/.staging/job_1330116323296_0140/job.jar changed on src filesystem (expected 2971811411, was 1330116705875
>        at org.apache.hadoop.yarn.server.nodemanager.api.protocolrecords.impl.pb.LocalResourceStatusPBImpl.convertFromProtoFormat(LocalResourceStatusPBImpl.java:217)
>        at org.apache.hadoop.yarn.server.nodemanager.api.protocolrecords.impl.pb.LocalResourceStatusPBImpl.getException(LocalResourceStatusPBImpl.java:147)
>        at org.apache.hadoop.yarn.server.nodemanager.containermanager.localizer.ResourceLocalizationService$LocalizerRunner.update(ResourceLocalizationService.java:827)
>        at org.apache.hadoop.yarn.server.nodemanager.containermanager.localizer.ResourceLocalizationService$LocalizerTracker.processHeartbeat(ResourceLocalizationService.java:497)
>        at org.apache.hadoop.yarn.server.nodemanager.containermanager.localizer.ResourceLocalizationService.heartbeat(ResourceLocalizationService.java:222)
>        at org.apache.hadoop.yarn.server.nodemanager.api.impl.pb.service.LocalizationProtocolPBServiceImpl.heartbeat(LocalizationProtocolPBServiceImpl.java:46)
>        at org.apache.hadoop.yarn.proto.LocalizationProtocol$LocalizationProtocolService$2.callBlockingMethod(LocalizationProtocol.java:57)
>        at org.apache.hadoop.yarn.ipc.ProtoOverHadoopRpcEngine$Server.call(ProtoOverHadoopRpcEngine.java:342)
>        at org.apache.hadoop.ipc.Server$Handler$1.run(Server.java:1493)
>        at org.apache.hadoop.ipc.Server$Handler$1.run(Server.java:1489)
>        at java.security.AccessController.doPrivileged(Native Method)
>        at javax.security.auth.Subject.doAs(Subject.java:396)
>        at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1177)
>        at org.apache.hadoop.ipc.Server$Handler.run(Server.java:1487)
> {code}

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators: https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa
For more information on JIRA, see: http://www.atlassian.com/software/jira

        

[jira] [Commented] (MAPREDUCE-3931) MR tasks failing due to changing timestamps on Resources to download

Posted by "Hudson (Commented) (JIRA)" <ji...@apache.org>.
    [ https://issues.apache.org/jira/browse/MAPREDUCE-3931?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13218932#comment-13218932 ] 

Hudson commented on MAPREDUCE-3931:
-----------------------------------

Integrated in Hadoop-Mapreduce-0.23-Commit #613 (See [https://builds.apache.org/job/Hadoop-Mapreduce-0.23-Commit/613/])
    MAPREDUCE-3931. Changed PB implementation of LocalResource to take locks so that race conditions don't fail tasks by inadvertantly changing the timestamps. Contributed by Siddarth Seth.
svn merge --ignore-ancestry -c 1294970 ../../trunk (Revision 1294973)

     Result = ABORTED
vinodkv : http://svn.apache.org/viewcvs.cgi/?root=Apache-SVN&view=rev&rev=1294973
Files : 
* /hadoop/common/branches/branch-0.23/hadoop-mapreduce-project/CHANGES.txt
* /hadoop/common/branches/branch-0.23/hadoop-mapreduce-project/hadoop-yarn/hadoop-yarn-api/src/main/java/org/apache/hadoop/yarn/api/records/impl/pb/LocalResourcePBImpl.java

                
> MR tasks failing due to changing timestamps on Resources to download
> --------------------------------------------------------------------
>
>                 Key: MAPREDUCE-3931
>                 URL: https://issues.apache.org/jira/browse/MAPREDUCE-3931
>             Project: Hadoop Map/Reduce
>          Issue Type: Bug
>          Components: mrv2
>    Affects Versions: 0.23.0
>            Reporter: Vinod Kumar Vavilapalli
>            Assignee: Siddharth Seth
>             Fix For: 0.23.2
>
>         Attachments: MR3931.txt
>
>
> [~karams] reported this offline. Seems that tasks are randomly failing during gridmix runs:
> {code}
> 2012-02-24 21:03:34,912 INFO [AsyncDispatcher event handler] org.apache.hadoop.mapreduce.v2.app.job.impl.TaskAttemptImpl: Diagnostics report from attempt_1330116323296_0140_m_003868_0: RemoteTrace:
> java.io.IOException: Resource hdfs://hostname.com:8020/user/hadoop15/.staging/job_1330116323296_0140/job.jar changed on src filesystem (expected 2971811411, was 1330116705875
>        at org.apache.hadoop.yarn.util.FSDownload.copy(FSDownload.java:90)
>        at org.apache.hadoop.yarn.util.FSDownload.access$000(FSDownload.java:49)
>        at org.apache.hadoop.yarn.util.FSDownload$1.run(FSDownload.java:157)
>        at org.apache.hadoop.yarn.util.FSDownload$1.run(FSDownload.java:155)
>        at java.security.AccessController.doPrivileged(Native Method)
>        at javax.security.auth.Subject.doAs(Subject.java:396)
>        at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1177)
>        at org.apache.hadoop.yarn.util.FSDownload.call(FSDownload.java:153)
>        at org.apache.hadoop.yarn.util.FSDownload.call(FSDownload.java:49)
>        at java.util.concurrent.FutureTask$Sync.innerRun(FutureTask.java:303)
>        at java.util.concurrent.FutureTask.run(FutureTask.java:138)
>        at java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:441)
>        at java.util.concurrent.FutureTask$Sync.innerRun(FutureTask.java:303)
>        at java.util.concurrent.FutureTask.run(FutureTask.java:138)
>        at java.util.concurrent.ThreadPoolExecutor$Worker.runTask(ThreadPoolExecutor.java:886)
>        at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:908)
>        at java.lang.Thread.run(Thread.java:619)
>  at LocalTrace:
>        org.apache.hadoop.yarn.exceptions.impl.pb.YarnRemoteExceptionPBImpl: Resource hdfs://hostname.com:8020/user/hadoop15/.staging/job_1330116323296_0140/job.jar changed on src filesystem (expected 2971811411, was 1330116705875
>        at org.apache.hadoop.yarn.server.nodemanager.api.protocolrecords.impl.pb.LocalResourceStatusPBImpl.convertFromProtoFormat(LocalResourceStatusPBImpl.java:217)
>        at org.apache.hadoop.yarn.server.nodemanager.api.protocolrecords.impl.pb.LocalResourceStatusPBImpl.getException(LocalResourceStatusPBImpl.java:147)
>        at org.apache.hadoop.yarn.server.nodemanager.containermanager.localizer.ResourceLocalizationService$LocalizerRunner.update(ResourceLocalizationService.java:827)
>        at org.apache.hadoop.yarn.server.nodemanager.containermanager.localizer.ResourceLocalizationService$LocalizerTracker.processHeartbeat(ResourceLocalizationService.java:497)
>        at org.apache.hadoop.yarn.server.nodemanager.containermanager.localizer.ResourceLocalizationService.heartbeat(ResourceLocalizationService.java:222)
>        at org.apache.hadoop.yarn.server.nodemanager.api.impl.pb.service.LocalizationProtocolPBServiceImpl.heartbeat(LocalizationProtocolPBServiceImpl.java:46)
>        at org.apache.hadoop.yarn.proto.LocalizationProtocol$LocalizationProtocolService$2.callBlockingMethod(LocalizationProtocol.java:57)
>        at org.apache.hadoop.yarn.ipc.ProtoOverHadoopRpcEngine$Server.call(ProtoOverHadoopRpcEngine.java:342)
>        at org.apache.hadoop.ipc.Server$Handler$1.run(Server.java:1493)
>        at org.apache.hadoop.ipc.Server$Handler$1.run(Server.java:1489)
>        at java.security.AccessController.doPrivileged(Native Method)
>        at javax.security.auth.Subject.doAs(Subject.java:396)
>        at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1177)
>        at org.apache.hadoop.ipc.Server$Handler.run(Server.java:1487)
> {code}

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators: https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa
For more information on JIRA, see: http://www.atlassian.com/software/jira

        

[jira] [Commented] (MAPREDUCE-3931) MR tasks failing due to changing timestamps on Resources to download

Posted by "Hudson (Commented) (JIRA)" <ji...@apache.org>.
    [ https://issues.apache.org/jira/browse/MAPREDUCE-3931?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13219214#comment-13219214 ] 

Hudson commented on MAPREDUCE-3931:
-----------------------------------

Integrated in Hadoop-Mapreduce-trunk #1005 (See [https://builds.apache.org/job/Hadoop-Mapreduce-trunk/1005/])
    MAPREDUCE-3931. Changed PB implementation of LocalResource to take locks so that race conditions don't fail tasks by inadvertantly changing the timestamps. Contributed by Siddarth Seth. (Revision 1294970)

     Result = SUCCESS
vinodkv : http://svn.apache.org/viewcvs.cgi/?root=Apache-SVN&view=rev&rev=1294970
Files : 
* /hadoop/common/trunk/hadoop-mapreduce-project/CHANGES.txt
* /hadoop/common/trunk/hadoop-mapreduce-project/hadoop-yarn/hadoop-yarn-api/src/main/java/org/apache/hadoop/yarn/api/records/impl/pb/LocalResourcePBImpl.java

                
> MR tasks failing due to changing timestamps on Resources to download
> --------------------------------------------------------------------
>
>                 Key: MAPREDUCE-3931
>                 URL: https://issues.apache.org/jira/browse/MAPREDUCE-3931
>             Project: Hadoop Map/Reduce
>          Issue Type: Bug
>          Components: mrv2
>    Affects Versions: 0.23.0
>            Reporter: Vinod Kumar Vavilapalli
>            Assignee: Siddharth Seth
>             Fix For: 0.23.2
>
>         Attachments: MR3931.txt
>
>
> [~karams] reported this offline. Seems that tasks are randomly failing during gridmix runs:
> {code}
> 2012-02-24 21:03:34,912 INFO [AsyncDispatcher event handler] org.apache.hadoop.mapreduce.v2.app.job.impl.TaskAttemptImpl: Diagnostics report from attempt_1330116323296_0140_m_003868_0: RemoteTrace:
> java.io.IOException: Resource hdfs://hostname.com:8020/user/hadoop15/.staging/job_1330116323296_0140/job.jar changed on src filesystem (expected 2971811411, was 1330116705875
>        at org.apache.hadoop.yarn.util.FSDownload.copy(FSDownload.java:90)
>        at org.apache.hadoop.yarn.util.FSDownload.access$000(FSDownload.java:49)
>        at org.apache.hadoop.yarn.util.FSDownload$1.run(FSDownload.java:157)
>        at org.apache.hadoop.yarn.util.FSDownload$1.run(FSDownload.java:155)
>        at java.security.AccessController.doPrivileged(Native Method)
>        at javax.security.auth.Subject.doAs(Subject.java:396)
>        at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1177)
>        at org.apache.hadoop.yarn.util.FSDownload.call(FSDownload.java:153)
>        at org.apache.hadoop.yarn.util.FSDownload.call(FSDownload.java:49)
>        at java.util.concurrent.FutureTask$Sync.innerRun(FutureTask.java:303)
>        at java.util.concurrent.FutureTask.run(FutureTask.java:138)
>        at java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:441)
>        at java.util.concurrent.FutureTask$Sync.innerRun(FutureTask.java:303)
>        at java.util.concurrent.FutureTask.run(FutureTask.java:138)
>        at java.util.concurrent.ThreadPoolExecutor$Worker.runTask(ThreadPoolExecutor.java:886)
>        at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:908)
>        at java.lang.Thread.run(Thread.java:619)
>  at LocalTrace:
>        org.apache.hadoop.yarn.exceptions.impl.pb.YarnRemoteExceptionPBImpl: Resource hdfs://hostname.com:8020/user/hadoop15/.staging/job_1330116323296_0140/job.jar changed on src filesystem (expected 2971811411, was 1330116705875
>        at org.apache.hadoop.yarn.server.nodemanager.api.protocolrecords.impl.pb.LocalResourceStatusPBImpl.convertFromProtoFormat(LocalResourceStatusPBImpl.java:217)
>        at org.apache.hadoop.yarn.server.nodemanager.api.protocolrecords.impl.pb.LocalResourceStatusPBImpl.getException(LocalResourceStatusPBImpl.java:147)
>        at org.apache.hadoop.yarn.server.nodemanager.containermanager.localizer.ResourceLocalizationService$LocalizerRunner.update(ResourceLocalizationService.java:827)
>        at org.apache.hadoop.yarn.server.nodemanager.containermanager.localizer.ResourceLocalizationService$LocalizerTracker.processHeartbeat(ResourceLocalizationService.java:497)
>        at org.apache.hadoop.yarn.server.nodemanager.containermanager.localizer.ResourceLocalizationService.heartbeat(ResourceLocalizationService.java:222)
>        at org.apache.hadoop.yarn.server.nodemanager.api.impl.pb.service.LocalizationProtocolPBServiceImpl.heartbeat(LocalizationProtocolPBServiceImpl.java:46)
>        at org.apache.hadoop.yarn.proto.LocalizationProtocol$LocalizationProtocolService$2.callBlockingMethod(LocalizationProtocol.java:57)
>        at org.apache.hadoop.yarn.ipc.ProtoOverHadoopRpcEngine$Server.call(ProtoOverHadoopRpcEngine.java:342)
>        at org.apache.hadoop.ipc.Server$Handler$1.run(Server.java:1493)
>        at org.apache.hadoop.ipc.Server$Handler$1.run(Server.java:1489)
>        at java.security.AccessController.doPrivileged(Native Method)
>        at javax.security.auth.Subject.doAs(Subject.java:396)
>        at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1177)
>        at org.apache.hadoop.ipc.Server$Handler.run(Server.java:1487)
> {code}

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators: https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa
For more information on JIRA, see: http://www.atlassian.com/software/jira

        

[jira] [Commented] (MAPREDUCE-3931) MR tasks failing due to changing timestamps on Resources to download

Posted by "Hudson (Commented) (JIRA)" <ji...@apache.org>.
    [ https://issues.apache.org/jira/browse/MAPREDUCE-3931?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13218914#comment-13218914 ] 

Hudson commented on MAPREDUCE-3931:
-----------------------------------

Integrated in Hadoop-Hdfs-0.23-Commit #601 (See [https://builds.apache.org/job/Hadoop-Hdfs-0.23-Commit/601/])
    MAPREDUCE-3931. Changed PB implementation of LocalResource to take locks so that race conditions don't fail tasks by inadvertantly changing the timestamps. Contributed by Siddarth Seth.
svn merge --ignore-ancestry -c 1294970 ../../trunk (Revision 1294973)

     Result = SUCCESS
vinodkv : http://svn.apache.org/viewcvs.cgi/?root=Apache-SVN&view=rev&rev=1294973
Files : 
* /hadoop/common/branches/branch-0.23/hadoop-mapreduce-project/CHANGES.txt
* /hadoop/common/branches/branch-0.23/hadoop-mapreduce-project/hadoop-yarn/hadoop-yarn-api/src/main/java/org/apache/hadoop/yarn/api/records/impl/pb/LocalResourcePBImpl.java

                
> MR tasks failing due to changing timestamps on Resources to download
> --------------------------------------------------------------------
>
>                 Key: MAPREDUCE-3931
>                 URL: https://issues.apache.org/jira/browse/MAPREDUCE-3931
>             Project: Hadoop Map/Reduce
>          Issue Type: Bug
>          Components: mrv2
>    Affects Versions: 0.23.0
>            Reporter: Vinod Kumar Vavilapalli
>            Assignee: Siddharth Seth
>             Fix For: 0.23.2
>
>         Attachments: MR3931.txt
>
>
> [~karams] reported this offline. Seems that tasks are randomly failing during gridmix runs:
> {code}
> 2012-02-24 21:03:34,912 INFO [AsyncDispatcher event handler] org.apache.hadoop.mapreduce.v2.app.job.impl.TaskAttemptImpl: Diagnostics report from attempt_1330116323296_0140_m_003868_0: RemoteTrace:
> java.io.IOException: Resource hdfs://hostname.com:8020/user/hadoop15/.staging/job_1330116323296_0140/job.jar changed on src filesystem (expected 2971811411, was 1330116705875
>        at org.apache.hadoop.yarn.util.FSDownload.copy(FSDownload.java:90)
>        at org.apache.hadoop.yarn.util.FSDownload.access$000(FSDownload.java:49)
>        at org.apache.hadoop.yarn.util.FSDownload$1.run(FSDownload.java:157)
>        at org.apache.hadoop.yarn.util.FSDownload$1.run(FSDownload.java:155)
>        at java.security.AccessController.doPrivileged(Native Method)
>        at javax.security.auth.Subject.doAs(Subject.java:396)
>        at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1177)
>        at org.apache.hadoop.yarn.util.FSDownload.call(FSDownload.java:153)
>        at org.apache.hadoop.yarn.util.FSDownload.call(FSDownload.java:49)
>        at java.util.concurrent.FutureTask$Sync.innerRun(FutureTask.java:303)
>        at java.util.concurrent.FutureTask.run(FutureTask.java:138)
>        at java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:441)
>        at java.util.concurrent.FutureTask$Sync.innerRun(FutureTask.java:303)
>        at java.util.concurrent.FutureTask.run(FutureTask.java:138)
>        at java.util.concurrent.ThreadPoolExecutor$Worker.runTask(ThreadPoolExecutor.java:886)
>        at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:908)
>        at java.lang.Thread.run(Thread.java:619)
>  at LocalTrace:
>        org.apache.hadoop.yarn.exceptions.impl.pb.YarnRemoteExceptionPBImpl: Resource hdfs://hostname.com:8020/user/hadoop15/.staging/job_1330116323296_0140/job.jar changed on src filesystem (expected 2971811411, was 1330116705875
>        at org.apache.hadoop.yarn.server.nodemanager.api.protocolrecords.impl.pb.LocalResourceStatusPBImpl.convertFromProtoFormat(LocalResourceStatusPBImpl.java:217)
>        at org.apache.hadoop.yarn.server.nodemanager.api.protocolrecords.impl.pb.LocalResourceStatusPBImpl.getException(LocalResourceStatusPBImpl.java:147)
>        at org.apache.hadoop.yarn.server.nodemanager.containermanager.localizer.ResourceLocalizationService$LocalizerRunner.update(ResourceLocalizationService.java:827)
>        at org.apache.hadoop.yarn.server.nodemanager.containermanager.localizer.ResourceLocalizationService$LocalizerTracker.processHeartbeat(ResourceLocalizationService.java:497)
>        at org.apache.hadoop.yarn.server.nodemanager.containermanager.localizer.ResourceLocalizationService.heartbeat(ResourceLocalizationService.java:222)
>        at org.apache.hadoop.yarn.server.nodemanager.api.impl.pb.service.LocalizationProtocolPBServiceImpl.heartbeat(LocalizationProtocolPBServiceImpl.java:46)
>        at org.apache.hadoop.yarn.proto.LocalizationProtocol$LocalizationProtocolService$2.callBlockingMethod(LocalizationProtocol.java:57)
>        at org.apache.hadoop.yarn.ipc.ProtoOverHadoopRpcEngine$Server.call(ProtoOverHadoopRpcEngine.java:342)
>        at org.apache.hadoop.ipc.Server$Handler$1.run(Server.java:1493)
>        at org.apache.hadoop.ipc.Server$Handler$1.run(Server.java:1489)
>        at java.security.AccessController.doPrivileged(Native Method)
>        at javax.security.auth.Subject.doAs(Subject.java:396)
>        at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1177)
>        at org.apache.hadoop.ipc.Server$Handler.run(Server.java:1487)
> {code}

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators: https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa
For more information on JIRA, see: http://www.atlassian.com/software/jira

        

[jira] [Commented] (MAPREDUCE-3931) MR tasks failing due to changing timestamps on Resources to download

Posted by "Hadoop QA (Commented) (JIRA)" <ji...@apache.org>.
    [ https://issues.apache.org/jira/browse/MAPREDUCE-3931?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13218746#comment-13218746 ] 

Hadoop QA commented on MAPREDUCE-3931:
--------------------------------------

-1 overall.  Here are the results of testing the latest attachment 
  http://issues.apache.org/jira/secure/attachment/12516463/MR3931.txt
  against trunk revision .

    +1 @author.  The patch does not contain any @author tags.

    -1 tests included.  The patch doesn't appear to include any new or modified tests.
                        Please justify why no new tests are needed for this patch.
                        Also please list what manual steps were performed to verify this patch.

    +1 javadoc.  The javadoc tool did not generate any warning messages.

    +1 javac.  The applied patch does not increase the total number of javac compiler warnings.

    +1 eclipse:eclipse.  The patch built with eclipse:eclipse.

    +1 findbugs.  The patch does not introduce any new Findbugs (version 1.3.9) warnings.

    +1 release audit.  The applied patch does not increase the total number of release audit warnings.

    +1 core tests.  The patch passed unit tests in .

    +1 contrib tests.  The patch passed contrib unit tests.

Test results: https://builds.apache.org/job/PreCommit-MAPREDUCE-Build/1956//testReport/
Console output: https://builds.apache.org/job/PreCommit-MAPREDUCE-Build/1956//console

This message is automatically generated.
                
> MR tasks failing due to changing timestamps on Resources to download
> --------------------------------------------------------------------
>
>                 Key: MAPREDUCE-3931
>                 URL: https://issues.apache.org/jira/browse/MAPREDUCE-3931
>             Project: Hadoop Map/Reduce
>          Issue Type: Bug
>          Components: mrv2
>    Affects Versions: 0.23.0
>            Reporter: Vinod Kumar Vavilapalli
>            Assignee: Siddharth Seth
>             Fix For: 0.23.2
>
>         Attachments: MR3931.txt
>
>
> [~karams] reported this offline. Seems that tasks are randomly failing during gridmix runs:
> {code}
> 2012-02-24 21:03:34,912 INFO [AsyncDispatcher event handler] org.apache.hadoop.mapreduce.v2.app.job.impl.TaskAttemptImpl: Diagnostics report from attempt_1330116323296_0140_m_003868_0: RemoteTrace:
> java.io.IOException: Resource hdfs://hostname.com:8020/user/hadoop15/.staging/job_1330116323296_0140/job.jar changed on src filesystem (expected 2971811411, was 1330116705875
>        at org.apache.hadoop.yarn.util.FSDownload.copy(FSDownload.java:90)
>        at org.apache.hadoop.yarn.util.FSDownload.access$000(FSDownload.java:49)
>        at org.apache.hadoop.yarn.util.FSDownload$1.run(FSDownload.java:157)
>        at org.apache.hadoop.yarn.util.FSDownload$1.run(FSDownload.java:155)
>        at java.security.AccessController.doPrivileged(Native Method)
>        at javax.security.auth.Subject.doAs(Subject.java:396)
>        at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1177)
>        at org.apache.hadoop.yarn.util.FSDownload.call(FSDownload.java:153)
>        at org.apache.hadoop.yarn.util.FSDownload.call(FSDownload.java:49)
>        at java.util.concurrent.FutureTask$Sync.innerRun(FutureTask.java:303)
>        at java.util.concurrent.FutureTask.run(FutureTask.java:138)
>        at java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:441)
>        at java.util.concurrent.FutureTask$Sync.innerRun(FutureTask.java:303)
>        at java.util.concurrent.FutureTask.run(FutureTask.java:138)
>        at java.util.concurrent.ThreadPoolExecutor$Worker.runTask(ThreadPoolExecutor.java:886)
>        at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:908)
>        at java.lang.Thread.run(Thread.java:619)
>  at LocalTrace:
>        org.apache.hadoop.yarn.exceptions.impl.pb.YarnRemoteExceptionPBImpl: Resource hdfs://hostname.com:8020/user/hadoop15/.staging/job_1330116323296_0140/job.jar changed on src filesystem (expected 2971811411, was 1330116705875
>        at org.apache.hadoop.yarn.server.nodemanager.api.protocolrecords.impl.pb.LocalResourceStatusPBImpl.convertFromProtoFormat(LocalResourceStatusPBImpl.java:217)
>        at org.apache.hadoop.yarn.server.nodemanager.api.protocolrecords.impl.pb.LocalResourceStatusPBImpl.getException(LocalResourceStatusPBImpl.java:147)
>        at org.apache.hadoop.yarn.server.nodemanager.containermanager.localizer.ResourceLocalizationService$LocalizerRunner.update(ResourceLocalizationService.java:827)
>        at org.apache.hadoop.yarn.server.nodemanager.containermanager.localizer.ResourceLocalizationService$LocalizerTracker.processHeartbeat(ResourceLocalizationService.java:497)
>        at org.apache.hadoop.yarn.server.nodemanager.containermanager.localizer.ResourceLocalizationService.heartbeat(ResourceLocalizationService.java:222)
>        at org.apache.hadoop.yarn.server.nodemanager.api.impl.pb.service.LocalizationProtocolPBServiceImpl.heartbeat(LocalizationProtocolPBServiceImpl.java:46)
>        at org.apache.hadoop.yarn.proto.LocalizationProtocol$LocalizationProtocolService$2.callBlockingMethod(LocalizationProtocol.java:57)
>        at org.apache.hadoop.yarn.ipc.ProtoOverHadoopRpcEngine$Server.call(ProtoOverHadoopRpcEngine.java:342)
>        at org.apache.hadoop.ipc.Server$Handler$1.run(Server.java:1493)
>        at org.apache.hadoop.ipc.Server$Handler$1.run(Server.java:1489)
>        at java.security.AccessController.doPrivileged(Native Method)
>        at javax.security.auth.Subject.doAs(Subject.java:396)
>        at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1177)
>        at org.apache.hadoop.ipc.Server$Handler.run(Server.java:1487)
> {code}

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators: https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa
For more information on JIRA, see: http://www.atlassian.com/software/jira

        

[jira] [Commented] (MAPREDUCE-3931) MR tasks failing due to changing timestamps on Resources to download

Posted by "Hudson (Commented) (JIRA)" <ji...@apache.org>.
    [ https://issues.apache.org/jira/browse/MAPREDUCE-3931?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13219147#comment-13219147 ] 

Hudson commented on MAPREDUCE-3931:
-----------------------------------

Integrated in Hadoop-Hdfs-trunk #970 (See [https://builds.apache.org/job/Hadoop-Hdfs-trunk/970/])
    MAPREDUCE-3931. Changed PB implementation of LocalResource to take locks so that race conditions don't fail tasks by inadvertantly changing the timestamps. Contributed by Siddarth Seth. (Revision 1294970)

     Result = SUCCESS
vinodkv : http://svn.apache.org/viewcvs.cgi/?root=Apache-SVN&view=rev&rev=1294970
Files : 
* /hadoop/common/trunk/hadoop-mapreduce-project/CHANGES.txt
* /hadoop/common/trunk/hadoop-mapreduce-project/hadoop-yarn/hadoop-yarn-api/src/main/java/org/apache/hadoop/yarn/api/records/impl/pb/LocalResourcePBImpl.java

                
> MR tasks failing due to changing timestamps on Resources to download
> --------------------------------------------------------------------
>
>                 Key: MAPREDUCE-3931
>                 URL: https://issues.apache.org/jira/browse/MAPREDUCE-3931
>             Project: Hadoop Map/Reduce
>          Issue Type: Bug
>          Components: mrv2
>    Affects Versions: 0.23.0
>            Reporter: Vinod Kumar Vavilapalli
>            Assignee: Siddharth Seth
>             Fix For: 0.23.2
>
>         Attachments: MR3931.txt
>
>
> [~karams] reported this offline. Seems that tasks are randomly failing during gridmix runs:
> {code}
> 2012-02-24 21:03:34,912 INFO [AsyncDispatcher event handler] org.apache.hadoop.mapreduce.v2.app.job.impl.TaskAttemptImpl: Diagnostics report from attempt_1330116323296_0140_m_003868_0: RemoteTrace:
> java.io.IOException: Resource hdfs://hostname.com:8020/user/hadoop15/.staging/job_1330116323296_0140/job.jar changed on src filesystem (expected 2971811411, was 1330116705875
>        at org.apache.hadoop.yarn.util.FSDownload.copy(FSDownload.java:90)
>        at org.apache.hadoop.yarn.util.FSDownload.access$000(FSDownload.java:49)
>        at org.apache.hadoop.yarn.util.FSDownload$1.run(FSDownload.java:157)
>        at org.apache.hadoop.yarn.util.FSDownload$1.run(FSDownload.java:155)
>        at java.security.AccessController.doPrivileged(Native Method)
>        at javax.security.auth.Subject.doAs(Subject.java:396)
>        at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1177)
>        at org.apache.hadoop.yarn.util.FSDownload.call(FSDownload.java:153)
>        at org.apache.hadoop.yarn.util.FSDownload.call(FSDownload.java:49)
>        at java.util.concurrent.FutureTask$Sync.innerRun(FutureTask.java:303)
>        at java.util.concurrent.FutureTask.run(FutureTask.java:138)
>        at java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:441)
>        at java.util.concurrent.FutureTask$Sync.innerRun(FutureTask.java:303)
>        at java.util.concurrent.FutureTask.run(FutureTask.java:138)
>        at java.util.concurrent.ThreadPoolExecutor$Worker.runTask(ThreadPoolExecutor.java:886)
>        at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:908)
>        at java.lang.Thread.run(Thread.java:619)
>  at LocalTrace:
>        org.apache.hadoop.yarn.exceptions.impl.pb.YarnRemoteExceptionPBImpl: Resource hdfs://hostname.com:8020/user/hadoop15/.staging/job_1330116323296_0140/job.jar changed on src filesystem (expected 2971811411, was 1330116705875
>        at org.apache.hadoop.yarn.server.nodemanager.api.protocolrecords.impl.pb.LocalResourceStatusPBImpl.convertFromProtoFormat(LocalResourceStatusPBImpl.java:217)
>        at org.apache.hadoop.yarn.server.nodemanager.api.protocolrecords.impl.pb.LocalResourceStatusPBImpl.getException(LocalResourceStatusPBImpl.java:147)
>        at org.apache.hadoop.yarn.server.nodemanager.containermanager.localizer.ResourceLocalizationService$LocalizerRunner.update(ResourceLocalizationService.java:827)
>        at org.apache.hadoop.yarn.server.nodemanager.containermanager.localizer.ResourceLocalizationService$LocalizerTracker.processHeartbeat(ResourceLocalizationService.java:497)
>        at org.apache.hadoop.yarn.server.nodemanager.containermanager.localizer.ResourceLocalizationService.heartbeat(ResourceLocalizationService.java:222)
>        at org.apache.hadoop.yarn.server.nodemanager.api.impl.pb.service.LocalizationProtocolPBServiceImpl.heartbeat(LocalizationProtocolPBServiceImpl.java:46)
>        at org.apache.hadoop.yarn.proto.LocalizationProtocol$LocalizationProtocolService$2.callBlockingMethod(LocalizationProtocol.java:57)
>        at org.apache.hadoop.yarn.ipc.ProtoOverHadoopRpcEngine$Server.call(ProtoOverHadoopRpcEngine.java:342)
>        at org.apache.hadoop.ipc.Server$Handler$1.run(Server.java:1493)
>        at org.apache.hadoop.ipc.Server$Handler$1.run(Server.java:1489)
>        at java.security.AccessController.doPrivileged(Native Method)
>        at javax.security.auth.Subject.doAs(Subject.java:396)
>        at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1177)
>        at org.apache.hadoop.ipc.Server$Handler.run(Server.java:1487)
> {code}

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators: https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa
For more information on JIRA, see: http://www.atlassian.com/software/jira

        

[jira] [Commented] (MAPREDUCE-3931) MR tasks failing due to changing timestamps on Resources to download

Posted by "Vinod Kumar Vavilapalli (Commented) (JIRA)" <ji...@apache.org>.
    [ https://issues.apache.org/jira/browse/MAPREDUCE-3931?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13218907#comment-13218907 ] 

Vinod Kumar Vavilapalli commented on MAPREDUCE-3931:
----------------------------------------------------

+1. This looks fine. Pushing it in.
                
> MR tasks failing due to changing timestamps on Resources to download
> --------------------------------------------------------------------
>
>                 Key: MAPREDUCE-3931
>                 URL: https://issues.apache.org/jira/browse/MAPREDUCE-3931
>             Project: Hadoop Map/Reduce
>          Issue Type: Bug
>          Components: mrv2
>    Affects Versions: 0.23.0
>            Reporter: Vinod Kumar Vavilapalli
>            Assignee: Siddharth Seth
>             Fix For: 0.23.2
>
>         Attachments: MR3931.txt
>
>
> [~karams] reported this offline. Seems that tasks are randomly failing during gridmix runs:
> {code}
> 2012-02-24 21:03:34,912 INFO [AsyncDispatcher event handler] org.apache.hadoop.mapreduce.v2.app.job.impl.TaskAttemptImpl: Diagnostics report from attempt_1330116323296_0140_m_003868_0: RemoteTrace:
> java.io.IOException: Resource hdfs://hostname.com:8020/user/hadoop15/.staging/job_1330116323296_0140/job.jar changed on src filesystem (expected 2971811411, was 1330116705875
>        at org.apache.hadoop.yarn.util.FSDownload.copy(FSDownload.java:90)
>        at org.apache.hadoop.yarn.util.FSDownload.access$000(FSDownload.java:49)
>        at org.apache.hadoop.yarn.util.FSDownload$1.run(FSDownload.java:157)
>        at org.apache.hadoop.yarn.util.FSDownload$1.run(FSDownload.java:155)
>        at java.security.AccessController.doPrivileged(Native Method)
>        at javax.security.auth.Subject.doAs(Subject.java:396)
>        at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1177)
>        at org.apache.hadoop.yarn.util.FSDownload.call(FSDownload.java:153)
>        at org.apache.hadoop.yarn.util.FSDownload.call(FSDownload.java:49)
>        at java.util.concurrent.FutureTask$Sync.innerRun(FutureTask.java:303)
>        at java.util.concurrent.FutureTask.run(FutureTask.java:138)
>        at java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:441)
>        at java.util.concurrent.FutureTask$Sync.innerRun(FutureTask.java:303)
>        at java.util.concurrent.FutureTask.run(FutureTask.java:138)
>        at java.util.concurrent.ThreadPoolExecutor$Worker.runTask(ThreadPoolExecutor.java:886)
>        at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:908)
>        at java.lang.Thread.run(Thread.java:619)
>  at LocalTrace:
>        org.apache.hadoop.yarn.exceptions.impl.pb.YarnRemoteExceptionPBImpl: Resource hdfs://hostname.com:8020/user/hadoop15/.staging/job_1330116323296_0140/job.jar changed on src filesystem (expected 2971811411, was 1330116705875
>        at org.apache.hadoop.yarn.server.nodemanager.api.protocolrecords.impl.pb.LocalResourceStatusPBImpl.convertFromProtoFormat(LocalResourceStatusPBImpl.java:217)
>        at org.apache.hadoop.yarn.server.nodemanager.api.protocolrecords.impl.pb.LocalResourceStatusPBImpl.getException(LocalResourceStatusPBImpl.java:147)
>        at org.apache.hadoop.yarn.server.nodemanager.containermanager.localizer.ResourceLocalizationService$LocalizerRunner.update(ResourceLocalizationService.java:827)
>        at org.apache.hadoop.yarn.server.nodemanager.containermanager.localizer.ResourceLocalizationService$LocalizerTracker.processHeartbeat(ResourceLocalizationService.java:497)
>        at org.apache.hadoop.yarn.server.nodemanager.containermanager.localizer.ResourceLocalizationService.heartbeat(ResourceLocalizationService.java:222)
>        at org.apache.hadoop.yarn.server.nodemanager.api.impl.pb.service.LocalizationProtocolPBServiceImpl.heartbeat(LocalizationProtocolPBServiceImpl.java:46)
>        at org.apache.hadoop.yarn.proto.LocalizationProtocol$LocalizationProtocolService$2.callBlockingMethod(LocalizationProtocol.java:57)
>        at org.apache.hadoop.yarn.ipc.ProtoOverHadoopRpcEngine$Server.call(ProtoOverHadoopRpcEngine.java:342)
>        at org.apache.hadoop.ipc.Server$Handler$1.run(Server.java:1493)
>        at org.apache.hadoop.ipc.Server$Handler$1.run(Server.java:1489)
>        at java.security.AccessController.doPrivileged(Native Method)
>        at javax.security.auth.Subject.doAs(Subject.java:396)
>        at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1177)
>        at org.apache.hadoop.ipc.Server$Handler.run(Server.java:1487)
> {code}

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators: https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa
For more information on JIRA, see: http://www.atlassian.com/software/jira

        

[jira] [Commented] (MAPREDUCE-3931) MR tasks failing due to changing timestamps on Resources to download

Posted by "Siddharth Seth (Commented) (JIRA)" <ji...@apache.org>.
    [ https://issues.apache.org/jira/browse/MAPREDUCE-3931?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13217773#comment-13217773 ] 

Siddharth Seth commented on MAPREDUCE-3931:
-------------------------------------------

This looks like other races we've seen where {{build}} is called on the same Proto builder object by multiple threads. In cases of JobId etc - values would end up getting reset to 0.
This is the same race in the LocalResource object - but the manifestation is more interesting. 2971811411 is 1330116705875 with the MS 4 bytes set to 0. Turns out, assigning to a long in Java may not be atomic.

There's multiple parallel startContainer calls - which access the same LocalReosurce object in the AM - which is where the value can get corrupt. Will attach a patch shortly which adds synchronization to LocalResourcePBImpl.
                
> MR tasks failing due to changing timestamps on Resources to download
> --------------------------------------------------------------------
>
>                 Key: MAPREDUCE-3931
>                 URL: https://issues.apache.org/jira/browse/MAPREDUCE-3931
>             Project: Hadoop Map/Reduce
>          Issue Type: Bug
>          Components: mrv2
>    Affects Versions: 0.23.0
>            Reporter: Vinod Kumar Vavilapalli
>            Assignee: Siddharth Seth
>             Fix For: 0.23.2
>
>
> [~karams] reported this offline. Seems that tasks are randomly failing during gridmix runs:
> {code}
> 2012-02-24 21:03:34,912 INFO [AsyncDispatcher event handler] org.apache.hadoop.mapreduce.v2.app.job.impl.TaskAttemptImpl: Diagnostics report from attempt_1330116323296_0140_m_003868_0: RemoteTrace:
> java.io.IOException: Resource hdfs://hostname.com:8020/user/hadoop15/.staging/job_1330116323296_0140/job.jar changed on src filesystem (expected 2971811411, was 1330116705875
>        at org.apache.hadoop.yarn.util.FSDownload.copy(FSDownload.java:90)
>        at org.apache.hadoop.yarn.util.FSDownload.access$000(FSDownload.java:49)
>        at org.apache.hadoop.yarn.util.FSDownload$1.run(FSDownload.java:157)
>        at org.apache.hadoop.yarn.util.FSDownload$1.run(FSDownload.java:155)
>        at java.security.AccessController.doPrivileged(Native Method)
>        at javax.security.auth.Subject.doAs(Subject.java:396)
>        at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1177)
>        at org.apache.hadoop.yarn.util.FSDownload.call(FSDownload.java:153)
>        at org.apache.hadoop.yarn.util.FSDownload.call(FSDownload.java:49)
>        at java.util.concurrent.FutureTask$Sync.innerRun(FutureTask.java:303)
>        at java.util.concurrent.FutureTask.run(FutureTask.java:138)
>        at java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:441)
>        at java.util.concurrent.FutureTask$Sync.innerRun(FutureTask.java:303)
>        at java.util.concurrent.FutureTask.run(FutureTask.java:138)
>        at java.util.concurrent.ThreadPoolExecutor$Worker.runTask(ThreadPoolExecutor.java:886)
>        at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:908)
>        at java.lang.Thread.run(Thread.java:619)
>  at LocalTrace:
>        org.apache.hadoop.yarn.exceptions.impl.pb.YarnRemoteExceptionPBImpl: Resource hdfs://hostname.com:8020/user/hadoop15/.staging/job_1330116323296_0140/job.jar changed on src filesystem (expected 2971811411, was 1330116705875
>        at org.apache.hadoop.yarn.server.nodemanager.api.protocolrecords.impl.pb.LocalResourceStatusPBImpl.convertFromProtoFormat(LocalResourceStatusPBImpl.java:217)
>        at org.apache.hadoop.yarn.server.nodemanager.api.protocolrecords.impl.pb.LocalResourceStatusPBImpl.getException(LocalResourceStatusPBImpl.java:147)
>        at org.apache.hadoop.yarn.server.nodemanager.containermanager.localizer.ResourceLocalizationService$LocalizerRunner.update(ResourceLocalizationService.java:827)
>        at org.apache.hadoop.yarn.server.nodemanager.containermanager.localizer.ResourceLocalizationService$LocalizerTracker.processHeartbeat(ResourceLocalizationService.java:497)
>        at org.apache.hadoop.yarn.server.nodemanager.containermanager.localizer.ResourceLocalizationService.heartbeat(ResourceLocalizationService.java:222)
>        at org.apache.hadoop.yarn.server.nodemanager.api.impl.pb.service.LocalizationProtocolPBServiceImpl.heartbeat(LocalizationProtocolPBServiceImpl.java:46)
>        at org.apache.hadoop.yarn.proto.LocalizationProtocol$LocalizationProtocolService$2.callBlockingMethod(LocalizationProtocol.java:57)
>        at org.apache.hadoop.yarn.ipc.ProtoOverHadoopRpcEngine$Server.call(ProtoOverHadoopRpcEngine.java:342)
>        at org.apache.hadoop.ipc.Server$Handler$1.run(Server.java:1493)
>        at org.apache.hadoop.ipc.Server$Handler$1.run(Server.java:1489)
>        at java.security.AccessController.doPrivileged(Native Method)
>        at javax.security.auth.Subject.doAs(Subject.java:396)
>        at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1177)
>        at org.apache.hadoop.ipc.Server$Handler.run(Server.java:1487)
> {code}

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators: https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa
For more information on JIRA, see: http://www.atlassian.com/software/jira