You are viewing a plain text version of this content. The canonical link for it is here.
Posted to mapreduce-issues@hadoop.apache.org by "Peter Bacsko (JIRA)" <ji...@apache.org> on 2019/02/22 13:47:00 UTC

[jira] [Assigned] (MAPREDUCE-7144) Speculative execution can cause race condition

     [ https://issues.apache.org/jira/browse/MAPREDUCE-7144?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Peter Bacsko reassigned MAPREDUCE-7144:
---------------------------------------

    Assignee: Peter Bacsko

> Speculative execution can cause race condition
> ----------------------------------------------
>
>                 Key: MAPREDUCE-7144
>                 URL: https://issues.apache.org/jira/browse/MAPREDUCE-7144
>             Project: Hadoop Map/Reduce
>          Issue Type: Bug
>          Components: applicationmaster
>            Reporter: Peter Bacsko
>            Assignee: Peter Bacsko
>            Priority: Major
>
> In our internal build environment, we observed that the test case {{TestMRIntermediateDataEncryption#testMultipleReducers}} was flaky and failed randomly on multiple branches.
> After a long investigation, it turned out that the problems were caused by speculative execution and timing issues around it.
> Detailed explanation:
> 1. AppMaster speculatively starts two reducers:
> {noformat}
> 2018-09-19 04:09:31,022 INFO [AsyncDispatcher event handler] org.apache.hadoop.mapreduce.v2.app.speculate.DefaultSpeculator: ATTEMPT_START task_1537355349087_0001_r_000001
> ...
> 2018-09-19 04:09:31,025 INFO [AsyncDispatcher event handler] org.apache.hadoop.mapreduce.v2.app.speculate.DefaultSpeculator: ATTEMPT_START task_1537355349087_0001_r_000000
> {noformat}
> 2. Both attempts are scheduled and run in parallel:
> {noformat}
> 2018-09-19 04:09:31,025 INFO [AsyncDispatcher event handler] org.apache.hadoop.mapreduce.v2.app.job.impl.TaskAttemptImpl: attempt_1537355349087_0001_r_000000_0 TaskAttempt Transitioned from ASSIGNED to RUNNING
> ...
> 2018-09-19 04:09:46,036 INFO [AsyncDispatcher event handler] org.apache.hadoop.mapreduce.v2.app.job.impl.TaskAttemptImpl: attempt_1537355349087_0001_r_000000_1 TaskAttempt Transitioned from ASSIGNED to RUNNING
> {noformat}
> 3.  attempt_1537355349087_0001_r_000000_1 is finished earlier and reached progress of 1.0
> {noformat}
> 2018-09-19 04:10:05,747 INFO [IPC Server handler 3 on 36796] org.apache.hadoop.mapred.TaskAttemptListenerImpl: Progress of TaskAttempt attempt_1537355349087_0001_r_000000_1 is : 1.0
> 2018-09-19 04:10:05,751 INFO [IPC Server handler 2 on 36796] org.apache.hadoop.mapred.TaskAttemptListenerImpl: Done acknowledgement from attempt_1537355349087_0001_r_000000_1
> {noformat}
> 4. There's no need for attempt_1537355349087_0001_r_000000_0, so the AppMaster decides to kill it:
> {noformat}
> 2018-09-19 04:10:05,755 INFO [AsyncDispatcher event handler] org.apache.hadoop.mapreduce.v2.app.job.impl.TaskImpl: Issuing kill to other attempt attempt_1537355349087_0001_r_000000_0
> {noformat}
> 5.  Right after this, the MapReduce job transitions to COMMITTING phase, which involves moving files on HDFS, deleting the temporary directory and creating a file named _SUCCESS:
> {noformat}
> 2018-09-19 04:10:05,836 DEBUG [IPC Server handler 3 on 45026] hdfs.StateChange (FSNamesystem.java:deleteInt(4181)) - DIR* NameSystem.delete: /test/output/_temporary
> 2018-09-19 04:10:05,836 DEBUG [IPC Server handler 3 on 45026] hdfs.StateChange (FSDirectory.java:delete(1334)) - DIR* FSDirectory.delete: /test/output/_temporary
> 2018-09-19 04:10:05,837 DEBUG [IPC Server handler 3 on 45026] hdfs.StateChange (FSDirectory.java:unprotectedDelete(1480)) - DIR* FSDirectory.unprotectedDelete: _temporary is removed
> 2018-09-19 04:10:05,837 DEBUG [IPC Server handler 3 on 45026] hdfs.StateChange (FSNamesystem.java:deleteInternal(4251)) - DIR* Namesystem.delete: /test/output/_temporary is removed
> 2018-09-19 04:10:05,837 INFO  [IPC Server handler 3 on 45026] FSNamesystem.audit (FSNamesystem.java:logAuditMessage(9826)) - allowed=true	ugi=jenkins (auth:SIMPLE)	ip=/127.0.0.1	cmd=delete	src=/test/output/_temporary	dst=null	perm=null	proto=rpc
> 2018-09-19 04:10:05,837 DEBUG [IPC Server handler 3 on 45026] metrics.TopMetrics (TopMetrics.java:report(122)) - a metric is reported: cmd: delete user: jenkins (auth:SIMPLE)
> 2018-09-19 04:10:05,837 DEBUG [IPC Server handler 3 on 45026] top.TopAuditLogger (TopAuditLogger.java:logAuditEvent(78)) - ------------------- logged event for top service: allowed=true	ugi=jenkins (auth:SIMPLE)	ip=/127.0.0.1	cmd=delete	src=/test/output/_temporary	dst=null	perm=null
> 2018-09-19 04:10:05,839 DEBUG [IPC Server handler 2 on 45026] hdfs.StateChange (NameNodeRpcServer.java:create(596)) - *DIR* NameNode.create: file /test/output/_SUCCESS for DFSClient_NONMAPREDUCE_-188083900_1 at 127.0.0.1
> 2018-09-19 04:10:05,839 DEBUG [IPC Server handler 2 on 45026] hdfs.StateChange (FSNamesystem.java:startFileInt(2748)) - DIR* NameSystem.startFile: src=/test/output/_SUCCESS, holder=DFSClient_NONMAPREDUCE_-188083900_1, clientMachine=127.0.0.1, createParent=true, replication=2, createFlag=[CREATE, OVERWRITE], blockSize=134217728, supportedVersions=[CryptoProtocolVersion{description='Encryption zones', version=2, unknownValue=null}]
> 2018-09-19 04:10:05,839 DEBUG [IPC Server handler 2 on 45026] namenode.FSDirectory (FSDirectory.java:copyINodeDefaultAcl(2272)) - child: _SUCCESS, posixAclInheritanceEnabled: false, modes: { masked: rw-rw-rw-, unmasked: rw-rw-rw- }
> 2018-09-19 04:10:05,839 DEBUG [IPC Server handler 2 on 45026] hdfs.StateChange (FSDirectory.java:addFile(400)) - DIR* addFile: /test/output/_SUCCESS is added
> 2018-09-19 04:10:05,840 DEBUG [IPC Server handler 2 on 45026] hdfs.StateChange (FSNamesystem.java:startFileInternal(2973)) - DIR* NameSystem.startFile: added /test/output/_SUCCESS inode 16425 holder DFSClient_NONMAPREDUCE_-188083900_1
> {noformat}
> 6. However, the reducer attempt_1537355349087_0001_r_000000_0 is not killed instantly. In fact, it was still running at this point and re-created the temporary directory:
> {noformat}
> 2018-09-19 04:10:06,357 DEBUG [IPC Server handler 6 on 45026] hdfs.StateChange (NameNodeRpcServer.java:create(596)) - *DIR* NameNode.create: file /test/output/_temporary/1/_temporary/attempt_1537355349087_0001_r_000000_0/part-00000 for DFSClient_attempt_1537355349087_0001_r_000000_0_-603230467_1 at 127.0.0.1
> 2018-09-19 04:10:06,358 DEBUG [IPC Server handler 6 on 45026] hdfs.StateChange (FSNamesystem.java:startFileInt(2748)) - DIR* NameSystem.startFile: src=/test/output/_temporary/1/_temporary/attempt_1537355349087_0001_r_000000_0/part-00000, holder=DFSClient_attempt_1537355349087_0001_r_000000_0_-603230467_1, clientMachine=127.0.0.1, createParent=true, replication=2, createFlag=[CREATE, OVERWRITE], blockSize=134217728, supportedVersions=[CryptoProtocolVersion{description='Encryption zones', version=2, unknownValue=null}]
> 2018-09-19 04:10:06,358 DEBUG [IPC Server handler 6 on 45026] namenode.FSDirectory (FSDirectory.java:copyINodeDefaultAcl(2272)) - child: _temporary, posixAclInheritanceEnabled: false, modes: rwxrwxrwx
> 2018-09-19 04:10:06,358 DEBUG [IPC Server handler 6 on 45026] hdfs.StateChange (FSNamesystem.java:mkdirsRecursively(4614)) - mkdirs: created directory /test/output/_temporary
> 2018-09-19 04:10:06,358 DEBUG [IPC Server handler 6 on 45026] namenode.FSDirectory (FSDirectory.java:copyINodeDefaultAcl(2272)) - child: 1, posixAclInheritanceEnabled: false, modes: rwxrwxrwx
> 2018-09-19 04:10:06,359 DEBUG [IPC Server handler 6 on 45026] hdfs.StateChange (FSNamesystem.java:mkdirsRecursively(4614)) - mkdirs: created directory /test/output/_temporary/1
> 2018-09-19 04:10:06,359 DEBUG [IPC Server handler 6 on 45026] namenode.FSDirectory (FSDirectory.java:copyINodeDefaultAcl(2272)) - child: _temporary, posixAclInheritanceEnabled: false, modes: rwxrwxrwx
> 2018-09-19 04:10:06,359 DEBUG [IPC Server handler 6 on 45026] hdfs.StateChange (FSNamesystem.java:mkdirsRecursively(4614)) - mkdirs: created directory /test/output/_temporary/1/_temporary
> 2018-09-19 04:10:06,359 DEBUG [IPC Server handler 6 on 45026] namenode.FSDirectory (FSDirectory.java:copyINodeDefaultAcl(2272)) - child: attempt_1537355349087_0001_r_000000_0, posixAclInheritanceEnabled: false, modes: rwxrwxrwx
> 2018-09-19 04:10:06,359 DEBUG [IPC Server handler 6 on 45026] hdfs.StateChange (FSNamesystem.java:mkdirsRecursively(4614)) - mkdirs: created directory /test/output/_temporary/1/_temporary/attempt_1537355349087_0001_r_000000_0
> {noformat}
> So what happens is that the AppMaster deleted the temporary directory {{/user/test/_temporary}} at 04:10:05,837, the reducer created it again a second later at 04:10:06,358. Eventually the reducer attempt attempt_1537355349087_0001_r_000000_0 was terminated.
> The solution is to wait until all reducers have been killed and only then proceed to the committing phase.
> The bug itself is probably not severe, because the MR job itself succeeds.



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

---------------------------------------------------------------------
To unsubscribe, e-mail: mapreduce-issues-unsubscribe@hadoop.apache.org
For additional commands, e-mail: mapreduce-issues-help@hadoop.apache.org