Posted to mapreduce-issues@hadoop.apache.org by "Scott Oaks (Jira)" <ji...@apache.org> on 2021/04/19 22:39:00 UTC

[jira] [Updated] (MAPREDUCE-7337) Task fails while deleting spill files on slow disk

     [ https://issues.apache.org/jira/browse/MAPREDUCE-7337?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Scott Oaks updated MAPREDUCE-7337:
----------------------------------
    Summary: Task fails while deleting spill files on slow disk  (was: Task files while deleting spill files on slow disk)

> Task fails while deleting spill files on slow disk
> --------------------------------------------------
>
>                 Key: MAPREDUCE-7337
>                 URL: https://issues.apache.org/jira/browse/MAPREDUCE-7337
>             Project: Hadoop Map/Reduce
>          Issue Type: Bug
>          Components: performance
>            Reporter: Scott Oaks
>            Priority: Minor
>
> We sometimes have tasks fail when deleting spill files in this loop (line 2005 of MapTask.java):
> {code:java}
> for (int i = 0; i < numSpills; i++) {
>   rfs.delete(filename[i], true);
> }{code}
> During this loop, the task does not communicate with the master server, so if the loop takes too long, the master server assumes the child has timed out and tells the nodeagent to kill the yarn child.
> Typically this is linked to storage issues, and we've seen it most often due to an underlying bug in the filesystem (contention in the filesystem delete path when deleting several files). Even though there is usually an underlying issue, it wouldn't hurt to periodically mark progress in the task during this loop.
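
The suggestion above could be sketched roughly as follows. This is not the actual patch and not the real MapTask code: SpillFs and ProgressReporter are hypothetical stand-ins (so the sketch compiles without Hadoop on the classpath) for the file system handle `rfs` and for whatever object the task uses to signal that it is still alive. The sketch also assumes that marking progress after every delete is cheap enough; a real change might do so less often.

```java
import java.io.IOException;

// Hypothetical sketch: delete spill files while signalling progress.
class SpillCleanup {
  /** Stand-in for the file system handle (rfs in the issue). */
  interface SpillFs {
    boolean delete(String path, boolean recursive) throws IOException;
  }

  /** Stand-in for the task's liveness/progress callback (an assumption). */
  interface ProgressReporter {
    void progress();
  }

  /** Deletes the first numSpills files, marking progress after each delete. */
  static void deleteSpills(String[] filename, int numSpills,
                           SpillFs rfs, ProgressReporter reporter)
      throws IOException {
    for (int i = 0; i < numSpills; i++) {
      rfs.delete(filename[i], true);
      // A slow delete now delays only one progress signal, not all of them.
      reporter.progress();
    }
  }

  public static void main(String[] args) throws IOException {
    String[] spills = {"spill0.out", "spill1.out", "spill2.out"};
    int[] progressCalls = {0};
    deleteSpills(spills, spills.length,
        (path, recursive) -> true,      // pretend every delete succeeds
        () -> progressCalls[0]++);      // count progress signals
    System.out.println("progress calls: " + progressCalls[0]);
  }
}
```

With this shape, the time between two progress signals is bounded by the slowest single delete instead of the whole loop. A single delete that itself exceeds the task timeout would still get the task killed.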


