Posted to mapreduce-issues@hadoop.apache.org by "Aaron Kimball (JIRA)" <ji...@apache.org> on 2009/09/11 03:36:00 UTC

[jira] Updated: (MAPREDUCE-972) distcp can timeout during rename operation to s3

     [ https://issues.apache.org/jira/browse/MAPREDUCE-972?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Aaron Kimball updated MAPREDUCE-972:
------------------------------------

    Attachment: MAPREDUCE-972.patch

Attaching a patch that starts a background thread to increment mapper progress while the rename operation is running.

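We benchmarked S3 copy performance at ~4 MB/sec. At that rate, copying a 3-5 GB file takes roughly 13 to 21 minutes, which is longer than the default 10-minute task timeout (mapred.task.timeout), so files in this size range may cause task timeouts during their renames into their final locations. This patch fixes the issue.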

This patch was tested manually by running distcp to upload data to s3n and verifying that renames still worked as expected, and that log messages confirmed creation and destruction of the background progress thread.
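
For readers without the attachment at hand, the background-thread idea can be sketched roughly as follows. This is a minimal illustration under stated assumptions, not the code in MAPREDUCE-972.patch: the class RenameWithProgress, the ProgressReporter interface (a local stand-in for a progress callback such as Hadoop's Progressable), and the runWithProgress helper are all hypothetical names, and the "rename" is simulated with a sleep.

```java
import java.util.concurrent.Callable;
import java.util.concurrent.atomic.AtomicInteger;

/** Illustrative sketch only; names here are hypothetical, not from the patch. */
class RenameWithProgress {

    /** Local stand-in for a progress callback (assumption for this sketch). */
    interface ProgressReporter {
        void progress();
    }

    /**
     * Runs slowOperation on the calling thread while a daemon thread calls
     * reporter.progress() every intervalMillis. The daemon thread is stopped
     * and joined before this method returns, whether or not the operation
     * succeeded.
     */
    static <T> T runWithProgress(Callable<T> slowOperation,
                                 ProgressReporter reporter,
                                 long intervalMillis) throws Exception {
        Thread heartbeat = new Thread(() -> {
            try {
                while (!Thread.currentThread().isInterrupted()) {
                    reporter.progress();
                    Thread.sleep(intervalMillis);
                }
            } catch (InterruptedException e) {
                // Interrupted by the caller: the operation is done, so exit.
            }
        }, "rename-progress");
        heartbeat.setDaemon(true);
        heartbeat.start();
        try {
            return slowOperation.call();
        } finally {
            heartbeat.interrupt();
            heartbeat.join();
        }
    }

    public static void main(String[] args) throws Exception {
        AtomicInteger ticks = new AtomicInteger();
        // Simulated slow rename: sleeps instead of copying any data.
        boolean renamed = runWithProgress(() -> {
            Thread.sleep(500);
            return true;
        }, ticks::incrementAndGet, 100);
        System.out.println("renamed=" + renamed + ", progress calls=" + ticks.get());
    }
}
```

Stopping the thread in a finally block means a failed rename does not leave the progress thread running, so a genuinely hung task can still be timed out once the operation itself has returned or thrown.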

> distcp can timeout during rename operation to s3
> ------------------------------------------------
>
>                 Key: MAPREDUCE-972
>                 URL: https://issues.apache.org/jira/browse/MAPREDUCE-972
>             Project: Hadoop Map/Reduce
>          Issue Type: Bug
>          Components: distcp
>    Affects Versions: 0.20.1
>            Reporter: Aaron Kimball
>            Assignee: Aaron Kimball
>         Attachments: MAPREDUCE-972.patch
>
>
> rename() in S3 is implemented as copy + delete. The S3 copy operation can be very slow, which may cause the task to time out.

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.