Posted to mapreduce-issues@hadoop.apache.org by "Aaron Kimball (JIRA)" <ji...@apache.org> on 2009/09/11 02:07:57 UTC

[jira] Created: (MAPREDUCE-971) distcp does not always remove distcp.tmp.dir

distcp does not always remove distcp.tmp.dir
--------------------------------------------

                 Key: MAPREDUCE-971
                 URL: https://issues.apache.org/jira/browse/MAPREDUCE-971
             Project: Hadoop Map/Reduce
          Issue Type: Bug
          Components: distcp
            Reporter: Aaron Kimball
            Assignee: Aaron Kimball


Sometimes distcp leaves behind its tmpdir when the target filesystem is s3n.

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.


[jira] Commented: (MAPREDUCE-971) distcp does not always remove distcp.tmp.dir

Posted by "Hudson (JIRA)" <ji...@apache.org>.
    [ https://issues.apache.org/jira/browse/MAPREDUCE-971?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12772958#action_12772958 ] 

Hudson commented on MAPREDUCE-971:
----------------------------------

Integrated in Hadoop-Mapreduce-trunk-Commit #106 (See [http://hudson.zones.apache.org/hudson/job/Hadoop-Mapreduce-trunk-Commit/106/])
    . Document use of distcp when copying to s3, managing timeouts in particular. Contributed by Aaron Kimball


[jira] Commented: (MAPREDUCE-971) distcp does not always remove distcp.tmp.dir

Posted by "Todd Lipcon (JIRA)" <ji...@apache.org>.
    [ https://issues.apache.org/jira/browse/MAPREDUCE-971?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12756353#action_12756353 ] 

Todd Lipcon commented on MAPREDUCE-971:
---------------------------------------

+1, patch lgtm


[jira] Updated: (MAPREDUCE-971) distcp does not always remove distcp.tmp.dir

Posted by "Aaron Kimball (JIRA)" <ji...@apache.org>.
     [ https://issues.apache.org/jira/browse/MAPREDUCE-971?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Aaron Kimball updated MAPREDUCE-971:
------------------------------------

    Status: Patch Available  (was: Open)


[jira] Commented: (MAPREDUCE-971) distcp does not always remove distcp.tmp.dir

Posted by "gary murry (JIRA)" <ji...@apache.org>.
    [ https://issues.apache.org/jira/browse/MAPREDUCE-971?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12758404#action_12758404 ] 

gary murry commented on MAPREDUCE-971:
--------------------------------------

Cool, thanks for the additional info.


[jira] Commented: (MAPREDUCE-971) distcp does not always remove distcp.tmp.dir

Posted by "gary murry (JIRA)" <ji...@apache.org>.
    [ https://issues.apache.org/jira/browse/MAPREDUCE-971?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12758393#action_12758393 ] 

gary murry commented on MAPREDUCE-971:
--------------------------------------

It is good that this was tested manually, and it is appreciated that the manual test was outlined here. But why was no unit test added so that the fix can be verified automatically on future builds?


[jira] Commented: (MAPREDUCE-971) distcp does not always remove distcp.tmp.dir

Posted by "Hadoop QA (JIRA)" <ji...@apache.org>.
    [ https://issues.apache.org/jira/browse/MAPREDUCE-971?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12753973#action_12753973 ] 

Hadoop QA commented on MAPREDUCE-971:
-------------------------------------

-1 overall.  Here are the results of testing the latest attachment 
  http://issues.apache.org/jira/secure/attachment/12419248/MAPREDUCE-971.patch
  against trunk revision 813585.

    +1 @author.  The patch does not contain any @author tags.

    -1 tests included.  The patch doesn't appear to include any new or modified tests.
                        Please justify why no new tests are needed for this patch.
                        Also please list what manual steps were performed to verify this patch.

    +1 javadoc.  The javadoc tool did not generate any warning messages.

    +1 javac.  The applied patch does not increase the total number of javac compiler warnings.

    +1 findbugs.  The patch does not introduce any new Findbugs warnings.

    +1 release audit.  The applied patch does not increase the total number of release audit warnings.

    +1 core tests.  The patch passed core unit tests.

    +1 contrib tests.  The patch passed contrib unit tests.

Test results: http://hudson.zones.apache.org/hudson/job/Mapreduce-Patch-h3.grid.sp2.yahoo.net/26/testReport/
Findbugs warnings: http://hudson.zones.apache.org/hudson/job/Mapreduce-Patch-h3.grid.sp2.yahoo.net/26/artifact/trunk/build/test/findbugs/newPatchFindbugsWarnings.html
Checkstyle results: http://hudson.zones.apache.org/hudson/job/Mapreduce-Patch-h3.grid.sp2.yahoo.net/26/artifact/trunk/build/test/checkstyle-errors.html
Console output: http://hudson.zones.apache.org/hudson/job/Mapreduce-Patch-h3.grid.sp2.yahoo.net/26/console

This message is automatically generated.


[jira] Commented: (MAPREDUCE-971) distcp does not always remove distcp.tmp.dir

Posted by "Hudson (JIRA)" <ji...@apache.org>.
    [ https://issues.apache.org/jira/browse/MAPREDUCE-971?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12773605#action_12773605 ] 

Hudson commented on MAPREDUCE-971:
----------------------------------

Integrated in Hadoop-Mapreduce-trunk #133 (See [http://hudson.zones.apache.org/hudson/job/Hadoop-Mapreduce-trunk/133/])
    

[jira] Commented: (MAPREDUCE-971) distcp does not always remove distcp.tmp.dir

Posted by "Hudson (JIRA)" <ji...@apache.org>.
    [ https://issues.apache.org/jira/browse/MAPREDUCE-971?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12756664#action_12756664 ] 

Hudson commented on MAPREDUCE-971:
----------------------------------

Integrated in Hadoop-Mapreduce-trunk-Commit #46 (See [http://hudson.zones.apache.org/hudson/job/Hadoop-Mapreduce-trunk-Commit/46/])
    . distcp does not always remove distcp.tmp.dir. Contributed by Aaron Kimball.


[jira] Commented: (MAPREDUCE-971) distcp does not always remove distcp.tmp.dir

Posted by "Aaron Kimball (JIRA)" <ji...@apache.org>.
    [ https://issues.apache.org/jira/browse/MAPREDUCE-971?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12758400#action_12758400 ] 

Aaron Kimball commented on MAPREDUCE-971:
-----------------------------------------

An automated unit test for an S3-based system would require hardcoding S3 access credentials and connecting to an S3 account (which is a for-pay resource).


[jira] Updated: (MAPREDUCE-971) distcp does not always remove distcp.tmp.dir

Posted by "Tom White (JIRA)" <ji...@apache.org>.
     [ https://issues.apache.org/jira/browse/MAPREDUCE-971?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Tom White updated MAPREDUCE-971:
--------------------------------

       Resolution: Fixed
    Fix Version/s: 0.21.0
     Hadoop Flags: [Reviewed]
           Status: Resolved  (was: Patch Available)

I've just committed this. Thanks Aaron!


[jira] Updated: (MAPREDUCE-971) distcp does not always remove distcp.tmp.dir

Posted by "Aaron Kimball (JIRA)" <ji...@apache.org>.
     [ https://issues.apache.org/jira/browse/MAPREDUCE-971?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Aaron Kimball updated MAPREDUCE-971:
------------------------------------

    Attachment: MAPREDUCE-971.patch

This patch fixes the problem by explicitly creating the temp directory. File open operations in, e.g., hdfs, will auto-create the tmpdir. But in s3n, which expects an object with the name {{_somename_$folder$}}, this won't happen. As a result, the {{fullyDelete()}} call fails (silently) because the folder doesn't exist, even though there are objects with the tmpdir prefix in their object names.

I tested this patch manually by verifying temp dir creation during a distcp to s3n, and verifying that the temp dir object was removed at the end of the transfer.
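
For readers who want to see the failure mode in isolation, here is a minimal sketch using a toy in-memory object store. It is not the Hadoop s3n code and not the contents of the patch: the class {{ToyS3nStore}}, its methods, and the temp dir name are all hypothetical, and the existence check is deliberately simplified to match the behavior described above (a directory is only "there" if its {{_$folder$}} marker object exists).

```java
import java.util.TreeMap;

/** Toy model of an s3n-like flat object store; NOT the Hadoop implementation. */
class ToyS3nStore {
    private static final String FOLDER_SUFFIX = "_$folder$";
    private final TreeMap<String, byte[]> objects = new TreeMap<>();

    /** Writes an object. No folder marker is created for the parent directory. */
    void create(String path, byte[] data) {
        objects.put(path, data);
    }

    /** Explicitly creates the folder marker object for a directory. */
    void mkdirs(String dir) {
        objects.put(dir + FOLDER_SUFFIX, new byte[0]);
    }

    /** Simplifying assumption: a path exists only as an object or via its folder marker. */
    boolean exists(String path) {
        return objects.containsKey(path) || objects.containsKey(path + FOLDER_SUFFIX);
    }

    /** Recursive delete; returns false without throwing if the path is not found. */
    boolean delete(String path) {
        if (!exists(path)) {
            return false; // silent failure: nothing is removed
        }
        objects.remove(path);
        objects.remove(path + FOLDER_SUFFIX);
        objects.keySet().removeIf(key -> key.startsWith(path + "/"));
        return true;
    }

    /** Number of objects currently stored, including folder markers. */
    int size() {
        return objects.size();
    }

    public static void main(String[] args) {
        String tmp = "bucket/dest/_distcp_tmp_example"; // hypothetical temp dir name

        // Without an explicit mkdirs, the delete finds no folder and leaves the file behind.
        ToyS3nStore unpatched = new ToyS3nStore();
        unpatched.create(tmp + "/part-0", new byte[] {1});
        System.out.println("without mkdirs: deleted=" + unpatched.delete(tmp)
            + ", objects left=" + unpatched.size());

        // With the marker created up front, the delete removes the marker and its contents.
        ToyS3nStore patched = new ToyS3nStore();
        patched.mkdirs(tmp);
        patched.create(tmp + "/part-0", new byte[] {1});
        System.out.println("with mkdirs:    deleted=" + patched.delete(tmp)
            + ", objects left=" + patched.size());
    }
}
```

In this toy model the first case reports a failed delete with one object left over, and the second reports a successful delete with nothing left, which mirrors the manual check described above.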
