Posted to common-dev@hadoop.apache.org by "Tsz Wo (Nicholas), SZE (JIRA)" <ji...@apache.org> on 2008/08/12 21:59:44 UTC

[jira] Created: (HADOOP-3939) DistCp should support an option for deleting non-existing files.

DistCp should support an option for deleting non-existing files.
----------------------------------------------------------------

                 Key: HADOOP-3939
                 URL: https://issues.apache.org/jira/browse/HADOOP-3939
             Project: Hadoop Core
          Issue Type: New Feature
          Components: tools/distcp
            Reporter: Tsz Wo (Nicholas), SZE


One use case of DistCp is to sync two directories.  Currently, DistCp has an -update option for overwriting dst files if src is different from dst.  However, that is not enough for a sync: if some files exist in dst but not in src, there is no easy way to delete them.  We should add a new option, say -delete, so that DistCp deletes the files in dst that do not exist in src.
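
The sync semantics described above can be sketched roughly as follows. This is a minimal illustration in Python, not DistCp's implementation: the function name plan_sync, the dictionaries, and the "fingerprint" values are all hypothetical stand-ins for however src and dst files are actually compared.

```python
def plan_sync(src_files, dst_files):
    """Return (to_copy, to_delete) for a one-way sync from src to dst.

    src_files and dst_files map a relative path to a fingerprint
    (hypothetical: any value that changes when the file changes).
    """
    # -update behaviour: copy files that are missing from dst or differ from src.
    to_copy = sorted(
        path for path, fingerprint in src_files.items()
        if dst_files.get(path) != fingerprint
    )
    # Proposed -delete behaviour: remove files that exist only in dst.
    to_delete = sorted(path for path in dst_files if path not in src_files)
    return to_copy, to_delete


src = {"a.txt": "v2", "b.txt": "v1"}
dst = {"a.txt": "v1", "b.txt": "v1", "stale.txt": "v1"}
to_copy, to_delete = plan_sync(src, dst)
print(to_copy)    # ['a.txt']
print(to_delete)  # ['stale.txt']
```

Without the second step, stale.txt would stay in dst forever, which is the gap this issue is about.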

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.


[jira] Updated: (HADOOP-3939) DistCp should support an option for deleting non-existing files.

Posted by "Tsz Wo (Nicholas), SZE (JIRA)" <ji...@apache.org>.
     [ https://issues.apache.org/jira/browse/HADOOP-3939?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Tsz Wo (Nicholas), SZE updated HADOOP-3939:
-------------------------------------------

    Attachment: 3939_20080829.patch

3939_20080829.patch: fixed a bug for path checking.



[jira] Commented: (HADOOP-3939) DistCp should support an option for deleting non-existing files.

Posted by "Hudson (JIRA)" <ji...@apache.org>.
    [ https://issues.apache.org/jira/browse/HADOOP-3939?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12636648#action_12636648 ] 

Hudson commented on HADOOP-3939:
--------------------------------

Integrated in Hadoop-trunk #622 (See [http://hudson.zones.apache.org/hudson/job/Hadoop-trunk/622/])
    



[jira] Commented: (HADOOP-3939) DistCp should support an option for deleting non-existing files.

Posted by "Tsz Wo (Nicholas), SZE (JIRA)" <ji...@apache.org>.
    [ https://issues.apache.org/jira/browse/HADOOP-3939?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12626168#action_12626168 ] 

Tsz Wo (Nicholas), SZE commented on HADOOP-3939:
------------------------------------------------

3939_20080826.patch only changed DistCp and fixed a bug in FileStatus.hashCode().  The failed unit tests are not related.



[jira] Updated: (HADOOP-3939) DistCp should support an option for deleting non-existing files.

Posted by "Tsz Wo (Nicholas), SZE (JIRA)" <ji...@apache.org>.
     [ https://issues.apache.org/jira/browse/HADOOP-3939?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Tsz Wo (Nicholas), SZE updated HADOOP-3939:
-------------------------------------------

    Attachment: 3939_20080829b_0.18+3873_20080811b_0.18.patch

3939_20080829b_0.18+3873_20080811b_0.18.patch: for 0.18.  It also includes HADOOP-3873.  This patch won't be committed.



[jira] Updated: (HADOOP-3939) DistCp should support an option for deleting non-existing files.

Posted by "Tsz Wo (Nicholas), SZE (JIRA)" <ji...@apache.org>.
     [ https://issues.apache.org/jira/browse/HADOOP-3939?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Tsz Wo (Nicholas), SZE updated HADOOP-3939:
-------------------------------------------

    Status: Patch Available  (was: Open)

Submitting again.



[jira] Commented: (HADOOP-3939) DistCp should support an option for deleting non-existing files.

Posted by "Koji Noguchi (JIRA)" <ji...@apache.org>.
    [ https://issues.apache.org/jira/browse/HADOOP-3939?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12621981#action_12621981 ] 

Koji Noguchi commented on HADOOP-3939:
--------------------------------------

I can see users misusing this feature and deleting some of their important files.
Can we use Trash if it's enabled?
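
To make the concern concrete, here is a rough sketch of a trash-aware delete, written in Python against the local file system. It is only an illustration of the idea and not Hadoop's Trash mechanism: the function remove, the trash_dir parameter, and the ".Trash" directory name are assumptions made for this example.

```python
import os
import shutil
import tempfile


def remove(path, trash_dir=None):
    """Delete path, or move it into trash_dir when one is configured."""
    if trash_dir is None:
        os.remove(path)  # trash disabled: the file is gone for good
        return None
    os.makedirs(trash_dir, exist_ok=True)
    target = os.path.join(trash_dir, os.path.basename(path))
    shutil.move(path, target)  # trash enabled: the file can still be restored
    return target


# Create a throwaway file and "delete" it with trash enabled.
workdir = tempfile.mkdtemp()
victim = os.path.join(workdir, "important.txt")
with open(victim, "w") as handle:
    handle.write("data")

trashed = remove(victim, trash_dir=os.path.join(workdir, ".Trash"))
```

With a trash directory configured, a mistaken delete leaves the file recoverable at the returned location instead of destroying it.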




[jira] Updated: (HADOOP-3939) DistCp should support an option for deleting non-existing files.

Posted by "Tsz Wo (Nicholas), SZE (JIRA)" <ji...@apache.org>.
     [ https://issues.apache.org/jira/browse/HADOOP-3939?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Tsz Wo (Nicholas), SZE updated HADOOP-3939:
-------------------------------------------

    Attachment: 3939_20080826.patch

3939_20080826.patch: added a test.



[jira] Updated: (HADOOP-3939) DistCp should support an option for deleting non-existing files.

Posted by "Tsz Wo (Nicholas), SZE (JIRA)" <ji...@apache.org>.
     [ https://issues.apache.org/jira/browse/HADOOP-3939?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Tsz Wo (Nicholas), SZE updated HADOOP-3939:
-------------------------------------------

    Attachment: 3939_20080829b.patch

3939_20080829b.patch: updated the new unit test.



[jira] Updated: (HADOOP-3939) DistCp should support an option for deleting non-existing files.

Posted by "Tsz Wo (Nicholas), SZE (JIRA)" <ji...@apache.org>.
     [ https://issues.apache.org/jira/browse/HADOOP-3939?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Tsz Wo (Nicholas), SZE updated HADOOP-3939:
-------------------------------------------

    Attachment: 3939_20080828.patch

3939_20080828.patch: incorporated all comments from Chris. 



[jira] Updated: (HADOOP-3939) DistCp should support an option for deleting non-existing files.

Posted by "Tsz Wo (Nicholas), SZE (JIRA)" <ji...@apache.org>.
     [ https://issues.apache.org/jira/browse/HADOOP-3939?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Tsz Wo (Nicholas), SZE updated HADOOP-3939:
-------------------------------------------

        Assignee: Tsz Wo (Nicholas), SZE
    Release Note: Added a new option -delete to DistCp so that files/directories that exist in dst but not in src will be deleted.  It uses FsShell to do the delete, so it will use the trash if the trash is enabled.
          Status: Patch Available  (was: Open)

Passed test-patch and all tests locally.  Submitting ...



[jira] Commented: (HADOOP-3939) DistCp should support an option for deleting non-existing files.

Posted by "Hadoop QA (JIRA)" <ji...@apache.org>.
    [ https://issues.apache.org/jira/browse/HADOOP-3939?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12627518#action_12627518 ] 

Hadoop QA commented on HADOOP-3939:
-----------------------------------

+1 overall.  Here are the results of testing the latest attachment 
  http://issues.apache.org/jira/secure/attachment/12389205/3939_20080829b.patch
  against trunk revision 690641.

    +1 @author.  The patch does not contain any @author tags.

    +1 tests included.  The patch appears to include 4 new or modified tests.

    +1 javadoc.  The javadoc tool did not generate any warning messages.

    +1 javac.  The applied patch does not increase the total number of javac compiler warnings.

    +1 findbugs.  The patch does not introduce any new Findbugs warnings.

    +1 release audit.  The applied patch does not increase the total number of release audit warnings.

    +1 core tests.  The patch passed core unit tests.

    +1 contrib tests.  The patch passed contrib unit tests.

Test results: http://hudson.zones.apache.org/hudson/job/Hadoop-Patch/3150/testReport/
Findbugs warnings: http://hudson.zones.apache.org/hudson/job/Hadoop-Patch/3150/artifact/trunk/build/test/findbugs/newPatchFindbugsWarnings.html
Checkstyle results: http://hudson.zones.apache.org/hudson/job/Hadoop-Patch/3150/artifact/trunk/build/test/checkstyle-errors.html
Console output: http://hudson.zones.apache.org/hudson/job/Hadoop-Patch/3150/console

This message is automatically generated.



[jira] Updated: (HADOOP-3939) DistCp should support an option for deleting non-existing files.

Posted by "Tsz Wo (Nicholas), SZE (JIRA)" <ji...@apache.org>.
     [ https://issues.apache.org/jira/browse/HADOOP-3939?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Tsz Wo (Nicholas), SZE updated HADOOP-3939:
-------------------------------------------

    Status: Open  (was: Patch Available)



[jira] Updated: (HADOOP-3939) DistCp should support an option for deleting non-existing files.

Posted by "Tsz Wo (Nicholas), SZE (JIRA)" <ji...@apache.org>.
     [ https://issues.apache.org/jira/browse/HADOOP-3939?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Tsz Wo (Nicholas), SZE updated HADOOP-3939:
-------------------------------------------

    Attachment: 3939_20080825b.patch

3939_20080825b.patch: fixed some bugs.



[jira] Commented: (HADOOP-3939) DistCp should support an option for deleting non-existing files.

Posted by "Tsz Wo (Nicholas), SZE (JIRA)" <ji...@apache.org>.
    [ https://issues.apache.org/jira/browse/HADOOP-3939?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12627143#action_12627143 ] 

Tsz Wo (Nicholas), SZE commented on HADOOP-3939:
------------------------------------------------

Tested locally.  3939_20080829b.patch is ready to be committed.



[jira] Updated: (HADOOP-3939) DistCp should support an option for deleting non-existing files.

Posted by "Chris Douglas (JIRA)" <ji...@apache.org>.
     [ https://issues.apache.org/jira/browse/HADOOP-3939?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Chris Douglas updated HADOOP-3939:
----------------------------------

    Fix Version/s: 0.19.0
     Hadoop Flags: [Reviewed]

+1



[jira] Updated: (HADOOP-3939) DistCp should support an option for deleting non-existing files.

Posted by "Tsz Wo (Nicholas), SZE (JIRA)" <ji...@apache.org>.
     [ https://issues.apache.org/jira/browse/HADOOP-3939?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Tsz Wo (Nicholas), SZE updated HADOOP-3939:
-------------------------------------------

    Status: Patch Available  (was: Open)



[jira] Updated: (HADOOP-3939) DistCp should support an option for deleting non-existing files.

Posted by "Tsz Wo (Nicholas), SZE (JIRA)" <ji...@apache.org>.
     [ https://issues.apache.org/jira/browse/HADOOP-3939?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Tsz Wo (Nicholas), SZE updated HADOOP-3939:
-------------------------------------------

    Attachment: 3939_20080825.patch

3939_20080825.patch: first version.  Needs some tests.



[jira] Updated: (HADOOP-3939) DistCp should support an option for deleting non-existing files.

Posted by "Tsz Wo (Nicholas), SZE (JIRA)" <ji...@apache.org>.
     [ https://issues.apache.org/jira/browse/HADOOP-3939?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Tsz Wo (Nicholas), SZE updated HADOOP-3939:
-------------------------------------------

    Status: Open  (was: Patch Available)



[jira] Updated: (HADOOP-3939) DistCp should support an option for deleting non-existing files.

Posted by "Chris Douglas (JIRA)" <ji...@apache.org>.
     [ https://issues.apache.org/jira/browse/HADOOP-3939?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Chris Douglas updated HADOOP-3939:
----------------------------------

    Resolution: Fixed
        Status: Resolved  (was: Patch Available)

I just committed this. Thanks, Nicholas.



[jira] Commented: (HADOOP-3939) DistCp should support an option for deleting non-existing files.

Posted by "Tsz Wo (Nicholas), SZE (JIRA)" <ji...@apache.org>.
    [ https://issues.apache.org/jira/browse/HADOOP-3939?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12626798#action_12626798 ] 

Tsz Wo (Nicholas), SZE commented on HADOOP-3939:
------------------------------------------------

> Would it make sense to require either -update or -overwrite if -delete is specified?

We should enforce that.

> The fix to FileStatus makes sense, but when is the Path null?

I hit this when creating a FileStatus with the default constructor and then putting it in some data structure (I forget which one).  The current implementation does not need this operation, so I will revert this change.
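
As a rough illustration of the enforcement agreed on above, the check could look something like the following. This is a minimal sketch, not the code in the patch: the class name DeleteOptionCheck and the method validate are hypothetical, and the flags are matched as plain strings rather than through DistCp's real argument parser.

```java
import java.util.Arrays;
import java.util.HashSet;
import java.util.Set;

// Hypothetical helper for illustration; not DistCp's actual option handling.
class DeleteOptionCheck {
    /**
     * Rejects an argument list that contains -delete without
     * either -update or -overwrite.
     */
    static void validate(String[] args) {
        Set<String> flags = new HashSet<>(Arrays.asList(args));
        if (flags.contains("-delete")
                && !flags.contains("-update")
                && !flags.contains("-overwrite")) {
            throw new IllegalArgumentException(
                "-delete must be specified together with -update or -overwrite");
        }
    }

    public static void main(String[] args) {
        // Accepted: -delete is accompanied by -update.
        validate(new String[] {"-update", "-delete", "src", "dst"});
        try {
            // Rejected: -delete on its own.
            validate(new String[] {"-delete", "src", "dst"});
        } catch (IllegalArgumentException e) {
            System.out.println("rejected: " + e.getMessage());
        }
    }
}
```

Failing fast on the client, before any listing or deletion starts, is what keeps a bare -delete from silently behaving like -overwrite.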

> DistCp should support an option for deleting non-existing files.
> ----------------------------------------------------------------
>
>                 Key: HADOOP-3939
>                 URL: https://issues.apache.org/jira/browse/HADOOP-3939
>             Project: Hadoop Core
>          Issue Type: New Feature
>          Components: tools/distcp
>            Reporter: Tsz Wo (Nicholas), SZE
>            Assignee: Tsz Wo (Nicholas), SZE
>         Attachments: 3939_20080825.patch, 3939_20080825b.patch, 3939_20080826.patch
>
>
> One use case of DistCp is to sync two directories.  Currently, DistCp has an -update option for overwriting dst files if src is different from dst.  However, it is not enough for sync.  If there are some files in dst but not exist in src, there is no easy way to delete them.  We should add a new option, say -delete, so that DistCp will delete the non-existing in dst.



[jira] Commented: (HADOOP-3939) DistCp should support an option for deleting non-existing files.

Posted by "Hudson (JIRA)" <ji...@apache.org>.
    [ https://issues.apache.org/jira/browse/HADOOP-3939?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12627653#action_12627653 ] 

Hudson commented on HADOOP-3939:
--------------------------------

Integrated in Hadoop-trunk #590 (See [http://hudson.zones.apache.org/hudson/job/Hadoop-trunk/590/])

> DistCp should support an option for deleting non-existing files.
> ----------------------------------------------------------------
>
>                 Key: HADOOP-3939
>                 URL: https://issues.apache.org/jira/browse/HADOOP-3939
>             Project: Hadoop Core
>          Issue Type: New Feature
>          Components: tools/distcp
>            Reporter: Tsz Wo (Nicholas), SZE
>            Assignee: Tsz Wo (Nicholas), SZE
>             Fix For: 0.19.0
>
>         Attachments: 3939_20080825.patch, 3939_20080825b.patch, 3939_20080826.patch, 3939_20080828.patch, 3939_20080829.patch, 3939_20080829b.patch
>
>
> One use case of DistCp is to sync two directories.  Currently, DistCp has an -update option for overwriting dst files if src is different from dst.  However, it is not enough for sync.  If there are some files in dst but not exist in src, there is no easy way to delete them.  We should add a new option, say -delete, so that DistCp will delete the non-existing in dst.



[jira] Commented: (HADOOP-3939) DistCp should support an option for deleting non-existing files.

Posted by "Hadoop QA (JIRA)" <ji...@apache.org>.
    [ https://issues.apache.org/jira/browse/HADOOP-3939?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12625997#action_12625997 ] 

Hadoop QA commented on HADOOP-3939:
-----------------------------------

-1 overall.  Here are the results of testing the latest attachment 
  http://issues.apache.org/jira/secure/attachment/12388941/3939_20080826.patch
  against trunk revision 689363.

    +1 @author.  The patch does not contain any @author tags.

    +1 tests included.  The patch appears to include 4 new or modified tests.

    +1 javadoc.  The javadoc tool did not generate any warning messages.

    +1 javac.  The applied patch does not increase the total number of javac compiler warnings.

    +1 findbugs.  The patch does not introduce any new Findbugs warnings.

    +1 release audit.  The applied patch does not increase the total number of release audit warnings.

    -1 core tests.  The patch failed core unit tests.

    -1 contrib tests.  The patch failed contrib unit tests.

Test results: http://hudson.zones.apache.org/hudson/job/Hadoop-Patch/3117/testReport/
Findbugs warnings: http://hudson.zones.apache.org/hudson/job/Hadoop-Patch/3117/artifact/trunk/build/test/findbugs/newPatchFindbugsWarnings.html
Checkstyle results: http://hudson.zones.apache.org/hudson/job/Hadoop-Patch/3117/artifact/trunk/build/test/checkstyle-errors.html
Console output: http://hudson.zones.apache.org/hudson/job/Hadoop-Patch/3117/console

This message is automatically generated.

> DistCp should support an option for deleting non-existing files.
> ----------------------------------------------------------------
>
>                 Key: HADOOP-3939
>                 URL: https://issues.apache.org/jira/browse/HADOOP-3939
>             Project: Hadoop Core
>          Issue Type: New Feature
>          Components: tools/distcp
>            Reporter: Tsz Wo (Nicholas), SZE
>            Assignee: Tsz Wo (Nicholas), SZE
>         Attachments: 3939_20080825.patch, 3939_20080825b.patch, 3939_20080826.patch
>
>
> One use case of DistCp is to sync two directories.  Currently, DistCp has an -update option for overwriting dst files if src is different from dst.  However, it is not enough for sync.  If there are some files in dst but not exist in src, there is no easy way to delete them.  We should add a new option, say -delete, so that DistCp will delete the non-existing in dst.



[jira] Updated: (HADOOP-3939) DistCp should support an option for deleting non-existing files.

Posted by "Tsz Wo (Nicholas), SZE (JIRA)" <ji...@apache.org>.
     [ https://issues.apache.org/jira/browse/HADOOP-3939?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Tsz Wo (Nicholas), SZE updated HADOOP-3939:
-------------------------------------------

    Release Note: Added a new option -delete to DistCp so that files/directories that exist in dst but not in src are deleted.  It uses FsShell to perform the delete, so deleted items go to the trash if the trash is enabled.  (was: Added a new optopm -delete to DistCp so that if the files/directories exist in dst but not in src will be deleted.  It uses FsShell to do delete, so that it will use trash if  the trash is enable.)
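
To make the "exists in dst but not in src" rule in the release note concrete, here is a minimal sketch of how the set of paths to delete could be computed from two sorted listings. It is an assumption-laden illustration, not the patch's implementation: the class DstOnlyPaths and method toDelete are hypothetical, paths are modelled as plain relative-path strings in natural String order, and directory handling (skipping children of an already-deleted directory) is left out.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Iterator;
import java.util.List;
import java.util.SortedSet;
import java.util.TreeSet;

// Hypothetical helper for illustration; not part of DistCp.
class DstOnlyPaths {
    /**
     * Returns the relative paths present in dst but absent from src.
     * Both sets are assumed to use natural String ordering, so a single
     * merge-style pass over the two listings is enough.
     */
    static List<String> toDelete(SortedSet<String> src, SortedSet<String> dst) {
        List<String> result = new ArrayList<>();
        Iterator<String> s = src.iterator();
        String cur = s.hasNext() ? s.next() : null;
        for (String d : dst) {
            // Advance the source cursor until it is no longer smaller than d.
            while (cur != null && cur.compareTo(d) < 0) {
                cur = s.hasNext() ? s.next() : null;
            }
            // d has no counterpart in src, so it is a deletion candidate.
            if (cur == null || !cur.equals(d)) {
                result.add(d);
            }
        }
        return result;
    }

    public static void main(String[] args) {
        SortedSet<String> src = new TreeSet<>(Arrays.asList("part-00000"));
        SortedSet<String> dst = new TreeSet<>(
            Arrays.asList("part-00000", "part-00001", "part-00002"));
        System.out.println(toDelete(src, dst));
    }
}
```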

> DistCp should support an option for deleting non-existing files.
> ----------------------------------------------------------------
>
>                 Key: HADOOP-3939
>                 URL: https://issues.apache.org/jira/browse/HADOOP-3939
>             Project: Hadoop Core
>          Issue Type: New Feature
>          Components: tools/distcp
>            Reporter: Tsz Wo (Nicholas), SZE
>            Assignee: Tsz Wo (Nicholas), SZE
>             Fix For: 0.19.0
>
>         Attachments: 3939_20080825.patch, 3939_20080825b.patch, 3939_20080826.patch, 3939_20080828.patch, 3939_20080829.patch, 3939_20080829b.patch
>
>
> One use case of DistCp is to sync two directories.  Currently, DistCp has an -update option for overwriting dst files if src is different from dst.  However, it is not enough for sync.  If there are some files in dst but not exist in src, there is no easy way to delete them.  We should add a new option, say -delete, so that DistCp will delete the non-existing in dst.



[jira] Commented: (HADOOP-3939) DistCp should support an option for deleting non-existing files.

Posted by "Chris Douglas (JIRA)" <ji...@apache.org>.
    [ https://issues.apache.org/jira/browse/HADOOP-3939?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12626739#action_12626739 ] 

Chris Douglas commented on HADOOP-3939:
---------------------------------------

* Would it make sense to require either -update or -overwrite if -delete is specified? Without either of these options, the semantics are a little confusing. For example:
** In this case, the destination doesn't exist. Everything that isn't the source is deleted, which seems reasonable.
{noformat}
$ bin/hadoop fs -ls a b
Found 2 items
-rw-r--r--   1 someuser somegroup      92934 2008-08-11 21:42 /user/someuser/a/part-00000
Found 4 items
-rw-r--r--   1 someuser somegroup  105177784 2008-08-28 11:46 /user/someuser/b/part-00000
-rw-r--r--   1 someuser somegroup  105177884 2008-08-28 11:46 /user/someuser/b/part-00001
-rw-r--r--   1 someuser somegroup  105177754 2008-08-28 11:46 /user/someuser/b/part-00002
$ bin/hadoop distcp -delete hdfs://host:8020/user/someuser/a hdfs://host:8020/user/someuser/b
08/08/28 11:51:18 INFO tools.DistCp: srcPaths=[hdfs://host:8020/user/someuser/a]
08/08/28 11:51:18 INFO tools.DistCp: destPath=hdfs://host:8020/user/someuser/b
Deleted hdfs://host/user/someuser/b/part-00000
Deleted hdfs://host/user/someuser/b/part-00001
Deleted hdfs://host/user/someuser/b/part-00002
[snip]
$ bin/hadoop fs -ls a b
Found 2 items
-rw-r--r--   1 someuser somegroup      92934 2008-08-11 21:42 /user/someuser/a/part-00000
Found 2 items
drwxr-xr-x   - someuser somegroup          0 2008-08-28 11:51 /user/someuser/b/a
{noformat}
** Here, the destination does exist, but it is deleted anyway, as though -overwrite were specified.
{noformat}
$ bin/hadoop fs -lsr a b
-rw-r--r--   1 someuser somegroup      92934 2008-08-11 21:42 /user/someuser/a/part-00000
-rw-r--r--   1 someuser somegroup  105177784 2008-08-28 11:51 /user/someuser/b/part-00000
-rw-r--r--   1 someuser somegroup  105177884 2008-08-28 11:51 /user/someuser/b/part-00001
-rw-r--r--   1 someuser somegroup  105177754 2008-08-28 11:51 /user/someuser/b/part-00002
drwxr-xr-x   - someuser somegroup          0 2008-08-28 13:34 /user/someuser/b/a
-rw-r--r--   1 someuser somegroup  105177784 2008-08-28 13:34 /user/someuser/b/a/part-00000
$ bin/hadoop distcp -delete hdfs://host:8020/user/someuser/a hdfs://host:8020/user/someuser/b
08/08/28 13:35:14 INFO tools.DistCp: srcPaths=[hdfs://host:8020/user/someuser/a]
08/08/28 13:35:14 INFO tools.DistCp: destPath=hdfs://host:8020/user/someuser/b
Deleted hdfs://host:8020/user/someuser/b/part-00000
Deleted hdfs://host:8020/user/someuser/b/part-00001
Deleted hdfs://host:8020/user/someuser/b/part-00002
Deleted hdfs://host:8020/user/someuser/b/a
[snip]
$ bin/hadoop fs -lsr a b
-rw-r--r--   1 someuser somegroup      92934 2008-08-11 21:42 /user/someuser/a/part-00000
drwxr-xr-x   - someuser somegroup          0 2008-08-28 13:35 /user/someuser/b/a
-rw-r--r--   1 someuser somegroup      92934 2008-08-28 13:35 /user/someuser/b/a/part-00000
{noformat}

Adding this dependency would also help prevent casual errors and potentially serious mistakes if the Trash is disabled.
* It might help to always add a message about FsShell failing, and set the cause rather than:
{noformat}
+            } catch(Exception e) {
+              throw e instanceof IOException? (IOException)e: new IOException(e);
+            }
{noformat}
* When -delete is specified, the client is doing a lot of work to recursively list the destination, then to delete individual files there. In the future it might make sense to leave it to the maps to delete entries, since the source list is sorted. The client (or a reduce) would have to do some work on the boundaries, but it should scale well. The current patch is clearer given distcp's current organization, though.
* The fix to FileStatus makes sense, but when is the Path null?
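
One way to read the suggestion above about FsShell failures is sketched below: every failure is wrapped in an IOException that carries an explicit message and keeps the original exception as its cause. The names FsShellFailure, ShellCall and runDelete are hypothetical stand-ins so that the example runs without Hadoop on the classpath; initCause is used because it is available on every Java version.

```java
import java.io.IOException;

// Hypothetical wrapper for illustration; the real patch calls FsShell directly.
class FsShellFailure {
    /** Stand-in for the FsShell invocation that performs the delete. */
    interface ShellCall {
        void run() throws Exception;
    }

    /**
     * Runs the delete and, on any failure, throws an IOException that
     * names FsShell and the path, with the original exception as cause.
     */
    static void runDelete(String path, ShellCall call) throws IOException {
        try {
            call.run();
        } catch (Exception e) {
            IOException wrapped = new IOException("FsShell failed to delete " + path);
            wrapped.initCause(e);
            throw wrapped;
        }
    }

    public static void main(String[] args) {
        try {
            runDelete("/user/someuser/b/part-00000", () -> {
                throw new IllegalStateException("simulated failure");
            });
        } catch (IOException e) {
            System.out.println(e.getMessage() + " (cause: " + e.getCause() + ")");
        }
    }
}
```

Unlike the conditional cast in the quoted snippet, this always reports that the failure came from the FsShell delete, even when the underlying exception is already an IOException.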

> DistCp should support an option for deleting non-existing files.
> ----------------------------------------------------------------
>
>                 Key: HADOOP-3939
>                 URL: https://issues.apache.org/jira/browse/HADOOP-3939
>             Project: Hadoop Core
>          Issue Type: New Feature
>          Components: tools/distcp
>            Reporter: Tsz Wo (Nicholas), SZE
>            Assignee: Tsz Wo (Nicholas), SZE
>         Attachments: 3939_20080825.patch, 3939_20080825b.patch, 3939_20080826.patch
>
>
> One use case of DistCp is to sync two directories.  Currently, DistCp has an -update option for overwriting dst files if src is different from dst.  However, it is not enough for sync.  If there are some files in dst but not exist in src, there is no easy way to delete them.  We should add a new option, say -delete, so that DistCp will delete the non-existing in dst.



[jira] Commented: (HADOOP-3939) DistCp should support an option for deleting non-existing files.

Posted by "Tsz Wo (Nicholas), SZE (JIRA)" <ji...@apache.org>.
    [ https://issues.apache.org/jira/browse/HADOOP-3939?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12622002#action_12622002 ] 

Tsz Wo (Nicholas), SZE commented on HADOOP-3939:
------------------------------------------------

> Can we use Trash if it's enabled ? 

+1  I think this is a good idea.  It can be done by reusing the code in FsShell.

> DistCp should support an option for deleting non-existing files.
> ----------------------------------------------------------------
>
>                 Key: HADOOP-3939
>                 URL: https://issues.apache.org/jira/browse/HADOOP-3939
>             Project: Hadoop Core
>          Issue Type: New Feature
>          Components: tools/distcp
>            Reporter: Tsz Wo (Nicholas), SZE
>
> One use case of DistCp is to sync two directories.  Currently, DistCp has an -update option for overwriting dst files if src is different from dst.  However, it is not enough for sync.  If there are some files in dst but not exist in src, there is no easy way to delete them.  We should add a new option, say -delete, so that DistCp will delete the non-existing in dst.



[jira] Commented: (HADOOP-3939) DistCp should support an option for deleting non-existing files.

Posted by "Hadoop QA (JIRA)" <ji...@apache.org>.
    [ https://issues.apache.org/jira/browse/HADOOP-3939?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12626864#action_12626864 ] 

Hadoop QA commented on HADOOP-3939:
-----------------------------------

+1 overall.  Here are the results of testing the latest attachment 
  http://issues.apache.org/jira/secure/attachment/12389133/3939_20080828.patch
  against trunk revision 690096.

    +1 @author.  The patch does not contain any @author tags.

    +1 tests included.  The patch appears to include 4 new or modified tests.

    +1 javadoc.  The javadoc tool did not generate any warning messages.

    +1 javac.  The applied patch does not increase the total number of javac compiler warnings.

    +1 findbugs.  The patch does not introduce any new Findbugs warnings.

    +1 release audit.  The applied patch does not increase the total number of release audit warnings.

    +1 core tests.  The patch passed core unit tests.

    +1 contrib tests.  The patch passed contrib unit tests.

Test results: http://hudson.zones.apache.org/hudson/job/Hadoop-Patch/3141/testReport/
Findbugs warnings: http://hudson.zones.apache.org/hudson/job/Hadoop-Patch/3141/artifact/trunk/build/test/findbugs/newPatchFindbugsWarnings.html
Checkstyle results: http://hudson.zones.apache.org/hudson/job/Hadoop-Patch/3141/artifact/trunk/build/test/checkstyle-errors.html
Console output: http://hudson.zones.apache.org/hudson/job/Hadoop-Patch/3141/console

This message is automatically generated.

> DistCp should support an option for deleting non-existing files.
> ----------------------------------------------------------------
>
>                 Key: HADOOP-3939
>                 URL: https://issues.apache.org/jira/browse/HADOOP-3939
>             Project: Hadoop Core
>          Issue Type: New Feature
>          Components: tools/distcp
>            Reporter: Tsz Wo (Nicholas), SZE
>            Assignee: Tsz Wo (Nicholas), SZE
>             Fix For: 0.19.0
>
>         Attachments: 3939_20080825.patch, 3939_20080825b.patch, 3939_20080826.patch, 3939_20080828.patch
>
>
> One use case of DistCp is to sync two directories.  Currently, DistCp has an -update option for overwriting dst files if src is different from dst.  However, it is not enough for sync.  If there are some files in dst but not exist in src, there is no easy way to delete them.  We should add a new option, say -delete, so that DistCp will delete the non-existing in dst.
