Posted to common-dev@hadoop.apache.org by "Ian Nowland (JIRA)" <ji...@apache.org> on 2009/05/22 01:09:47 UTC

[jira] Created: (HADOOP-5889) Allow writing to output directories that exist, as long as they are empty

Allow writing to output directories that exist, as long as they are empty
-------------------------------------------------------------------------

                 Key: HADOOP-5889
                 URL: https://issues.apache.org/jira/browse/HADOOP-5889
             Project: Hadoop Core
          Issue Type: Improvement
          Components: fs
    Affects Versions: 0.18.3
            Reporter: Ian Nowland


The current behavior in FileOutputFormat.checkOutputSpecs is to fail if the path specified by mapred.output.dir exists at the start of the job. This protects against accidentally overwriting existing data. There seems to be no harm, then, in slightly relaxing this check so that the job may proceed when the output path exists but is an empty directory.

At a minimum this would allow outputting to the root of S3N buckets, which is currently impossible (https://issues.apache.org/jira/browse/HADOOP-5805).
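
To illustrate the proposed rule, here is a minimal sketch. It is not the attached patch: it uses java.nio.file on the local file system as a stand-in for Hadoop's FileSystem API, and the class and method names (OutputDirCheck, isUsableOutputDir) are hypothetical. The point is only the decision logic: a missing path passes as before, an existing empty directory now passes too, and anything else is rejected.

```java
import java.io.IOException;
import java.nio.file.DirectoryStream;
import java.nio.file.Files;
import java.nio.file.Path;

// Hypothetical illustration only; local-filesystem stand-in for the Hadoop check.
final class OutputDirCheck {

    /**
     * Returns true if a job may write to outputDir under the relaxed rule:
     * the path does not exist, or it exists and is an empty directory.
     */
    static boolean isUsableOutputDir(Path outputDir) throws IOException {
        if (!Files.exists(outputDir)) {
            return true;   // existing rule: a missing path is always acceptable
        }
        if (!Files.isDirectory(outputDir)) {
            return false;  // an existing regular file must not be overwritten
        }
        // Relaxed rule: an existing directory is acceptable only if it has no entries.
        try (DirectoryStream<Path> entries = Files.newDirectoryStream(outputDir)) {
            return !entries.iterator().hasNext();
        }
    }

    /** Throws if the output location holds data that could be overwritten. */
    static void checkOutputSpecs(Path outputDir) throws IOException {
        if (!isUsableOutputDir(outputDir)) {
            throw new IOException(
                "Output directory " + outputDir + " already exists and is not empty");
        }
    }

    public static void main(String[] args) throws IOException {
        Path dir = Files.createTempDirectory("job-output");
        System.out.println(isUsableOutputDir(dir));   // true: exists but empty

        Path part = Files.createFile(dir.resolve("part-00000"));
        System.out.println(isUsableOutputDir(dir));   // false: now holds data

        // Clean up the temporary files created for the demonstration.
        Files.delete(part);
        Files.delete(dir);
    }
}
```

In Hadoop itself the same decision would presumably be made with FileSystem.exists and FileSystem.listStatus on the configured output path; how the actual patch does it is not shown in this thread.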

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.


[jira] Updated: (HADOOP-5889) Allow writing to output directories that exist, as long as they are empty

Posted by "Ian Nowland (JIRA)" <ji...@apache.org>.
     [ https://issues.apache.org/jira/browse/HADOOP-5889?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Ian Nowland updated HADOOP-5889:
--------------------------------

    Fix Version/s: 0.21.0
           Status: Patch Available  (was: Open)



[jira] Updated: (HADOOP-5889) Allow writing to output directories that exist, as long as they are empty

Posted by "Ian Nowland (JIRA)" <ji...@apache.org>.
     [ https://issues.apache.org/jira/browse/HADOOP-5889?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Ian Nowland updated HADOOP-5889:
--------------------------------

    Status: Open  (was: Patch Available)



[jira] Commented: (HADOOP-5889) Allow writing to output directories that exist, as long as they are empty

Posted by "Ian Nowland (JIRA)" <ji...@apache.org>.
    [ https://issues.apache.org/jira/browse/HADOOP-5889?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12718679#action_12718679 ] 

Ian Nowland commented on HADOOP-5889:
-------------------------------------

Can do. One question: there is an org.apache.hadoop.mapred.TestFileOutputFormat but no corresponding org.apache.hadoop.mapreduce.lib.output.TestFileOutputFormat. Should I copy the existing test over to the new location? Also, should I make my source change in both the old mapred files and the newer mapreduce files?



[jira] Commented: (HADOOP-5889) Allow writing to output directories that exist, as long as they are empty

Posted by "Tom White (JIRA)" <ji...@apache.org>.
    [ https://issues.apache.org/jira/browse/HADOOP-5889?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12718574#action_12718574 ] 

Tom White commented on HADOOP-5889:
-----------------------------------

This looks good to me. Would it be possible to have a unit test?



[jira] Updated: (HADOOP-5889) Allow writing to output directories that exist, as long as they are empty

Posted by "Ian Nowland (JIRA)" <ji...@apache.org>.
     [ https://issues.apache.org/jira/browse/HADOOP-5889?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Ian Nowland updated HADOOP-5889:
--------------------------------

    Attachment: HADOOP-5889-0.patch

Simple patch that adds a check so that no exception is thrown if the existing output directory is empty.
