Posted to common-dev@hadoop.apache.org by "Ian Nowland (JIRA)" <ji...@apache.org> on 2009/05/22 01:09:47 UTC
[jira] Created: (HADOOP-5889) Allow writing to output directories
that exist, as long as they are empty
Allow writing to output directories that exist, as long as they are empty
-------------------------------------------------------------------------
Key: HADOOP-5889
URL: https://issues.apache.org/jira/browse/HADOOP-5889
Project: Hadoop Core
Issue Type: Improvement
Components: fs
Affects Versions: 0.18.3
Reporter: Ian Nowland
FileOutputFormat.checkOutputSpecs currently fails if the path specified by mapred.output.dir exists at the start of the job. This protects against accidentally overwriting existing data. There seems to be no harm, then, in slightly relaxing the check so that the output path may exist as long as it is an empty directory.
At a minimum this would allow outputting to the root of S3N buckets, which is currently impossible (https://issues.apache.org/jira/browse/HADOOP-5805).
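To make the proposal concrete, here is a minimal sketch of the relaxed check. It is not the attached patch and does not use the Hadoop FileSystem API: it assumes java.nio.file as a stand-in so that it runs on its own, and the class name OutputDirCheck, the method names, and the file name used in main are all hypothetical.

```java
import java.io.IOException;
import java.nio.file.DirectoryStream;
import java.nio.file.FileAlreadyExistsException;
import java.nio.file.Files;
import java.nio.file.Path;

// Hypothetical illustration only; java.nio.file stands in for Hadoop's FileSystem.
final class OutputDirCheck {

    // Behavior as described in the issue: any existing output path is rejected.
    static void strictCheck(Path outDir) throws IOException {
        if (Files.exists(outDir)) {
            throw new FileAlreadyExistsException(outDir.toString());
        }
    }

    // Proposed relaxation: an existing path is accepted if it is an empty directory.
    static void relaxedCheck(Path outDir) throws IOException {
        if (!Files.exists(outDir)) {
            return; // nothing there, so nothing can be overwritten
        }
        if (Files.isDirectory(outDir) && isEmptyDirectory(outDir)) {
            return; // exists but holds no data
        }
        throw new FileAlreadyExistsException(outDir.toString());
    }

    // True if the directory has no entries; stops at the first entry found.
    static boolean isEmptyDirectory(Path dir) throws IOException {
        try (DirectoryStream<Path> entries = Files.newDirectoryStream(dir)) {
            return !entries.iterator().hasNext();
        }
    }

    public static void main(String[] args) throws IOException {
        Path out = Files.createTempDirectory("job-output");

        relaxedCheck(out); // empty directory: accepted
        System.out.println("empty directory accepted");

        Files.createFile(out.resolve("part-00000")); // illustrative file name
        try {
            relaxedCheck(out);
            System.out.println("non-empty directory accepted (unexpected)");
        } catch (FileAlreadyExistsException e) {
            System.out.println("non-empty directory rejected");
        }
    }
}
```

The relaxed check still rejects an existing regular file and any directory that has at least one entry, so the protection against overwriting existing data is unchanged.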
--
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.
[jira] Updated: (HADOOP-5889) Allow writing to output directories
that exist, as long as they are empty
Posted by "Ian Nowland (JIRA)" <ji...@apache.org>.
[ https://issues.apache.org/jira/browse/HADOOP-5889?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]
Ian Nowland updated HADOOP-5889:
--------------------------------
Fix Version/s: 0.21.0
Status: Patch Available (was: Open)
[jira] Updated: (HADOOP-5889) Allow writing to output directories
that exist, as long as they are empty
Posted by "Ian Nowland (JIRA)" <ji...@apache.org>.
[ https://issues.apache.org/jira/browse/HADOOP-5889?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]
Ian Nowland updated HADOOP-5889:
--------------------------------
Status: Open (was: Patch Available)
[jira] Commented: (HADOOP-5889) Allow writing to output directories
that exist, as long as they are empty
Posted by "Ian Nowland (JIRA)" <ji...@apache.org>.
[ https://issues.apache.org/jira/browse/HADOOP-5889?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12718679#action_12718679 ]
Ian Nowland commented on HADOOP-5889:
-------------------------------------
Can do. One question: there is an org.apache.hadoop.mapred.TestFileOutputFormat but no corresponding org.apache.hadoop.mapreduce.lib.output.TestFileOutputFormat. Should I copy the existing test over to the new location? Also, should I make my source change in both the old mapred and the newer mapreduce files?
[jira] Commented: (HADOOP-5889) Allow writing to output directories
that exist, as long as they are empty
Posted by "Tom White (JIRA)" <ji...@apache.org>.
[ https://issues.apache.org/jira/browse/HADOOP-5889?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12718574#action_12718574 ]
Tom White commented on HADOOP-5889:
-----------------------------------
This looks good to me. Would it be possible to have a unit test?
[jira] Updated: (HADOOP-5889) Allow writing to output directories
that exist, as long as they are empty
Posted by "Ian Nowland (JIRA)" <ji...@apache.org>.
[ https://issues.apache.org/jira/browse/HADOOP-5889?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]
Ian Nowland updated HADOOP-5889:
--------------------------------
Attachment: HADOOP-5889-0.patch
Simple patch that adds a check so that no exception is thrown if the existing directory is empty.