Posted to common-dev@hadoop.apache.org by "Owen O'Malley (JIRA)" <ji...@apache.org> on 2006/02/03 21:59:32 UTC

[jira] Created: (HADOOP-3) Output directories are not cleaned up before the reduces run

Output directories are not cleaned up before the reduces run
------------------------------------------------------------

         Key: HADOOP-3
         URL: http://issues.apache.org/jira/browse/HADOOP-3
     Project: Hadoop
        Type: Bug
  Components: mapred  
    Reporter: Owen O'Malley
    Priority: Minor


The output directory for the reduces is not cleaned up, so you can see leftovers from previous runs if they had more reduces. For example, if you run the application once with reduces=10 and then rerun with reduces=8, your output directory will have frag00000 to frag00009, with the first 8 fragments from the second run and the last 2 fragments from the first run.
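
The scenario above can be reproduced outside Hadoop. The following is a minimal sketch that uses plain java.nio.file instead of the Hadoop file system API; the class name, the runJob helper, and the "run1"/"run2" tags are hypothetical and exist only for illustration. Each simulated reduce writes one fragment file into the same directory without cleaning it first.

```java
import java.io.IOException;
import java.io.UncheckedIOException;
import java.nio.file.Files;
import java.nio.file.Path;
import java.util.List;
import java.util.stream.Collectors;
import java.util.stream.Stream;

class StaleFragmentDemo {
    // Hypothetical stand-in for a job: each reduce writes one fragment file
    // whose content is the tag of the run that produced it.
    static void runJob(Path outDir, int reduces, String runTag) throws IOException {
        Files.createDirectories(outDir);
        for (int i = 0; i < reduces; i++) {
            Path frag = outDir.resolve(String.format("frag%05d", i));
            Files.writeString(frag, runTag); // replaces an existing fragment of the same name
        }
    }

    // Returns one "name=tag" entry per fragment, sorted by file name.
    static List<String> listFragments(Path outDir) throws IOException {
        try (Stream<Path> files = Files.list(outDir)) {
            return files.sorted().map(p -> {
                try {
                    return p.getFileName() + "=" + Files.readString(p);
                } catch (IOException e) {
                    throw new UncheckedIOException(e);
                }
            }).collect(Collectors.toList());
        }
    }

    public static void main(String[] args) throws IOException {
        Path out = Files.createTempDirectory("job-output");
        runJob(out, 10, "run1"); // first run: reduces=10
        runJob(out, 8, "run2");  // rerun: reduces=8, directory not cleaned
        listFragments(out).forEach(System.out::println);
    }
}
```

Running main lists ten fragments: frag00000 to frag00007 carry run2, while frag00008 and frag00009 still carry run1.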

-- 
This message is automatically generated by JIRA.
-
If you think it was sent incorrectly contact one of the administrators:
   http://issues.apache.org/jira/secure/Administrators.jspa
-
For more information on JIRA, see:
   http://www.atlassian.com/software/jira


[jira] Assigned: (HADOOP-3) Output directories are not cleaned up before the reduces run

Posted by "Sameer Paranjpye (JIRA)" <ji...@apache.org>.
     [ http://issues.apache.org/jira/browse/HADOOP-3?page=all ]

Sameer Paranjpye reassigned HADOOP-3:
-------------------------------------

    Assign To: Owen O'Malley




Re: [jira] Commented: (HADOOP-3) Output directories are not cleaned up before the reduces run

Posted by Owen O'Malley <ow...@yahoo-inc.com>.
On Feb 10, 2006, at 10:04 AM, Doug Cutting (JIRA) wrote:

>     [ http://issues.apache.org/jira/browse/HADOOP-3?page=comments#action_12365934 ]
>
> Doug Cutting commented on HADOOP-3:
> -----------------------------------
>
> An even safer way to fix this would be to have JobClient throw an  
> exception if the output directory already exists.  That way folks  
> won't inadvertently overwrite things.

Jira seems to be broken and not taking comments.

What about having a clobber configuration variable  
(mapred.output.clobber?) that defaults to false?

-- Owen
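
One way to read the proposal above is as a three-way decision made before the job is submitted. The sketch below is an illustration only: it uses java.util.Properties as a stand-in for the job configuration, the ClobberPolicy class and Action enum are hypothetical, and mapred.output.clobber is merely the name floated in the message, which the thread does not show being adopted.

```java
import java.util.Properties;

class ClobberPolicy {
    enum Action { PROCEED, DELETE_THEN_PROCEED, FAIL }

    // Name suggested in the message above; treated here as hypothetical.
    static final String CLOBBER_KEY = "mapred.output.clobber";

    // Decides what to do with the output directory before submitting a job.
    static Action decide(Properties conf, boolean outputDirExists) {
        // The flag defaults to false, so existing output is never
        // overwritten unless the user explicitly asks for it.
        boolean clobber = Boolean.parseBoolean(conf.getProperty(CLOBBER_KEY, "false"));
        if (!outputDirExists) {
            return Action.PROCEED;
        }
        return clobber ? Action.DELETE_THEN_PROCEED : Action.FAIL;
    }

    public static void main(String[] args) {
        Properties conf = new Properties();
        System.out.println(decide(conf, true)); // flag unset, directory exists
        conf.setProperty(CLOBBER_KEY, "true");
        System.out.println(decide(conf, true)); // flag set, directory exists
    }
}
```

With the default of false, the behaviour matches the safer suggestion quoted above: an existing output directory makes the submission fail.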


[jira] Commented: (HADOOP-3) Output directories are not cleaned up before the reduces run

Posted by "Doug Cutting (JIRA)" <ji...@apache.org>.
    [ http://issues.apache.org/jira/browse/HADOOP-3?page=comments#action_12365934 ] 

Doug Cutting commented on HADOOP-3:
-----------------------------------

An even safer way to fix this would be to have JobClient throw an exception if the output directory already exists.  That way folks won't inadvertently overwrite things.




[jira] Resolved: (HADOOP-3) Output directories are not cleaned up before the reduces run

Posted by "Doug Cutting (JIRA)" <ji...@apache.org>.
     [ http://issues.apache.org/jira/browse/HADOOP-3?page=all ]
     
Doug Cutting resolved HADOOP-3:
-------------------------------

    Resolution: Fixed

I just committed this.  I converted under_scored variable names to camelCase.




[jira] Updated: (HADOOP-3) Output directories are not cleaned up before the reduces run

Posted by "Sameer Paranjpye (JIRA)" <ji...@apache.org>.
     [ http://issues.apache.org/jira/browse/HADOOP-3?page=all ]

Sameer Paranjpye updated HADOOP-3:
----------------------------------

    Fix Version: 0.1
        Version: 0.1




[jira] Updated: (HADOOP-3) Output directories are not cleaned up before the reduces run

Posted by "Owen O'Malley (JIRA)" <ji...@apache.org>.
     [ http://issues.apache.org/jira/browse/HADOOP-3?page=all ]

Owen O'Malley updated HADOOP-3:
-------------------------------

    Attachment: noclobber.patch

Ok, this patch ensures that the output directory is set and does not exist.
If the application wants to clobber old data, it needs to delete the files itself.
I added the check for the output directory being set because otherwise the job doesn't fail until
the reduces try to run. With the added check, the job fails before it is submitted.

I wasn't sure we wanted to support the no reduces case, but it was pretty easy to handle here by
not requiring an output directory.
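
The checks described above might look roughly like the sketch below. It is not the contents of noclobber.patch: it uses java.nio.file rather than the Hadoop file system API, the class and method names are hypothetical, and skipping both checks when there are no reduces is an assumption about how the no-reduces case is handled.

```java
import java.io.IOException;
import java.nio.file.Files;
import java.nio.file.Path;

class OutputDirCheck {
    // Validates the output directory on the client, before the job is submitted.
    // outputDir is null when the job configuration does not set one.
    static void checkBeforeSubmit(Path outputDir, int numReduces) throws IOException {
        if (numReduces == 0) {
            return; // assumption: with no reduces, no output directory is required
        }
        if (outputDir == null) {
            // Fail here rather than later, when the reduces try to run.
            throw new IOException("Output directory not set.");
        }
        if (Files.exists(outputDir)) {
            // Never overwrite: the application must delete old output itself.
            throw new IOException("Output directory " + outputDir + " already exists.");
        }
    }

    public static void main(String[] args) throws IOException {
        Path existing = Files.createTempDirectory("old-output");
        checkBeforeSubmit(existing.resolve("new-output"), 4); // passes: set and absent
        try {
            checkBeforeSubmit(existing, 4);
        } catch (IOException e) {
            System.out.println("rejected: " + e.getMessage());
        }
    }
}
```

Doing the check in the submitting process means a misconfigured job costs nothing on the cluster: no map work is done before the error surfaces.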




[jira] Updated: (HADOOP-3) Output directories are not cleaned up before the reduces run

Posted by "Owen O'Malley (JIRA)" <ji...@apache.org>.
     [ http://issues.apache.org/jira/browse/HADOOP-3?page=all ]

Owen O'Malley updated HADOOP-3:
-------------------------------

    Attachment: clean-out-dir.patch

This patch makes the driver process delete the output directory before submitting the job.
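
For illustration, deleting an output directory before submission could be sketched as follows. This is not the contents of clean-out-dir.patch: it works on the local file system through java.nio.file instead of the Hadoop file system API, and the class and method names are hypothetical.

```java
import java.io.IOException;
import java.nio.file.Files;
import java.nio.file.Path;
import java.util.Comparator;
import java.util.List;
import java.util.stream.Collectors;
import java.util.stream.Stream;

class CleanOutputDir {
    // Removes dir and everything under it; does nothing if dir is absent.
    static void deleteRecursively(Path dir) throws IOException {
        if (!Files.exists(dir)) {
            return;
        }
        List<Path> paths;
        try (Stream<Path> walk = Files.walk(dir)) {
            // Reverse order puts children before their parent directory,
            // so each directory is empty by the time it is deleted.
            paths = walk.sorted(Comparator.reverseOrder()).collect(Collectors.toList());
        }
        for (Path p : paths) {
            Files.delete(p);
        }
    }

    public static void main(String[] args) throws IOException {
        Path out = Files.createTempDirectory("job-output");
        Files.writeString(out.resolve("frag00009"), "left over from an earlier run");
        deleteRecursively(out); // the driver would do this, then submit the job
        System.out.println("exists after cleanup: " + Files.exists(out));
    }
}
```

The trade-off is that earlier results are removed silently, which is the risk the later comments on this issue address.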

