
Posted to common-dev@hadoop.apache.org by "Doug Cutting (JIRA)" <ji...@apache.org> on 2007/05/22 20:49:16 UTC

[jira] Created: (HADOOP-1416) speculative reduce should touch output files only through OutputFormat

speculative reduce should touch output files only through OutputFormat
----------------------------------------------------------------------

                 Key: HADOOP-1416
                 URL: https://issues.apache.org/jira/browse/HADOOP-1416
             Project: Hadoop
          Issue Type: Bug
          Components: mapred
            Reporter: Doug Cutting


HADOOP-1127 introduced speculative reduce.  It was implemented by having the MapReduce kernel manipulate a job's output files directly.  This is inconsistent with the architecture of the InputFormat and OutputFormat interfaces: the kernel should never operate on job input or output directly, but should always defer to these interfaces.

To correct this, we will need to add some new methods to OutputFormat, something like:

/** rename output generated by getRecordWriter(job, tempName) */
void completeOutput(JobConf job, String tempName, String finalName);

/** clean up output generated by getRecordWriter(job, tempName).  called for unused outputs. */
void cleanupOutput(JobConf job, String tempName);

These should be implemented in OutputFormatBase, which should be renamed FileOutputFormat.
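As a rough illustration of the proposed protocol, the following self-contained Java (11+) sketch models completeOutput and cleanupOutput on a local directory. It is a sketch under stated assumptions, not Hadoop code: java.nio.file stands in for Hadoop's FileSystem, the JobConf parameter is replaced by an output directory, and the names OutputCompletion, FileOutputCompletion and the "_tmp_" file naming are hypothetical.

```java
import java.io.IOException;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.StandardCopyOption;

public class FileOutputCommitSketch {

    /** Hypothetical stand-in for the proposed OutputFormat additions (JobConf replaced by a directory). */
    interface OutputCompletion {
        /** Promote output written under tempName to finalName. */
        void completeOutput(String tempName, String finalName) throws IOException;

        /** Delete output written under tempName; called for unused (e.g. losing speculative) outputs. */
        void cleanupOutput(String tempName) throws IOException;
    }

    /** File-based implementation, roughly what a FileOutputFormat might provide (assumption). */
    static final class FileOutputCompletion implements OutputCompletion {
        private final Path outputDir;

        FileOutputCompletion(Path outputDir) {
            this.outputDir = outputDir;
        }

        @Override
        public void completeOutput(String tempName, String finalName) throws IOException {
            // A single rename within one directory makes the final file appear atomically.
            Files.move(outputDir.resolve(tempName), outputDir.resolve(finalName),
                    StandardCopyOption.ATOMIC_MOVE);
        }

        @Override
        public void cleanupOutput(String tempName) throws IOException {
            Files.deleteIfExists(outputDir.resolve(tempName));
        }
    }

    public static void main(String[] args) throws IOException {
        Path dir = Files.createTempDirectory("job-output");
        OutputCompletion out = new FileOutputCompletion(dir);
        // Two speculative attempts of the same reduce write under distinct temporary names.
        Files.writeString(dir.resolve("_tmp_part-00000_attempt0"), "attempt 0");
        Files.writeString(dir.resolve("_tmp_part-00000_attempt1"), "attempt 1");
        // The framework keeps one attempt's output and discards the other, via the interface only.
        out.completeOutput("_tmp_part-00000_attempt1", "part-00000");
        out.cleanupOutput("_tmp_part-00000_attempt0");
        System.out.println(Files.readString(dir.resolve("part-00000"))); // prints "attempt 1"
    }
}
```

Because each speculative attempt writes only under its own temporary name, the kernel never needs to know that the output is a file; it only tells the OutputFormat which temporary output to keep and which to discard.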

To prevent this from happening again, we should also move JobConf#getInputPath(), #setInputPath(), #getOutputPath(), and #setOutputPath() to static methods on FileInputFormat and FileOutputFormat, since these methods are specific to jobs with file inputs and outputs (and not, e.g., HBase tables).
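A sketch of what static, format-specific path accessors could look like, assuming a minimal map-backed configuration in place of JobConf. The class names SimpleConf, FileInputFormatSketch and FileOutputFormatSketch are hypothetical, and the property key strings are assumptions chosen for illustration.

```java
import java.util.HashMap;
import java.util.Map;

public class StaticPathAccessorsSketch {

    /** Hypothetical minimal stand-in for JobConf: a string-to-string property map. */
    static final class SimpleConf {
        private final Map<String, String> props = new HashMap<>();

        void set(String key, String value) { props.put(key, value); }

        String get(String key) { return props.get(key); }
    }

    /** File-specific input settings live on the file input format, not on the generic job config. */
    static final class FileInputFormatSketch {
        private static final String INPUT_DIR = "mapred.input.dir"; // assumed key name

        private FileInputFormatSketch() {}

        static void setInputPath(SimpleConf conf, String path) { conf.set(INPUT_DIR, path); }

        static String getInputPath(SimpleConf conf) { return conf.get(INPUT_DIR); }
    }

    /** Likewise, the output path is owned by the file output format. */
    static final class FileOutputFormatSketch {
        private static final String OUTPUT_DIR = "mapred.output.dir"; // assumed key name

        private FileOutputFormatSketch() {}

        static void setOutputPath(SimpleConf conf, String path) { conf.set(OUTPUT_DIR, path); }

        static String getOutputPath(SimpleConf conf) { return conf.get(OUTPUT_DIR); }
    }

    public static void main(String[] args) {
        SimpleConf conf = new SimpleConf();
        FileInputFormatSketch.setInputPath(conf, "/data/in");
        FileOutputFormatSketch.setOutputPath(conf, "/data/out");
        // prints "/data/in -> /data/out"
        System.out.println(FileInputFormatSketch.getInputPath(conf) + " -> "
                + FileOutputFormatSketch.getOutputPath(conf));
    }
}
```

With the accessors on the file formats, a job that reads from or writes to an HBase table never sees a file-path API on its configuration object.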


-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.

[jira] Commented: (HADOOP-1416) speculative reduce should touch output files only through OutputFormat

Posted by "Doug Cutting (JIRA)" <ji...@apache.org>.

    [ https://issues.apache.org/jira/browse/HADOOP-1416?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel#action_12498301 ] 

Doug Cutting commented on HADOOP-1416:
--------------------------------------

Arun notes in HADOOP-1226 that this may also need to be addressed when num.reduce.tasks is zero and final output is written directly by map tasks.  With speculative execution, map tasks should then use the same output protocol as reduce tasks.
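The same protocol could extend to map-only jobs. The hypothetical sketch below uses an in-memory map in place of real storage and shows one possible way for the framework to decide which speculative attempt commits. It assumes a "first attempt to finish wins" policy: that attempt claims the task atomically, its temporary output is promoted (the completeOutput step), and any later attempt's output is discarded (the cleanupOutput step). The names store, committed, tempName and attemptFinished are illustrative only.

```java
import java.util.List;
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;

public class SpeculativeCommitSketch {

    /** Hypothetical in-memory output store keyed by name, standing in for an OutputFormat's storage. */
    static final Map<String, String> store = new ConcurrentHashMap<>();

    /** taskId -> attemptId of the attempt whose output was committed. */
    static final Map<String, String> committed = new ConcurrentHashMap<>();

    static String tempName(String taskId, String attemptId) {
        return "_tmp_" + taskId + "_" + attemptId;
    }

    /** Called when an attempt finishes; returns true if its output became the task's final output. */
    static boolean attemptFinished(String taskId, String attemptId) {
        String temp = tempName(taskId, attemptId);
        // putIfAbsent is atomic, so exactly one attempt per task claims the commit.
        if (committed.putIfAbsent(taskId, attemptId) == null) {
            String data = store.remove(temp);
            if (data != null) {
                store.put(taskId, data); // completeOutput: promote temp -> final name
            }
            return true;
        }
        store.remove(temp); // cleanupOutput: discard the losing attempt's output
        return false;
    }

    public static void main(String[] args) {
        // Map-only job: each map attempt writes final job output under its own temporary name.
        for (String attempt : List.of("a0", "a1")) {
            store.put(tempName("map-00003", attempt), "records from " + attempt);
        }
        System.out.println(attemptFinished("map-00003", "a1")); // true
        System.out.println(attemptFinished("map-00003", "a0")); // false
        System.out.println(store.get("map-00003"));            // records from a1
    }
}
```

Since map and reduce attempts go through the same commit decision, neither path needs the kernel to touch output files itself.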
