Posted to mapreduce-user@hadoop.apache.org by aliyeh saeedi <a1...@yahoo.com> on 2012/01/01 15:34:03 UTC

output files written by reducers

Hi
I have some questions and would be really grateful for answers.
As the Hadoop tutorial puts it, "the output files written by the Reducers are then left in HDFS for user use, either by another
MapReduce job, a separate program, or for human inspection."

1- Does Hadoop automatically reuse the content of the files written by reducers? For example, if 3 jobs are submitted to Hadoop and the 1st and 3rd are identical, does Hadoop run the 3rd job again or automatically use the results of the first? A more complicated scenario is as follows:
    A) 3 MapReduce jobs are submitted to Hadoop
    B) after running the 3 jobs, Hadoop returns the final result
    C) in the next step, 2 more jobs are submitted, both repetitive (Hadoop has already run them in step B)
Now, does Hadoop automatically reuse the earlier results or run those jobs again?

 
2- Are these files (the files written by reducers) ever discarded? If so, when and how?


3- How can Hadoop users find out the location of these files (the files written by reducers)?

Regards :-)

Re: output files written by reducers

Posted by Harsh J <ha...@cloudera.com>.
Aliyeh,

I do not fully understand your question. Why even write out files you do not want?

Like any other regular FS, HDFS neither automatically cleans up your old files nor has a mechanism for setting an expiry date on them. Your processes or administrators should be able to build something on top of HDFS that cleans up 'older' files per your needs; a small sketch follows.
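
For example, a cleanup job along those lines using the public FileSystem API; the class name, the directory argument, and the one-week cutoff are only assumptions for illustration:

    import org.apache.hadoop.conf.Configuration;
    import org.apache.hadoop.fs.FileStatus;
    import org.apache.hadoop.fs.FileSystem;
    import org.apache.hadoop.fs.Path;

    public class OldOutputCleaner {
        public static void main(String[] args) throws Exception {
            // Connect to the file system named in the cluster configuration.
            FileSystem fs = FileSystem.get(new Configuration());
            Path root = new Path(args[0]); // a directory holding old job outputs
            long oneWeekMs = 7L * 24 * 60 * 60 * 1000;
            long cutoff = System.currentTimeMillis() - oneWeekMs;
            // Remove every child of 'root' not modified within the last week.
            for (FileStatus status : fs.listStatus(root)) {
                if (status.getModificationTime() < cutoff) {
                    fs.delete(status.getPath(), true); // true = recursive
                    System.out.println("deleted " + status.getPath());
                }
            }
        }
    }

Run from cron or an administrative script, something like this keeps output directories from accumulating indefinitely.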

On 09-Jan-2012, at 6:58 PM, aliyeh saeedi wrote:

> 
> 
> Hi
> 
> 
> Are the files written by reducers discarded? If not, after a while we will have a lot of files which are no longer useful and occupy disk space unduly. Is there any special parameter that can be set to keep or discard these files?
> 
> 
> Regards :-)

Re: output files written by reducers

Posted by Praveen Sripati <pr...@gmail.com>.
1- Does Hadoop automatically use the content of the files written by reducers?

No. If Job1 and Job2 are run in sequence, then the o/p of Job1 can be the i/p of Job2, but this has to be done programmatically, as sketched below.
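
For illustration, a minimal driver chaining two jobs with the org.apache.hadoop.mapreduce API; the class name ChainDriver and the three path arguments are made up, and the identity Mapper/Reducer base classes stand in for real ones:

    import org.apache.hadoop.conf.Configuration;
    import org.apache.hadoop.fs.Path;
    import org.apache.hadoop.io.LongWritable;
    import org.apache.hadoop.io.Text;
    import org.apache.hadoop.mapreduce.Job;
    import org.apache.hadoop.mapreduce.Mapper;
    import org.apache.hadoop.mapreduce.Reducer;
    import org.apache.hadoop.mapreduce.lib.input.FileInputFormat;
    import org.apache.hadoop.mapreduce.lib.output.FileOutputFormat;

    public class ChainDriver {
        public static void main(String[] args) throws Exception {
            Configuration conf = new Configuration();
            Path input = new Path(args[0]);
            Path intermediate = new Path(args[1]); // written by job1's reducers
            Path output = new Path(args[2]);

            Job job1 = new Job(conf, "job1");
            job1.setJarByClass(ChainDriver.class);
            job1.setMapperClass(Mapper.class);   // identity map, as a stand-in
            job1.setReducerClass(Reducer.class); // identity reduce, as a stand-in
            job1.setOutputKeyClass(LongWritable.class);
            job1.setOutputValueClass(Text.class);
            FileInputFormat.addInputPath(job1, input);
            FileOutputFormat.setOutputPath(job1, intermediate);
            if (!job1.waitForCompletion(true)) {
                System.exit(1); // stop the chain if job1 fails
            }

            Job job2 = new Job(conf, "job2");
            job2.setJarByClass(ChainDriver.class);
            job2.setMapperClass(Mapper.class);
            job2.setReducerClass(Reducer.class);
            job2.setOutputKeyClass(LongWritable.class);
            job2.setOutputValueClass(Text.class);
            // The hand-off: job1's output directory is job2's input directory.
            FileInputFormat.addInputPath(job2, intermediate);
            FileOutputFormat.setOutputPath(job2, output);
            System.exit(job2.waitForCompletion(true) ? 0 : 1);
        }
    }

Nothing here is automatic: the driver itself decides that job2 reads what job1 wrote, which is why an identical job submitted later is simply run again from scratch.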

2- Are these files (the files written by reducers) ever discarded? If so, when and how?

No. If the o/p of the reducers were discarded, there would be no point in running the job at all.
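The files stay in HDFS until someone deletes them explicitly, for example with hadoop fs -rmr <path> from the shell or FileSystem.delete(path, true) from a program.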

3- How can Hadoop users find out the location of these files (the files written by reducers)?

The source and destination paths are set on the InputFormat/OutputFormat when the job is defined:

        // where the job reads its input from
        FileInputFormat.addInputPath(conf, new Path(args[0]));
        // the directory the reducers will write their output to
        FileOutputFormat.setOutputPath(conf, new Path(args[1]));
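
Within that output directory, each reducer writes its own file, named part-00000, part-00001, and so on (part-r-00000 with the newer mapreduce API), so anything consuming the results can simply be pointed at the directory itself.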

Praveen
