Posted to mapreduce-user@hadoop.apache.org by aliyeh saeedi <a1...@yahoo.com> on 2012/01/29 07:05:02 UTC

reducers outputs

Hi

I want to save reducers' outputs like other files in Hadoop. Does the NameNode keep any information about them? How can I do this?
Or can I add a new component to Hadoop, like the NameNode, and make the JobTracker consult it too (I mean, I want the JobTracker to consult both the NameNode AND myNewComponent)?

Re: Fw: reducers outputs

Posted by Harsh J <ha...@cloudera.com>.
Aliyeh,

You may be complicating things here.

HDFS and MapReduce are two separate components of Hadoop. HDFS
provides a distributed filesystem; MapReduce provides a distributed
processing layer. They aren't glued together.

A reducer creates an output file on a 'filesystem'. It neither knows
nor cares whether it is talking to HDFS; all it cares about is running
the user's reducer function and persisting the output to whatever
filesystem it is given (it may be HDFS, it may be local; it does not
matter to the reducer).
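To illustrate the point with a minimal sketch: Hadoop picks the filesystem for an output path from the path's URI scheme, and falls back to the configured default filesystem (fs.default.name in Hadoop 1.x, fs.defaultFS in later releases) when the path has no scheme. The class below is hypothetical and uses only java.net.URI; it models that lookup, not Hadoop's own FileSystem code, and namenode:8020 is a placeholder host.

```java
import java.net.URI;

// Hypothetical helper: which filesystem scheme would receive a job's output?
// Models only the "explicit scheme, else default filesystem" rule.
final class OutputFsResolver {
    private OutputFsResolver() {}

    static String schemeFor(String outputPath, String defaultFs) {
        String scheme = URI.create(outputPath).getScheme();
        if (scheme != null) {
            return scheme; // e.g. "hdfs" or "file" written into the path itself
        }
        // No scheme in the path: use the default filesystem's scheme.
        return URI.create(defaultFs).getScheme();
    }
}
```

With a default of hdfs://namenode:8020, an output path of /user/aliyeh/out resolves to hdfs, while file:///tmp/out resolves to the local filesystem; the reducer code is identical in both cases.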

Have you gone through a standard Hadoop tutorial to understand how
things work? Try taking a look at
http://hadoop.apache.org/common/docs/current/mapred_tutorial.html.

If you want to override the output file names, i.e. you are looking
for something other than the "part-xxxxx" names, the easiest way is to
use MultipleOutputs with a custom named output, documented here:
http://hadoop.apache.org/common/docs/current/api/org/apache/hadoop/mapred/lib/MultipleOutputs.html
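The default names follow a simple rule. In the newer org.apache.hadoop.mapreduce API, FileOutputFormat.getUniqueFile builds <name>-<m|r>-<partition zero-padded to 5 digits><extension>, which gives part-r-00000, and MultipleOutputs substitutes the named output for "part" (the older mapred API linked above names its default files part-00000). The sketch below re-implements only that naming rule with the JDK for illustration; the class name is hypothetical and this is not Hadoop's code.

```java
import java.util.Locale;

// Hypothetical re-implementation of the new-API naming rule
// <name>-<m|r>-<partition, zero-padded to 5 digits><extension>.
final class OutputFileNames {
    private OutputFileNames() {}

    static String uniqueFile(String baseName, boolean isMapTask, int partition, String extension) {
        if (partition < 0) {
            throw new IllegalArgumentException("partition must be non-negative");
        }
        char taskType = isMapTask ? 'm' : 'r';
        // At least five digits, no grouping separators; Locale.ROOT keeps ASCII digits.
        String number = String.format(Locale.ROOT, "%05d", partition);
        return baseName + "-" + taskType + "-" + number + extension;
    }
}
```

For example, reducer 3 writing to a named output called summary would produce summary-r-00003 in the job's output directory.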

On Mon, Jan 30, 2012 at 11:39 AM, aliyeh saeedi <a1...@yahoo.com> wrote:
> I studied it, but I could not get the point. I mean: if I save the
> reducer's output under names I choose, does the NameNode treat them like
> any other files?
> regards.

-- 
Harsh J
Customer Ops. Engineer, Cloudera

Re: reducers outputs

Posted by Arun C Murthy <ac...@hortonworks.com>.
The NameNode doesn't care how you wrote the file, i.e. whether via 'bin/hadoop dfs -put <>' or via an MR job.
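A toy model of this point, as a minimal sketch: the namespace records paths and their metadata, not which client created them. Below, a hypothetical ToyNamespace stands in for the NameNode; a -put is modelled as a single create, and a reduce task as a create under a temporary attempt directory followed by a rename on commit, which is roughly what FileOutputCommitter does (task attempts write under a _temporary directory inside the output directory, and committing moves the files into place). The exact _temporary layout used here is a simplified assumption.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Map;
import java.util.TreeMap;

// Hypothetical stand-in for the NameNode's namespace: it records only
// path -> length, nothing about which client or job created a file.
final class ToyNamespace {
    private final Map<String, Long> files = new TreeMap<>();

    // Models creating a file (a "-put", or a task writing its work file).
    void create(String path, long length) {
        files.put(path, length);
    }

    // Models a rename of one file; a real commit may move whole directories.
    void rename(String src, String dst) {
        Long length = files.remove(src);
        if (length == null) {
            throw new IllegalArgumentException("no such file: " + src);
        }
        files.put(dst, length);
    }

    // Files directly under dir (non-recursive), in sorted order.
    List<String> list(String dir) {
        String prefix = dir.endsWith("/") ? dir : dir + "/";
        List<String> result = new ArrayList<>();
        for (String path : files.keySet()) {
            if (path.startsWith(prefix) && path.indexOf('/', prefix.length()) < 0) {
                result.add(path);
            }
        }
        return result;
    }
}
```

For example, create("/data/uploaded.txt", 10) for a -put, then create("/data/_temporary/attempt_0/myBetterName", 20) followed by rename of that path to "/data/myBetterName" for a task commit; list("/data") then returns both files as ordinary entries, with nothing recording how each was written.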

Arun

On Jan 29, 2012, at 10:09 PM, aliyeh saeedi wrote:

> I studied it, but I could not get the point. I mean: if I save the reducer's output under names I choose, does the NameNode treat them like any other files?
> regards.

--
Arun C. Murthy
Hortonworks Inc.
http://hortonworks.com/



Re: Fw: reducers outputs

Posted by aliyeh saeedi <a1...@yahoo.com>.
I studied it, but I could not get the point. I mean: if I save the reducer's output under names I choose, does the NameNode treat them like any other files?
regards.



________________________________
 From: Ashwanth Kumar <as...@googlemail.com>
To: mapreduce-user@hadoop.apache.org; aliyeh saeedi <a1...@yahoo.com> 
Sent: Monday, 30 January 2012, 9:25
Subject: Re: Fw: reducers outputs
 

You should have a look at this -  http://hadoop.apache.org/common/docs/current/api/org/apache/hadoop/mapred/FileOutputFormat.html 

 - Ashwanth Kumar



Re: Fw: reducers outputs

Posted by Ashwanth Kumar <as...@googlemail.com>.
You should have a look at this -
http://hadoop.apache.org/common/docs/current/api/org/apache/hadoop/mapred/FileOutputFormat.html


 - Ashwanth Kumar

On Mon, Jan 30, 2012 at 11:17 AM, aliyeh saeedi <a1...@yahoo.com> wrote:

> I want to save them under my own names. How will the NameNode keep track of their names?

Re: Fw: reducers outputs

Posted by Ioan Eugen Stan <st...@gmail.com>.
On 30.01.2012 07:47, aliyeh saeedi wrote:
> I want to save them under my own names. How will the NameNode keep track of their names?

You aren't making a lot of sense, at least to me :). But if you wish to
save reducer output differently, you will have to implement your own
class that implements
http://hadoop.apache.org/common/docs/current/api/org/apache/hadoop/mapred/OutputFormat.html.
It's easier to extend
http://hadoop.apache.org/common/docs/current/api/org/apache/hadoop/mapred/FileOutputFormat.html
and override the parts that you need.

The framework calls methods of that class to persist the data from the
reducers. You tell the framework which class to use by calling
job.setOutputFormatClass(...); for example,
job.setOutputFormatClass(SequenceFileOutputFormat.class); makes the
output a SequenceFile. To use the example below, pass
RenamedSequenceFile.class instead.

Example to save under a different name:

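import java.io.IOException;
import org.apache.hadoop.fs.Path;
import org.apache.hadoop.mapreduce.TaskAttemptContext;
import org.apache.hadoop.mapreduce.lib.output.FileOutputCommitter;
import org.apache.hadoop.mapreduce.lib.output.SequenceFileOutputFormat;

// Writes reducer output to "myBetterName" instead of "part-r-NNNNN".
public class RenamedSequenceFile<K, V> extends SequenceFileOutputFormat<K, V> {
    @Override
    public Path getDefaultWorkFile(TaskAttemptContext context, String extension)
            throws IOException {
        // Work path = this task attempt's temporary directory; its files are
        // moved into the job output directory when the task commits.
        FileOutputCommitter committer = (FileOutputCommitter) getOutputCommitter(context);
        return new Path(committer.getWorkPath(), "myBetterName");
    }
}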

This writes your reducer data to a file named "myBetterName" as
key/value pairs (behaviour inherited from SequenceFileOutputFormat).
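One caveat worth checking before relying on this: every reduce task returns the same name, so with more than one reducer all tasks target the same final path in the output directory, and outputs can overwrite each other or fail to commit depending on the committer. Keeping the partition number in the name avoids this; the newer FileOutputFormat provides a static getUniqueFile(context, name, extension) helper, and calling it from getDefaultWorkFile should yield names like myBetterName-r-00000 (worth verifying against your Hadoop version). The JDK-only sketch below just counts the distinct final names each strategy produces; the class and method names are hypothetical.

```java
import java.util.Locale;
import java.util.Set;
import java.util.TreeSet;
import java.util.function.IntFunction;

// Hypothetical comparison of two naming strategies for reducer output files.
final class ReducerFileNames {
    private ReducerFileNames() {}

    // Strategy in the example above: the partition number is ignored.
    static String fixedName(int partition) {
        return "myBetterName";
    }

    // Alternative: keep the partition, e.g. "myBetterName-r-00003".
    static String uniqueName(int partition) {
        return String.format(Locale.ROOT, "myBetterName-r-%05d", partition);
    }

    // Distinct final file names produced by numReducers reduce tasks.
    static Set<String> finalNames(int numReducers, IntFunction<String> namer) {
        Set<String> names = new TreeSet<>();
        for (int p = 0; p < numReducers; p++) {
            names.add(namer.apply(p));
        }
        return names;
    }
}
```

With four reducers, the fixed name yields a single final path shared by all tasks, while the partition-based name yields four distinct files.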

I hope this helps,

-- 
Ioan Eugen Stan
http://ieugen.blogspot.com

Fw: reducers outputs

Posted by aliyeh saeedi <a1...@yahoo.com>.



 

I want to save them under my own names. How will the NameNode keep track of their names?



________________________________
 From: Joey Echeverria <jo...@cloudera.com>
To: mapreduce-user@hadoop.apache.org; aliyeh saeedi <a1...@yahoo.com> 
Sent: Sunday, 29 January 2012, 17:10
Subject: Re: reducers outputs
 
Reduce output is normally stored in HDFS, just like your other files.
Are you seeing different behavior?

-Joey

On Sun, Jan 29, 2012 at 1:05 AM, aliyeh saeedi <a1...@yahoo.com> wrote:
> Hi
> I want to save reducers' outputs like other files in Hadoop. Does the NameNode
> keep any information about them? How can I do this?
> Or can I add a new component to Hadoop, like the NameNode, and make the JobTracker
> consult it too (I mean, I want the JobTracker to consult both the
> NameNode AND myNewComponent)?



-- 
Joseph Echeverria
Cloudera, Inc.
443.305.9434

Re: reducers outputs

Posted by Joey Echeverria <jo...@cloudera.com>.
Reduce output is normally stored in HDFS, just like your other files.
Are you seeing different behavior?

-Joey

On Sun, Jan 29, 2012 at 1:05 AM, aliyeh saeedi <a1...@yahoo.com> wrote:
> Hi
> I want to save reducers' outputs like other files in Hadoop. Does the NameNode
> keep any information about them? How can I do this?
> Or can I add a new component to Hadoop, like the NameNode, and make the JobTracker
> consult it too (I mean, I want the JobTracker to consult both the
> NameNode AND myNewComponent)?



-- 
Joseph Echeverria
Cloudera, Inc.
443.305.9434