Posted to mapreduce-user@hadoop.apache.org by ilyal levin <ni...@gmail.com> on 2011/09/05 17:49:07 UTC

How to Create an effective chained MapReduce program.

Hi

I'm trying to write a chained MapReduce program. I'm doing so with a simple
loop where in each iteration I create a job, execute it, and use the current
job's output as the next job's input.

How can I configure the output format of the current job and the input format
of the next job so that I don't have to use TextInputFormat (TextOutputFormat)?
If I do use them, I need to parse the input file in the map function.

I.e., if possible I want the next job to "consider" the input file as
<key,value> pairs and not as plain text.

Thanks a lot.
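
For illustration, here is a minimal sketch of the chaining loop described
above, using the new org.apache.hadoop.mapreduce API. MyMapper and MyReducer
are hypothetical classes with Text keys and values throughout, and the initial
input is assumed to already be a <Text,Text> SequenceFile (the first step could
equally use TextInputFormat plus a parsing mapper):

    import org.apache.hadoop.conf.Configuration;
    import org.apache.hadoop.fs.Path;
    import org.apache.hadoop.io.Text;
    import org.apache.hadoop.mapreduce.Job;
    import org.apache.hadoop.mapreduce.lib.input.FileInputFormat;
    import org.apache.hadoop.mapreduce.lib.input.SequenceFileInputFormat;
    import org.apache.hadoop.mapreduce.lib.output.FileOutputFormat;
    import org.apache.hadoop.mapreduce.lib.output.SequenceFileOutputFormat;

    public class ChainDriver {
      public static void main(String[] args) throws Exception {
        Configuration conf = new Configuration();
        Path input = new Path(args[0]);            // assumed <Text,Text> SequenceFile
        int iterations = Integer.parseInt(args[1]);

        for (int i = 0; i < iterations; i++) {
          Job job = new Job(conf, "chain-step-" + i);
          job.setJarByClass(ChainDriver.class);
          job.setMapperClass(MyMapper.class);      // hypothetical Mapper<Text,Text,Text,Text>
          job.setReducerClass(MyReducer.class);    // hypothetical Reducer<Text,Text,Text,Text>
          job.setOutputKeyClass(Text.class);
          job.setOutputValueClass(Text.class);

          // SequenceFile in and out: map() receives deserialized key/value
          // objects, so nothing has to be re-parsed from text between jobs.
          job.setInputFormatClass(SequenceFileInputFormat.class);
          job.setOutputFormatClass(SequenceFileOutputFormat.class);

          Path output = new Path("chain-output-" + i);   // naming is illustrative
          FileInputFormat.addInputPath(job, input);
          FileOutputFormat.setOutputPath(job, output);

          if (!job.waitForCompletion(true)) {
            System.exit(1);                        // stop the chain on failure
          }
          input = output;                          // this output feeds the next job
        }
      }
    }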

Re: How to Create an effective chained MapReduce program.

Posted by Joey Echeverria <jo...@cloudera.com>.
Why do you need to see the intermediate data as text?

What are the types of your key and values?

-Joey

Re: How to Create an effective chained MapReduce program.

Posted by ilyal levin <ni...@gmail.com>.
I need it because the intermediate data is also part of the solution to the
problem my algorithm solves. I somehow need to log this information.
The key is Text and the value is an ArrayWritable (TextArrayWritable).
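
The TextArrayWritable class itself isn't shown in the thread; a common pattern,
assumed here, is a small ArrayWritable subclass like the one below. The no-arg
constructor matters because Hadoop instantiates the value class by reflection
when reading it back (e.g. from a SequenceFile), and a toString() override is
what TextOutputFormat or a console dump will actually print:

    import java.util.Arrays;
    import org.apache.hadoop.io.ArrayWritable;
    import org.apache.hadoop.io.Text;

    // Assumed definition; the original class is not shown in the thread.
    public class TextArrayWritable extends ArrayWritable {
      public TextArrayWritable() {
        super(Text.class);               // no-arg ctor so Hadoop can deserialize it
      }
      public TextArrayWritable(Text[] values) {
        super(Text.class, values);
      }
      @Override
      public String toString() {
        // Human-readable form used by TextOutputFormat / System.out.
        return Arrays.toString(toStrings());
      }
    }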




Re: How to Create an effective chained MapReduce program.

Posted by ilyal levin <ni...@gmail.com>.

Did you have some kind of permission problem when you tried reading the
file with SequenceFile.Reader? I keep getting this error:

    Exception in thread "main" java.io.FileNotFoundException:
    C:\cygwin\home\closedItemSet\output0 (Access is denied)

and just can't work around it.
Thanks
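
One likely cause, offered here as a guess rather than something confirmed in
the thread: output0 is the job's output directory rather than a file, and
opening a directory through the local filesystem on Windows shows up as
"Access is denied". The reader should be pointed at a part file inside that
directory, e.g. part-r-00000 with the new API (part-00000 with the old one):

    import org.apache.hadoop.conf.Configuration;
    import org.apache.hadoop.fs.FileSystem;
    import org.apache.hadoop.fs.Path;
    import org.apache.hadoop.io.SequenceFile;

    public class OpenPart {
      public static void main(String[] args) throws Exception {
        Configuration conf = new Configuration();
        FileSystem fs = FileSystem.get(conf);
        // Open a part file inside the output directory, not the directory itself.
        Path part = new Path("output0/part-r-00000");   // path is illustrative
        SequenceFile.Reader reader = new SequenceFile.Reader(fs, part, conf);
        System.out.println("key class:   " + reader.getKeyClassName());
        System.out.println("value class: " + reader.getValueClassName());
        reader.close();
      }
    }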



Re: How to Create an effective chained MapReduce program.

Posted by Lance Norskog <go...@gmail.com>.
You might find it easier to manage this if you use one of the job-scripting /
workflow tools like Oozie or Hamake. They put the whole assemblage of stuff
into one file.



-- 
Lance Norskog
goksron@gmail.com

Re: How to Create an effective chained MapReduce program.

Posted by David Rosenstrauch <da...@darose.net>.
* open a SequenceFile.Reader on the sequence file
* in a loop, call next(key,val) on the reader to read the next key/val pair in the file (see:
  http://hadoop.apache.org/common/docs/current/api/org/apache/hadoop/io/SequenceFile.Reader.html#next(org.apache.hadoop.io.Writable,%20org.apache.hadoop.io.Writable) )
* write code to format the key & val into whatever appropriate format you want, and write them to the console
* when next(key,val) returns false, exit the loop

HTH,

DR
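
A direct rendering of those steps as a small command-line sketch (path handling
and class names are illustrative, not from the thread). It relies on each
Writable's toString(), which is why a sensible toString() on the value class
helps:

    import org.apache.hadoop.conf.Configuration;
    import org.apache.hadoop.fs.FileSystem;
    import org.apache.hadoop.fs.Path;
    import org.apache.hadoop.io.SequenceFile;
    import org.apache.hadoop.io.Writable;
    import org.apache.hadoop.util.ReflectionUtils;

    public class SeqFileDump {
      public static void main(String[] args) throws Exception {
        Configuration conf = new Configuration();
        FileSystem fs = FileSystem.get(conf);
        Path path = new Path(args[0]);               // e.g. output0/part-r-00000

        SequenceFile.Reader reader = new SequenceFile.Reader(fs, path, conf);
        try {
          // Instantiate key/value objects of whatever types the file declares.
          Writable key = (Writable) ReflectionUtils.newInstance(
              reader.getKeyClass(), conf);
          Writable value = (Writable) ReflectionUtils.newInstance(
              reader.getValueClass(), conf);

          // next() fills in key and value, and returns false at end of file.
          while (reader.next(key, value)) {
            System.out.println(key + "\t" + value);  // format however you like
          }
        } finally {
          reader.close();
        }
      }
    }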


Re: How to Create an effective chained MapReduce program.

Posted by ilyal levin <ni...@gmail.com>.
Can you be more specific on how to perform this? In general, is there a way
to convert the binary files I have to text files?




Re: How to Create an effective chained MapReduce program.

Posted by David Rosenstrauch <da...@darose.net>.
On 09/06/2011 01:57 AM, Niels Basjes wrote:
> Hi,
>
> In the past i've had the same situation where I needed the data for
> debugging. Back then I chose to create a second job with simply
> SequenceFileInputFormat, IdentityMapper, IdentityReducer and finally
> TextOutputFormat.
>
> In my situation that worked great for my purpose.

I did something similar at my last job, but rather than writing a 2nd
map/reduce job for this, we just wrote a simple command-line app that used
the Hadoop Java API to dump the contents of the binary file as text (JSON)
to the console.

HTH,

DR

Re: How to Create an effective chained MapReduce program.

Posted by Niels Basjes <ni...@basj.es>.
Hi,

In the past I've had the same situation where I needed the data for
debugging. Back then I chose to create a second job with simply
SequenceFileInputFormat, IdentityMapper, IdentityReducer and finally
TextOutputFormat.

In my situation that worked great for my purpose.

-- 
Kind regards,
Niels Basjes
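
A rough sketch of that debug step follows. One caveat: IdentityMapper and
IdentityReducer belong to the old org.apache.hadoop.mapred API; with the new
org.apache.hadoop.mapreduce API the base Mapper and Reducer classes already
pass records through unchanged, so it is enough to leave them unset. The
Text/TextArrayWritable types and the paths are assumptions carried over from
earlier in the thread:

    import org.apache.hadoop.conf.Configuration;
    import org.apache.hadoop.fs.Path;
    import org.apache.hadoop.io.Text;
    import org.apache.hadoop.mapreduce.Job;
    import org.apache.hadoop.mapreduce.lib.input.FileInputFormat;
    import org.apache.hadoop.mapreduce.lib.input.SequenceFileInputFormat;
    import org.apache.hadoop.mapreduce.lib.output.FileOutputFormat;
    import org.apache.hadoop.mapreduce.lib.output.TextOutputFormat;

    public class DumpAsText {
      public static void main(String[] args) throws Exception {
        Job job = new Job(new Configuration(), "dump-sequencefile-as-text");
        job.setJarByClass(DumpAsText.class);

        // No mapper/reducer set: the new-API defaults are identity, so
        // records pass straight through to the text output.
        job.setInputFormatClass(SequenceFileInputFormat.class);
        job.setOutputFormatClass(TextOutputFormat.class);
        job.setOutputKeyClass(Text.class);
        job.setOutputValueClass(TextArrayWritable.class);   // assumed value type

        FileInputFormat.addInputPath(job, new Path(args[0]));    // binary step output
        FileOutputFormat.setOutputPath(job, new Path(args[1]));  // readable text copy
        System.exit(job.waitForCompletion(true) ? 0 : 1);
      }
    }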


Re: How to Create an effective chained MapReduce program.

Posted by ilyal levin <ni...@gmail.com>.
OK, so now I'm using SequenceFileInputFormat and SequenceFileOutputFormat
and it works fine, but the output of the reducer is now a binary file (not
text), so I can't understand the data. How can I solve this? I need the
data (in text form) of the intermediate stages in the chain.

Thanks


Re: How to Create an effective chained MapReduce program.

Posted by ilyal levin <ni...@gmail.com>.
Thanks for the help.


Re: How to Create an effective chained MapReduce program.

Posted by Roger Chen <ro...@ucdavis.edu>.
The binary file will allow you to pass the output from the first reducer to
the second mapper. For example, if you output Text, IntWritable from the
first one in SequenceFileOutputFormat, then you are able to retrieve Text,
IntWritable input at the head of the second mapper. The idea of chaining is
that you know what kind of output the first reducer is going to give
already, and that you want to perform some secondary operation on it.

One last thing on chaining jobs: it's often worth looking to see if you can
consolidate all of your separate map and reduce tasks into a single
map/reduce operation. There are many situations where it is more intuitive
to write a number of map/reduce operations and chain them together, but more
efficient to have just a single operation.




-- 
Roger Chen
UC Davis Genome Center
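
To make the type flow concrete, here is a minimal sketch of the second job's
mapper under the Text/IntWritable example above (the class name and the
"secondary operation" are illustrative). With SequenceFileInputFormat, map()
receives the first reducer's key/value types directly, so no parsing is needed:

    import java.io.IOException;
    import org.apache.hadoop.io.IntWritable;
    import org.apache.hadoop.io.Text;
    import org.apache.hadoop.mapreduce.Mapper;

    // Input key/value types mirror the first job's reducer output.
    public class SecondStepMapper
        extends Mapper<Text, IntWritable, Text, IntWritable> {

      @Override
      protected void map(Text key, IntWritable value, Context context)
          throws IOException, InterruptedException {
        // Example secondary operation: double the count before re-emitting.
        context.write(key, new IntWritable(value.get() * 2));
      }
    }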

Re: How to Create an effective chained MapReduce program.

Posted by ilyal levin <ni...@gmail.com>.
Thanks for the reply.
I tried it, but it creates a binary file which I cannot understand (I need
the result of the first job).
The other thing is: how can I use this file in the next chained mapper?
I.e., how can I retrieve the keys and the values in the map function?


Ilyal


Re: How to Create an effective chained MapReduce program.

Posted by Joey Echeverria <jo...@cloudera.com>.
Have you tried SequenceFileOutputFormat and SequenceFileInputFormat?

-Joey




-- 
Joseph Echeverria
Cloudera, Inc.
443.305.9434