Posted to common-user@hadoop.apache.org by madhu phatak <ph...@gmail.com> on 2011/02/03 12:16:10 UTC
Re: Map->Reduce->Reduce
The Reducer will get the <Key,Value> pairs in sorted order. If you can generate
the keys in the order of the required sort, you can do it in a single MapReduce job.
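The point above is that the shuffle phase already sorts map output by key before it reaches the reducer, so if the map output key encodes the desired order, the framework does the sorting for free. A minimal sketch of that idea in plain Java (outside Hadoop; the class and field names are illustrative, and in a real job this comparison would live in a WritableComparable key class, or in a RawComparator set via Job.setSortComparatorClass):

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.List;

// Illustrative composite key: order primarily by a grouping field,
// secondarily by a numeric value, so the framework's shuffle sort
// delivers values to each reducer in the order we want.
class CompositeKey implements Comparable<CompositeKey> {
    final String group;
    final long value;

    CompositeKey(String group, long value) {
        this.group = group;
        this.value = value;
    }

    @Override
    public int compareTo(CompositeKey other) {
        int c = this.group.compareTo(other.group);
        if (c != 0) return c;           // different groups: group order wins
        return Long.compare(this.value, other.value); // same group: value order
    }

    @Override
    public String toString() {
        return group + ":" + value;
    }

    public static void main(String[] args) {
        List<CompositeKey> keys = new ArrayList<>();
        keys.add(new CompositeKey("b", 2));
        keys.add(new CompositeKey("a", 9));
        keys.add(new CompositeKey("a", 1));
        Collections.sort(keys);         // stands in for the shuffle sort
        System.out.println(keys);       // [a:1, a:9, b:2]
    }
}
```

With a key class like this, the second "sort-only" job becomes unnecessary: the first job's reduce input already arrives in the required order.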
On Tue, Jan 25, 2011 at 6:21 PM, Harsh J <qw...@gmail.com> wrote:
> Vanilla Hadoop does not support this without the intermediate I/O
> cost. You can check out the Hadoop Online Project at
> http://code.google.com/p/hop, as that does support letting a Reducer's
> output go directly to the next job's mapper (as in, a pipeline).
>
> On the topic of pipelining, also check out what's being done in Plume
> (Based on Google's FlumeJava): http://github.com/tdunning/Plume
>
> On Tue, Jan 25, 2011 at 5:16 PM, Matthew John
> <tm...@gmail.com> wrote:
> > Hi all,
> >
> >
> > I was working on a MapReduce program which does BytesWritable data
> > processing. Currently I am running two MapReduce jobs consecutively
> > to get the final output:
> >
> > Input ----(MapReduce1)---> Intermediate ----(MapReduce2)---> Output
> >
> > Here I am running MapReduce2 only to sort the intermediate data on the
> > basis of a Key comparator logic.
> >
> > I wanted to cut the number of MapReduce jobs down to just one, and I have
> > figured out a logic to do so. The only problem is that my logic needs
> > to run a sort on the Reduce output to get the final output. The flow
> > looks like this:
> >
> > Input ----(MapReduce1)----> Output (not sorted)
> >
> > I want to know if it's possible to attach one more Reduce module to the
> > dataflow so that it can perform the inherent sort before the second reduce
> > call. It would look like:
> >
> > Input --(Map)---> MapOutput ---(Reduce1)---> Output (not sorted)
> > ---(Reduce2, for which Reduce1 acts as a Mapper)---> Output
> >
> > Please let me know if there is some means of sorting the output
> > without invoking a separate MapReduce job just for the sake of sorting it.
> >
> > Thanks ,
> > Matthew
> >
>
>
>
> --
> Harsh J
> www.harshj.com
>
Re: Command Line Arguments for Client
Posted by Harsh J <qw...@gmail.com>.
Hey,
On Wed, Feb 23, 2011 at 6:22 AM, C.V.Krishnakumar Iyer
<f2...@gmail.com> wrote:
> Hi,
>
> Could anyone tell me how to set command-line arguments (like -Xmx and -Xms) for the client (not for the map/reduce tasks) from the command that is usually used to launch the job?
You can set the HADOOP_CLIENT_OPTS environment variable to apply
additional JVM opts to client-side commands alone (fs, jar, etc.).
--
Harsh J
www.harshj.com
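Concretely, that means exporting the variable in the shell before launching the job. A minimal sketch (the jar and class names are illustrative placeholders):

```shell
# Client-side JVM options: these affect only the local JVM that submits
# the job (hadoop fs, hadoop jar, etc.), not the map/reduce task JVMs.
export HADOOP_CLIENT_OPTS="-Xmx1g -Xms256m"

# Launch as usual; the client JVM picks up the opts above.
hadoop jar myjob.jar com.example.MyJob /input /output
```

Task-side heap is configured separately, via mapred.child.java.opts in the job configuration.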
Command Line Arguments for Client
Posted by "C.V.Krishnakumar Iyer" <f2...@gmail.com>.
Hi,
Could anyone tell me how to set command-line arguments (like -Xmx and -Xms) for the client (not for the map/reduce tasks) from the command that is usually used to launch the job?
Thanks,
Krishnakumar