Posted to common-user@hadoop.apache.org by madhu phatak <ph...@gmail.com> on 2011/02/03 12:16:10 UTC
Re: Map->Reduce->Reduce
The Reducer will get the <Key,Value> pairs in sorted order. If you can generate
the keys in the order of the required sort, you can do it in a single MapReduce job.
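The point above is that the shuffle phase already sorts map output by key before it reaches the reducer, so if the map output key encodes the desired order, the framework does the sorting for free. A minimal sketch of that idea in plain Java (outside Hadoop; the class and field names are illustrative, and in a real job this comparison would live in a WritableComparable key class, or in a RawComparator set via Job.setSortComparatorClass):

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.List;

// Illustrative composite key: order primarily by a grouping field,
// secondarily by a numeric value, so the framework's shuffle sort
// delivers values to each reducer in the order we want.
class CompositeKey implements Comparable<CompositeKey> {
    final String group;
    final long value;

    CompositeKey(String group, long value) {
        this.group = group;
        this.value = value;
    }

    @Override
    public int compareTo(CompositeKey other) {
        int c = this.group.compareTo(other.group);
        if (c != 0) return c;           // different groups: group order wins
        return Long.compare(this.value, other.value); // same group: value order
    }

    @Override
    public String toString() {
        return group + ":" + value;
    }

    public static void main(String[] args) {
        List<CompositeKey> keys = new ArrayList<>();
        keys.add(new CompositeKey("b", 2));
        keys.add(new CompositeKey("a", 9));
        keys.add(new CompositeKey("a", 1));
        Collections.sort(keys);         // stands in for the shuffle sort
        System.out.println(keys);       // [a:1, a:9, b:2]
    }
}
```

With a key class like this, the second "sort-only" job becomes unnecessary: the first job's reduce input already arrives in the required order.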
On Tue, Jan 25, 2011 at 6:21 PM, Harsh J <qw...@gmail.com> wrote:
> Vanilla Hadoop does not support this without the intermediate I/O
> cost. You can check out the Hadoop Online Project at
> http://code.google.com/p/hop, as that does support letting a Reducer's
> output go directly to the next job's mapper (as in, a pipeline).
>
> On the topic of pipelining, also check out what's being done in Plume
> (Based on Google's FlumeJava): http://github.com/tdunning/Plume
>
> On Tue, Jan 25, 2011 at 5:16 PM, Matthew John
> <tm...@gmail.com> wrote:
> > Hi all,
> >
> >
> > I was working on a MapReduce program which does BytesWritable data
> > processing. Currently I am running two MapReduce jobs consecutively
> > to get the final output:
> >
> > Input ----(MapReduce1)---> Intermediate ----(MapReduce2)---> Output
> >
> > Here I am running MapReduce2 only to sort the intermediate data on the
> > basis of a Key comparator logic.
> >
> > I wanted to cut the number of MapReduce jobs down to just one, and I have
> > figured out a logic to do so. The only problem is that my logic needs
> > to run a sort on the Reduce output to get the final output. The flow
> > looks like this:
> >
> > Input ----(MapReduce1)----> Output (not sorted)
> >
> > I want to know if it's possible to attach one more Reduce module to the
> > dataflow so that it can perform the inherent sort before the second reduce
> > call. It would look like:
> >
> > Input --(Map)---> MapOutput ---(Reduce1)---> Output (not sorted)
> > ---(Reduce2, for which Reduce1 acts as a Mapper)---> Output
> >
> > Please let me know if there is some means of sorting the output
> > without invoking a separate MapReduce job just for the sake of sorting it.
> >
> > Thanks ,
> > Matthew
> >
>
>
>
> --
> Harsh J
> www.harshj.com
>
Re: Command Line Arguments for Client
Posted by Harsh J <qw...@gmail.com>.
Hey,
On Wed, Feb 23, 2011 at 6:22 AM, C.V.Krishnakumar Iyer
<f2...@gmail.com> wrote:
> Hi,
>
> Could anyone tell me how to set command-line arguments (like -Xmx and -Xms) for the client (not for the map/reduce tasks) from the command that is usually used to launch the job?
You can set the HADOOP_CLIENT_OPTS environment variable to apply
additional JVM opts to client-side commands alone (fs, jar, etc.).
--
Harsh J
www.harshj.com
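Concretely, that means exporting the variable in the shell before launching the job. A minimal sketch (the jar and class names are illustrative placeholders):

```shell
# Client-side JVM options: these affect only the local JVM that submits
# the job (hadoop fs, hadoop jar, etc.), not the map/reduce task JVMs.
export HADOOP_CLIENT_OPTS="-Xmx1g -Xms256m"

# Launch as usual; the client JVM picks up the opts above.
hadoop jar myjob.jar com.example.MyJob /input /output
```

Task-side heap is configured separately, via mapred.child.java.opts in the job configuration.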
Command Line Arguments for Client
Posted by "C.V.Krishnakumar Iyer" <f2...@gmail.com>.
Hi,
Could anyone tell me how to set command-line arguments (like -Xmx and -Xms) for the client (not for the map/reduce tasks) from the command that is usually used to launch the job?
Thanks,
Krishnakumar