Posted to user@chukwa.apache.org by Oded Rosen <od...@legolas-media.com> on 2010/03/28 15:52:25 UTC
Chukwa customized ReduceProcessor keys
Hey everyone,
Thanks to your help (especially from Eric & Jerome), I've managed to write my
own little demux processor, including a customized mapper & reducer, for my
data type.
For now, all of my map output is sent to only one reduce process (although
Chukwa opens 8 different reduce processes in each demux run).
I would like to exploit the whole cluster and have multiple reduce
processes (same reducer class, of course, just many instances of it).
I've tried to do this by passing different values to
ChukwaRecordKey.setKey() in my mapper:
protected void parse(String recordEntry,
    OutputCollector<ChukwaRecordKey, ChukwaRecord> output,
    Reporter reporter) throws Throwable {
  key = new ChukwaRecordKey();
  // Random suffix in 1..NUM_OF_REDUCERS. The arithmetic is parenthesized so
  // that "+ 1" is an addition rather than a string concatenation.
  String keyStr = DATA_TYPE + ((int) Math.floor(NUM_OF_REDUCERS * Math.random()) + 1);
  ChukwaRecord record = new ChukwaRecord();
  this.buildGenericRecord(record, null, timestamp, keyStr);
  key.setKey(keyStr);
  key.setReduceType(ReducerName);
  // .... (record logic)....
  output.collect(key, record);
}
Although I have multiple keys, all of the records are still sent to the same
reducer process.
How can I send records to different processes?
Thanks a lot,
--
Oded
Re: Chukwa customized ReduceProcessor keys
Posted by Jerome Boulon <jb...@netflix.com>.
If I remember correctly, the current Demux uses the reduceType for
partitioning, so records with different keys but the same reduceType
all go to the same reducer.
If you don't need a global sort per data type, you can write your
own partitioner based on the key instead of the data type.
/Jerome
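To illustrate the point about partitioning, here is a minimal standalone
sketch. It does not use the Chukwa or Hadoop classes: plain strings stand in
for the reduce type and the key, and the names KeyPartitionSketch,
"MyReducer" and "MyDataType" are made up for the example. The arithmetic is
the same mask-and-modulo that Hadoop's HashPartitioner applies to a key's
hashCode(). In a real job, logic like this would presumably live in the
getPartition() method of a class implementing
org.apache.hadoop.mapred.Partitioner<ChukwaRecordKey, ChukwaRecord>, and how
that class is wired into Demux is not covered here.

```java
import java.util.Set;
import java.util.TreeSet;

// Standalone sketch: Strings stand in for the fields of ChukwaRecordKey so
// that the example runs without Hadoop or Chukwa on the classpath.
class KeyPartitionSketch {

    // Mask off the sign bit, then take the remainder by the number of
    // reduce tasks (the same arithmetic as Hadoop's HashPartitioner).
    static int partitionFor(String field, int numReduceTasks) {
        return (field.hashCode() & Integer.MAX_VALUE) % numReduceTasks;
    }

    public static void main(String[] args) {
        int numReduceTasks = 8;           // the thread mentions 8 reduce processes
        String reduceType = "MyReducer";  // hypothetical; identical for every record

        Set<Integer> byReduceType = new TreeSet<>();
        Set<Integer> byKey = new TreeSet<>();
        for (int i = 1; i <= numReduceTasks; i++) {
            String key = "MyDataType" + i; // hypothetical key, built as in the mapper
            byReduceType.add(partitionFor(reduceType, numReduceTasks));
            byKey.add(partitionFor(key, numReduceTasks));
        }

        // Partitioning on the reduce type can only ever hit one partition,
        // because the input to the hash never changes.
        System.out.println("partitions used, by reduce type: " + byReduceType);
        System.out.println("partitions used, by key:         " + byKey);
    }
}
```

The trade-off is the one already mentioned: once records of one data type
are spread over several reducers, no single reducer sees them all, so the
output is no longer globally sorted per data type.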
Re: Chukwa customized ReduceProcessor keys
Posted by Eric Yang <ey...@yahoo-inc.com>.
Hi Oded,
The current Demux implementation is a prototype, and it does not use
map/reduce effectively. I would like to refine the partitioning algorithm
so that it partitions by time partition + record type + sampled resolution.
Time partition and record type are the obvious choices: they let multiple
reducers process different time partitions in parallel. Sampled resolution
is for computing down-sampled data concurrently in Demux, so that we can
output hourly, daily, weekly, and monthly resolutions at the same time.
This probably should be a subtask of CHUKWA-444.
Regards,
Eric
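The proposed partitioning scheme could be sketched roughly as follows. This
is only an illustration of the idea and not existing Chukwa code: the class
CompositePartitionSketch, the Resolution enum, the choice of one-hour time
partitions and the "/"-separated key format are all assumptions made for the
example.

```java
import java.util.concurrent.TimeUnit;

// Hypothetical sketch of a composite partition key; none of these names
// exist in Chukwa.
class CompositePartitionSketch {

    // The resolutions named in the message above.
    enum Resolution { HOURLY, DAILY, WEEKLY, MONTHLY }

    private static final long HOUR_MS = TimeUnit.HOURS.toMillis(1);

    // Assumption: a "time partition" is the one-hour bucket that contains
    // the record timestamp.
    static long timePartition(long timestampMillis) {
        return Math.floorDiv(timestampMillis, HOUR_MS);
    }

    // Time partition + record type + sampled resolution, joined into one string.
    static String partitionKey(long timestampMillis, String recordType, Resolution res) {
        return timePartition(timestampMillis) + "/" + recordType + "/" + res;
    }

    // Hash the composite key onto one of the available reduce tasks.
    static int partitionFor(long timestampMillis, String recordType,
                            Resolution res, int numReduceTasks) {
        String k = partitionKey(timestampMillis, recordType, res);
        return (k.hashCode() & Integer.MAX_VALUE) % numReduceTasks;
    }

    public static void main(String[] args) {
        long ts = 1269791545000L; // an arbitrary timestamp in milliseconds
        for (Resolution res : Resolution.values()) {
            System.out.println(partitionKey(ts, "MyDataType", res)
                + " -> reducer " + partitionFor(ts, "MyDataType", res, 8));
        }
    }
}
```

With a key like this, records of the same type and time partition are
assigned a partition per resolution, so the different resolutions can be
computed by different reducers in the same run.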