Posted to common-user@hadoop.apache.org by Tom White <to...@cloudera.com> on 2009/01/22 16:40:59 UTC

Re: Set the Order of the Keys in Reduce

Hi Brian,

The CAT_A and CAT_B keys will be processed by different reducer
instances, so they run independently and may run in any order. What's
the output that you're trying to get?

Cheers,
Tom

On Thu, Jan 22, 2009 at 3:25 PM, Brian MacKay
<Br...@medecision.com> wrote:
> Hello,
>
> Any tips would be greatly appreciated.
>
> Is there a way to set the order of the keys in reduce as shown below, no
> matter what order the collection in MAP occurs in?
>
> Thanks, Brian
>
>    public void map(WritableComparable key, Text values,
>            OutputCollector<Text, Text> output, Reporter reporter)
>            throws IOException {
>        // collect many CAT_A and CAT_B in random order
>        output.collect(CAT_A, details);
>        output.collect(CAT_B, details);
>    }
>
>    public void reduce(Text key, Iterator<Text> values,
>            OutputCollector<Text, Text> output, Reporter reporter)
>            throws IOException {
>        // always reduce CAT_A first, then reduce CAT_B
>    }
>
> _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _
>
> The information transmitted is intended only for the person or entity to
> which it is addressed and may contain confidential and/or privileged
> material. Any review, retransmission, dissemination or other use of, or
> taking of any action in reliance upon, this information by persons or
> entities other than the intended recipient is prohibited. If you received
> this message in error, please contact the sender and delete the material
> from any computer.
>
>
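
As background to the reply above: by default Hadoop assigns each map output key to a reduce task by hashing the key, so which task receives CAT_A or CAT_B depends on the hash and on the number of reduce tasks. The sketch below is a plain-Java model of that rule with no Hadoop dependency. The class name PartitionSketch, the String keys, and the task count of 4 are assumptions made for illustration. Hadoop's Text type computes its own hash code, so real task numbers would differ from the ones this model prints.

```java
// Plain-Java model (no Hadoop dependency) of default hash partitioning.
// PartitionSketch is a hypothetical name used only for this illustration.
final class PartitionSketch {

    // Same arithmetic as Hadoop's default HashPartitioner: clear the sign
    // bit of the hash, then take it modulo the number of reduce tasks.
    static int partitionFor(String key, int numReduceTasks) {
        return (key.hashCode() & Integer.MAX_VALUE) % numReduceTasks;
    }

    public static void main(String[] args) {
        int numReduceTasks = 4; // assumed value for illustration
        for (String key : new String[] {"CAT_A", "CAT_B"}) {
            System.out.println(key + " -> reduce task "
                    + partitionFor(key, numReduceTasks));
        }
    }
}
```

Keys that hash to the same task are reduced one after the other by that task. Keys that land on different tasks are reduced independently, which is the situation described in the reply.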

RE: Set the Order of the Keys in Reduce

Posted by Brian MacKay <Br...@MEDecision.com>.
Owen, thanks for joining in.

I suppose what is needed is a new config setting called
"SequenceReducer". In it you would specify multiple reducer classes in
the order in which the JobTracker should execute them. When the map phase
completes, MyReducerA.class would run, and it would specify the keys it
should reduce rather than reducing all existing keys. In Owen's example,
this could be "CAT". When all instances of MyReducerA have finished
reducing "CAT", the JobTracker would move on to the next reducer in the
list. MyReducerB could then retrieve the values reduced from "CAT" in
HDFS and use them as a filter to reduce "DOG".

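List<Class<? extends Reducer>> list = new ArrayList<Class<? extends Reducer>>();

list.add(MyReducerA.class); // reduces "CAT"
list.add(MyReducerB.class); // reduces "DOG"

conf.setSequenceReducer(list); // proposed setting, not an existing JobConf method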


I agree with the previous posts and appreciate everyone's insights and
participation. What I proposed above is not simple. But when one
considers the size of the job, running it twice doesn't make a lot of
sense. Should one rerun a 40 GB job file because the values reduced in
"CAT" are needed to filter the reduce of "DOG"? A better way must exist!

Owen, maybe I misunderstood your message, but it seems that even with a
partitioner and a raw comparator, the limitation in Tom's post would still
prevent what I'm trying to do, unless something like the above were
available:

"you can't get one reducer to depend on the output of another."


Thanks, Brian
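
The partitioner and raw comparator idea mentioned above can be modeled without Hadoop. Within a single reduce task, reduce() is called once per key in sorted key order, and the same reducer object handles every key in that task, so state saved while reducing "CAT" is still available when "DOG" arrives. The sketch below is a plain-Java model of that behavior, not Hadoop code: the class SingleTaskSketch, its fields, and the rule "keep only the DOG values that CAT also produced" are all hypothetical. It assumes both keys have been routed to the same reduce task, which gives up parallelism across those keys.

```java
import java.util.ArrayList;
import java.util.HashSet;
import java.util.List;
import java.util.Map;
import java.util.Set;
import java.util.TreeMap;

// Plain-Java model (no Hadoop dependency) of ONE reduce task that receives
// both keys. All names and the filtering rule are hypothetical.
final class SingleTaskSketch {

    private final Set<String> catValues = new HashSet<>(); // state kept between keys
    final List<String> callOrder = new ArrayList<>();      // keys in the order reduced
    final List<String> dogOutput = new ArrayList<>();      // filtered "DOG" values

    // Stand-in for reduce(): "CAT" records its values, "DOG" keeps only
    // the values that "CAT" also produced.
    void reduce(String key, List<String> values) {
        callOrder.add(key);
        if (key.equals("CAT")) {
            catValues.addAll(values);
        } else if (key.equals("DOG")) {
            for (String v : values) {
                if (catValues.contains(v)) {
                    dogOutput.add(v);
                }
            }
        }
    }

    // Stand-in for the shuffle: group by key, then call reduce() once per
    // key in sorted order (TreeMap uses natural String order here, which
    // already places "CAT" before "DOG").
    void run(List<String[]> mapOutput) {
        Map<String, List<String>> grouped = new TreeMap<>();
        for (String[] kv : mapOutput) {
            grouped.computeIfAbsent(kv[0], k -> new ArrayList<>()).add(kv[1]);
        }
        for (Map.Entry<String, List<String>> e : grouped.entrySet()) {
            reduce(e.getKey(), e.getValue());
        }
    }
}
```

In this model the order in which the map side emitted the records does not matter: "CAT" is always reduced before "DOG" because the keys are sorted before reduce() is called. A custom comparator would only be needed if the desired order differed from the natural key order.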



-----Original Message-----
From: Tom White [mailto:tom@cloudera.com] 
Sent: Thursday, January 22, 2009 11:04 AM
To: core-user@hadoop.apache.org
Subject: Re: Set the Order of the Keys in Reduce

Reducers run independently and without knowledge of one another, so
you can't get one reducer to depend on the output of another. I think
having two jobs is the simplest way to achieve what you're trying to
do.

Tom

Re: Set the Order of the Keys in Reduce

Posted by Tom White <to...@cloudera.com>.
Reducers run independently and without knowledge of one another, so
you can't get one reducer to depend on the output of another. I think
having two jobs is the simplest way to achieve what you're trying to
do.

Tom
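
The two-job approach can be pictured as two passes in which the second starts only after the first has finished and written its output. The sketch below is a plain-Java model with no Hadoop dependency. TwoJobSketch, jobOne, jobTwo, and the filtering rule are hypothetical names and logic chosen for illustration. In an actual driver using the org.apache.hadoop.mapred API, each pass would be its own JobConf submitted with JobClient.runJob, which blocks until the job completes.

```java
import java.util.ArrayList;
import java.util.HashSet;
import java.util.List;
import java.util.Set;

// Plain-Java model (no Hadoop dependency) of two jobs run back to back.
// All names and the filtering rule are hypothetical.
final class TwoJobSketch {

    // "Job 1": reduce only the CAT_A records into a set of accepted values.
    static Set<String> jobOne(List<String[]> records) {
        Set<String> accepted = new HashSet<>();
        for (String[] kv : records) {
            if (kv[0].equals("CAT_A")) {
                accepted.add(kv[1]);
            }
        }
        return accepted;
    }

    // "Job 2": reduce the CAT_B records, using job 1's finished output
    // as a filter.
    static List<String> jobTwo(List<String[]> records, Set<String> jobOneOutput) {
        List<String> kept = new ArrayList<>();
        for (String[] kv : records) {
            if (kv[0].equals("CAT_B") && jobOneOutput.contains(kv[1])) {
                kept.add(kv[1]);
            }
        }
        return kept;
    }

    public static void main(String[] args) {
        List<String[]> records = new ArrayList<>();
        records.add(new String[] {"CAT_B", "x"});
        records.add(new String[] {"CAT_A", "y"});
        records.add(new String[] {"CAT_B", "y"});

        Set<String> accepted = jobOne(records); // must finish before job 2 starts
        System.out.println(jobTwo(records, accepted)); // prints [y]
    }
}
```

Because the second pass starts from the complete output of the first, it can still be spread across many reduce tasks. The cost is that the input is read twice.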

On Thu, Jan 22, 2009 at 3:48 PM, Brian MacKay
<Br...@medecision.com> wrote:
> Hello Tom,
>
> Would like to apply some rules To CAT_A, then use the output of CAT_A to
> reduce CAT_B.   I'd rather not run two JOBS, so perhaps I need two
> reducers?
>
>
> First Reducer processes CAT_A, then when complete second reducer does
> CAT_B?
>
> I suppose this would accomplish the same thing?

RE: Set the Order of the Keys in Reduce

Posted by Brian MacKay <Br...@MEDecision.com>.
Hello Tom,

I would like to apply some rules to CAT_A, then use the output of CAT_A to
reduce CAT_B. I'd rather not run two jobs, so perhaps I need two
reducers?

The first reducer would process CAT_A; then, when it completes, the second
reducer would process CAT_B.

I suppose this would accomplish the same thing?



-----Original Message-----
From: Tom White [mailto:tom@cloudera.com] 
Sent: Thursday, January 22, 2009 10:41 AM
To: core-user@hadoop.apache.org
Subject: Re: Set the Order of the Keys in Reduce

Hi Brian,

The CAT_A and CAT_B keys will be processed by different reducer
instances, so they run independently and may run in any order. What's
the output that you're trying to get?

Cheers,
Tom
