Posted to hdfs-user@hadoop.apache.org by Sam Garrett <sa...@actionx.com> on 2013/08/13 21:51:13 UTC

Reduce Task Clarification

I am working on a MapReduce job where I would like to have the output
sorted by a LongWritable value. I read the Anatomy of a MapReduce Run in
the Definitive Guide and it didn't say explicitly whether reduce() gets
called only once per map output key. If it does get called only once I was
thinking that I could use this:
http://hadoop.apache.org/docs/current/api/org/apache/hadoop/mapreduce/Job.html#setSortComparatorClass(java.lang.Class)
to do the sorting.

Thank you for your time.

-- 
Sam Garrett
ActionX, NYC

Re: Reduce Task Clarification

Posted by Harsh J <ha...@cloudera.com>.
Are you looking to do a secondary sort under a grouped key?

A reduce() is called once for each globally unique map() emitted key,
along with all values grouped for it. To sort the grouped data, you
need to use a separate sort comparator and perform the 'secondary
sort'.
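
The following is a minimal sketch of the secondary-sort idea in plain Java, with no Hadoop dependency. It only simulates the shuffle: the class SecondarySortSketch, the CompositeKey type, and the shuffle() helper are hypothetical names made up for illustration, not part of the Hadoop API. The assumption is that the map output key is a composite of the natural key and the long value, that the sort comparator orders by both parts, and that grouping looks at the natural key only, so each reduce() call would see its values already in ascending order.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Comparator;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

public class SecondarySortSketch {

    // Hypothetical composite key: the natural key plus the long value to sort by.
    public static final class CompositeKey {
        final String naturalKey;
        final long value;

        public CompositeKey(String naturalKey, long value) {
            this.naturalKey = naturalKey;
            this.value = value;
        }
    }

    // Plays the role of the sort comparator: natural key first, then the value.
    static final Comparator<CompositeKey> SORT_ORDER =
            Comparator.comparing((CompositeKey k) -> k.naturalKey)
                      .thenComparingLong(k -> k.value);

    // Simulates the shuffle: sort on the full composite key, then group on the
    // natural key only. Each map entry stands for one reduce() call.
    public static Map<String, List<Long>> shuffle(List<CompositeKey> mapOutput) {
        List<CompositeKey> sorted = new ArrayList<>(mapOutput);
        sorted.sort(SORT_ORDER);

        Map<String, List<Long>> groups = new LinkedHashMap<>();
        for (CompositeKey key : sorted) {
            groups.computeIfAbsent(key.naturalKey, k -> new ArrayList<>())
                  .add(key.value);
        }
        return groups;
    }

    public static void main(String[] args) {
        List<CompositeKey> mapOutput = Arrays.asList(
                new CompositeKey("b", 7L),
                new CompositeKey("a", 30L),
                new CompositeKey("a", 2L),
                new CompositeKey("b", 1L));

        // One line per simulated reduce() call, values in ascending order.
        shuffle(mapOutput).forEach((key, values) ->
                System.out.println(key + " -> " + values));
    }
}
```

In a real job the same three roles would be filled by a custom WritableComparable key, Job.setSortComparatorClass, and Job.setGroupingComparatorClass, usually together with a partitioner that hashes only the natural key so that all records for one group reach the same reducer.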

On Wed, Aug 14, 2013 at 1:21 AM, Sam Garrett <sa...@actionx.com> wrote:
> I am working on a MapReduce job where I would like to have the output sorted
> by a LongWritable value. I read the Anatomy of a MapReduce Run in the
> Definitive Guide and it didn't say explicitly whether reduce() gets called
> only once per map output key. If it does get called only once I was thinking
> that I could use this:
> http://hadoop.apache.org/docs/current/api/org/apache/hadoop/mapreduce/Job.html#setSortComparatorClass(java.lang.Class)
> to do the sorting.
>
> Thank you for your time.
>
> --
> Sam Garrett
> ActionX, NYC



-- 
Harsh J

Re: Reduce Task Clarification

Posted by Raj K Singh <ra...@gmail.com>.
Implement a raw comparator for your emitted keys to sort the output at the
reducer.
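
As a rough sketch of what a raw comparator does, the plain-Java example below compares two long keys directly from their serialized bytes instead of deserializing them into objects first. The class RawLongCompareSketch and the serialize() helper are hypothetical stand-ins; compareRaw() only mirrors the parameter shape of Hadoop's RawComparator.compare(b1, s1, l1, b2, s2, l2). It assumes each key is stored as 8 big-endian bytes, which is how DataOutput.writeLong lays out a long.

```java
import java.nio.ByteBuffer;

public class RawLongCompareSketch {

    // Hypothetical helper: serialize a long as 8 big-endian bytes.
    public static byte[] serialize(long value) {
        return ByteBuffer.allocate(Long.BYTES).putLong(value).array();
    }

    // Compares two serialized longs in place. (b, s, l) is the buffer,
    // the start offset of the key, and the key length in bytes.
    public static int compareRaw(byte[] b1, int s1, int l1,
                                 byte[] b2, int s2, int l2) {
        long left = ByteBuffer.wrap(b1, s1, l1).getLong();
        long right = ByteBuffer.wrap(b2, s2, l2).getLong();
        return Long.compare(left, right);
    }

    public static void main(String[] args) {
        byte[] five = serialize(5L);
        byte[] nine = serialize(9L);
        // Negative result: 5 sorts before 9.
        System.out.println(compareRaw(five, 0, five.length, nine, 0, nine.length));
    }
}
```

Swapping the two arguments to Long.compare would give descending order, if that is what the job needs.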

::::::::::::::::::::::::::::::::::::::::
Raj K Singh
http://www.rajkrrsingh.blogspot.com
Mobile  Tel: +91 (0)9899821370


On Wed, Aug 14, 2013 at 1:21 AM, Sam Garrett <sa...@actionx.com> wrote:

> I am working on a MapReduce job where I would like to have the output
> sorted by a LongWritable value. I read the Anatomy of a MapReduce Run in
> the Definitive Guide and it didn't say explicitly whether reduce() gets
> called only once per map output key. If it does get called only once I was
> thinking that I could use this:
> http://hadoop.apache.org/docs/current/api/org/apache/hadoop/mapreduce/Job.html#setSortComparatorClass(java.lang.Class)
> to do the sorting.
>
> Thank you for your time.
>
> --
> Sam Garrett
> ActionX, NYC
>

Re: Reduce Task Clarification

Posted by Shahab Yunus <sh...@gmail.com>.
Also Sam, here is a link with an example of how to implement a secondary
sort, along with an explanation of what it is:

http://codingjunkie.net/secondary-sort/

Regards,
Shahab


On Tue, Aug 13, 2013 at 3:51 PM, Sam Garrett <sa...@actionx.com> wrote:

> I am working on a MapReduce job where I would like to have the output
> sorted by a LongWritable value. I read the Anatomy of a MapReduce Run in
> the Definitive Guide and it didn't say explicitly whether reduce() gets
> called only once per map output key. If it does get called only once I was
> thinking that I could use this:
> http://hadoop.apache.org/docs/current/api/org/apache/hadoop/mapreduce/Job.html#setSortComparatorClass(java.lang.Class)
> to do the sorting.
>
> Thank you for your time.
>
> --
> Sam Garrett
> ActionX, NYC
>
