Posted to user@hadoop.apache.org by Lin Ma <li...@gmail.com> on 2012/12/22 14:25:55 UTC

reducer tasks start time issue

Hi guys,

Suppose a Hadoop job has both mappers and reducers. Is it true that reducer
tasks cannot begin until all mapper tasks complete? If so, why was it
designed this way?

thanks in advance,
Lin

Re: reducer tasks start time issue

Posted by Lin Ma <li...@gmail.com>.

Thanks for not only answering my question but also providing a detailed
description. :-)

regards,
Lin

On Sun, Dec 23, 2012 at 12:15 AM, Harsh J <ha...@cloudera.com> wrote:

> A reduce can't process the complete data set until it has fetched all
> partitions. And any map may produce a partition for any reducer.
> Hence, we generally wait before all maps have terminated, and their
> partition outputs ready and copied over to reduces, before we begin to
> group and process the keys.
>
> However, given that you began thinking about this, this paper on
> "Online" Hadoop may interest you:
> http://www.neilconway.org/docs/nsdi2010_hop.pdf
>
> On Sat, Dec 22, 2012 at 6:55 PM, Lin Ma <li...@gmail.com> wrote:
> > Hi guys,
> >
> > Supposing in a Hadoop job, there are both mappers and reducers. My question
> > is, reducer tasks cannot begin until all mapper tasks complete? If so, why
> > designed in this way?
> >
> > thanks in advance,
> > Lin
>
>
>
> --
> Harsh J
>

Re: reducer tasks start time issue

Posted by Harsh J <ha...@cloudera.com>.

A reduce can't process the complete data set until it has fetched all
partitions, and any map may produce a partition for any reducer.
Hence, we generally wait until all maps have terminated and their
partition outputs are ready and copied over to the reduces before we
begin to group and process the keys.
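
To make the partitioning argument concrete, here is a minimal sketch in
plain Python (not Hadoop code). The names partition, run_map and
reduce_counts are hypothetical, and the byte-sum partitioner is only an
assumed stand-in for a hash partitioner. It shows that a reducer's count
for a key stays incomplete until the output of every map has been fetched.

```python
from collections import defaultdict

NUM_REDUCERS = 3  # assumed number of reduce tasks for this toy job


def partition(key, num_reducers=NUM_REDUCERS):
    # Hypothetical deterministic partitioner: byte sum modulo reducer count.
    return sum(key.encode("utf-8")) % num_reducers


def run_map(split):
    # Emit (word, 1) pairs, bucketed by the reducer that owns each word.
    buckets = defaultdict(list)
    for word in split.split():
        buckets[partition(word)].append((word, 1))
    return buckets


def reduce_counts(fetched_outputs, reducer_id):
    # Sum the counts found in whichever map outputs have been fetched so far.
    totals = defaultdict(int)
    for buckets in fetched_outputs:
        for word, n in buckets.get(reducer_id, []):
            totals[word] += n
    return dict(totals)


# Three input splits, one per map task; every split contains every word.
splits = ["apple banana cherry", "banana cherry apple", "cherry apple banana"]
map_outputs = [run_map(s) for s in splits]

# Record which maps produced data for each reducer.
sources = defaultdict(set)
for map_id, buckets in enumerate(map_outputs):
    for reducer_id in buckets:
        sources[reducer_id].add(map_id)

r = partition("apple")
partial = reduce_counts(map_outputs[:2], r)  # only two maps fetched
full = reduce_counts(map_outputs, r)         # all three maps fetched

print("maps feeding each reducer:", dict(sources))
print("partial:", partial)
print("full:   ", full)
```

In this toy run every reducer that receives data receives it from all
three maps, so grouping the keys before the last map finishes would
undercount.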

However, given that you began thinking about this, this paper on
"Online" Hadoop may interest you:
http://www.neilconway.org/docs/nsdi2010_hop.pdf

On Sat, Dec 22, 2012 at 6:55 PM, Lin Ma <li...@gmail.com> wrote:
> Hi guys,
>
> Supposing in a Hadoop job, there are both mappers and reducers. My question
> is, reducer tasks cannot begin until all mapper tasks complete? If so, why
> designed in this way?
>
> thanks in advance,
> Lin

-- 
Harsh J

Re: reducer tasks start time issue

Posted by Lin Ma <li...@gmail.com>.

Thanks Rishi,

My question is answered.

regards,
Lin

On Sun, Dec 23, 2012 at 12:09 AM, Rishi Yadav <ri...@infoobjects.com> wrote:

> Hi Lin,
>
> Reduce task starts as soon as output is ready from Mappers. The reduce
> method does not get called until all Mappers are done. If that's not the
> case, all operations which are not commutative and associative will yield
> incorrect result.
>
>
>
> Thanks and Regards,
>
> Rishi Yadav
>
> (o) 408.988.2000x113 ||  (f) 408.716.2726
>
> InfoObjects Inc || http://www.infoobjects.com *(Big Data Solutions)*
>
> *INC 500 Fastest growing company in 2012 || 2011*
>
> *Best Place to work in Bay Area 2012 - *SF Business Times and the Silicon
> Valley / San Jose Business Journal
>
> 2041 Mission College Boulevard, #280 || Santa Clara, CA 95054
>
>
>
>
> On Sat, Dec 22, 2012 at 5:25 AM, Lin Ma <li...@gmail.com> wrote:
>
>> Hi guys,
>>
>> Supposing in a Hadoop job, there are both mappers and reducers. My
>> question is, reducer tasks cannot begin until all mapper tasks complete? If
>> so, why designed in this way?
>>
>> thanks in advance,
>> Lin
>>
>
>

Re: reducer tasks start time issue

Posted by Rishi Yadav <ri...@infoobjects.com>.

Hi Lin,

A reduce task can start as soon as map output is available, so that it
can begin copying that output. The reduce method, however, is not called
until all mappers are done. If it were called earlier, any operation that
is not both commutative and associative would yield incorrect results.
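
As a rough illustration of that last point, here is a small sketch in
plain Python (not Hadoop code). The sample values and the name
reduce_median are made up for the example. A sum can be combined from
partial results in any order, but a median computed early on each
mapper's output and then combined gives a different answer than a median
over all values.

```python
from statistics import median

# Values emitted for a single key by three mappers (made-up sample data).
map_outputs = [[1, 2, 3], [4, 5], [100]]


def reduce_median(values):
    # A reduce function that is not associative: partial medians cannot
    # be merged into the overall median.
    return median(values)


# Correct: call reduce once, after every mapper's output has arrived.
all_values = [v for out in map_outputs for v in out]
correct = reduce_median(all_values)

# Incorrect: call reduce early per mapper, then reduce the partial results.
early = reduce_median([reduce_median(out) for out in map_outputs])

# A sum, by contrast, is commutative and associative, so both orders agree.
sum_once = sum(all_values)
sum_early = sum(sum(out) for out in map_outputs)

print("median over all values:  ", correct)
print("median of early medians: ", early)
print("sums agree:", sum_once == sum_early)
```

This is also why a combiner is only safe for operations like sum, min or
max, where partial results can be merged without changing the outcome.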

Thanks and Regards,

Rishi Yadav

(o) 408.988.2000x113 ||  (f) 408.716.2726

InfoObjects Inc || http://www.infoobjects.com *(Big Data Solutions)*

*INC 500 Fastest growing company in 2012 || 2011*

*Best Place to work in Bay Area 2012 - *SF Business Times and the Silicon
Valley / San Jose Business Journal

2041 Mission College Boulevard, #280 || Santa Clara, CA 95054

On Sat, Dec 22, 2012 at 5:25 AM, Lin Ma <li...@gmail.com> wrote:

> Hi guys,
>
> Supposing in a Hadoop job, there are both mappers and reducers. My
> question is, reducer tasks cannot begin until all mapper tasks complete? If
> so, why designed in this way?
>
> thanks in advance,
> Lin
>
