Posted to hdfs-user@hadoop.apache.org by Han JU <ju...@gmail.com> on 2013/05/10 17:19:41 UTC

question about combiner

Hi,

For a MapReduce job with a lot of intermediate data between the mapper and
the reducer, I implemented a combiner that uses a more compact representation
of the result data, and I verified that the final result is correct when the
combiner is used. But when I look at the job counter "FILE_BYTES_WRITTEN" or
"Reduce shuffle bytes", the number with the combiner is twice as large as
without it. As I understand it, these two counters reflect the size of the
mapper output, and with a combiner that size should decrease, but that is
not the case here.

Does this mean that my combiner doesn't work and actually increases the size
of the mapper output?

Thanks!
-- 
*JU Han*

Software Engineer Intern @ KXEN Inc.
UTC   -  Université de Technologie de Compiègne
*     **GI06 - Fouille de Données et Décisionnel*

+33 0619608888

Re: question about combiner

Posted by Shahab Yunus <sh...@gmail.com>.

@Kishore, agreed, but shouldn't the 'Reduce shuffle bytes' count decrease
when a combiner is used?

Regards,
Shahab


On Fri, May 10, 2013 at 2:00 PM, Kishore <al...@gmail.com> wrote:

> The combiner runs between the mapper and the reducer, so the mapper output
> is the same with or without a combiner.
>
> Thanks,
> Kishore.
>
> Sent from my iPhone
>
> On 10-May-2013, at 8:49 PM, Han JU <ju...@gmail.com> wrote:
>
> Hi,
>
> For a MapReduce job with a lot of intermediate data between the mapper and
> the reducer, I implemented a combiner that uses a more compact representation
> of the result data, and I verified that the final result is correct when the
> combiner is used. But when I look at the job counter "FILE_BYTES_WRITTEN" or
> "Reduce shuffle bytes", the number with the combiner is twice as large as
> without it. As I understand it, these two counters reflect the size of the
> mapper output, and with a combiner that size should decrease, but that is
> not the case here.
>
> Does this mean that my combiner doesn't work and actually increases the
> size of the mapper output?
>
> Thanks!
> --
> *JU Han*
>
> Software Engineer Intern @ KXEN Inc.
> UTC   -  Université de Technologie de Compiègne
> *     **GI06 - Fouille de Données et Décisionnel*
>
> +33 0619608888
>
>

Re: question about combiner

Posted by Kishore <al...@gmail.com>.

The combiner runs between the mapper and the reducer, so the mapper output is the same with or without a combiner.

Thanks,
Kishore.

Sent from my iPhone

On 10-May-2013, at 8:49 PM, Han JU <ju...@gmail.com> wrote:

> Hi,
> 
> For a MapReduce job with a lot of intermediate data between the mapper and the reducer, I implemented a combiner that uses a more compact representation of the result data, and I verified that the final result is correct when the combiner is used. But when I look at the job counter "FILE_BYTES_WRITTEN" or "Reduce shuffle bytes", the number with the combiner is twice as large as without it. As I understand it, these two counters reflect the size of the mapper output, and with a combiner that size should decrease, but that is not the case here.
> 
> Does this mean that my combiner doesn't work and actually increases the size of the mapper output?
> 
> Thanks!
> -- 
> JU Han
> 
> Software Engineer Intern @ KXEN Inc.
> UTC   -  Université de Technologie de Compiègne
>      GI06 - Fouille de Données et Décisionnel
> 
> +33 0619608888
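
To make the question in this thread concrete, here is a toy model in plain Python (not Hadoop code) of how the serialized size of map-side output can go down or up depending on what the combiner emits. Everything in it is an assumption made for illustration: the record layout (4-byte length prefix, then key or value bytes), the sample data, and the two hypothetical combiners `sum_combiner` and `list_combiner`. Hadoop's real serialization and counter accounting differ, and the thread does not say what the actual combiner does, so this is only one possible explanation.

```python
import struct
from collections import defaultdict

def serialized_size(records):
    """Bytes needed for (key, value) pairs, assuming each key and each value
    is written as a 4-byte length prefix followed by its raw bytes."""
    return sum(4 + len(key) + 4 + len(value) for key, value in records)

def sum_combiner(records):
    """Hypothetical combiner: collapse all values of a key into one 4-byte sum."""
    sums = defaultdict(int)
    for key, value in records:
        sums[key] += struct.unpack(">i", value)[0]
    return [(key, struct.pack(">i", total)) for key, total in sums.items()]

def list_combiner(records):
    """Hypothetical combiner: keep every value, but as comma-separated text.
    Fewer records come out, yet each value takes more bytes than before."""
    grouped = defaultdict(list)
    for key, value in records:
        grouped[key].append(str(struct.unpack(">i", value)[0]))
    return [(key, ",".join(vals).encode("utf-8")) for key, vals in grouped.items()]

# Assumed sample: 30 map output records over 25 distinct keys, so only a few
# keys repeat within this mapper and the combiner has little to merge.
map_output = [(b"k%02d" % (i % 25), struct.pack(">i", 1000000 + i))
              for i in range(30)]

print("no combiner:  ", serialized_size(map_output))
print("sum combiner: ", serialized_size(sum_combiner(map_output)))
print("list combiner:", serialized_size(list_combiner(map_output)))
```

In this model the summing combiner shrinks the output, while the list combiner enlarges it even though it emits fewer records, because keys rarely repeat within a mapper and the text encoding is bulkier than the 4-byte binary value. Comparing the "Combine input records" and "Combine output records" counters of the real job, and the serialized size of a single combined value, would be one way to check whether something similar is happening.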
