Posted to user@hadoop.apache.org by jamal sasha <ja...@gmail.com> on 2013/04/05 22:30:17 UTC
Difference between combiner and aggregator
Hi,
I am trying to understand the difference between a combiner and an aggregator.
Based on my readings:
Wordcount example (mapper)
aggregator:
class Mapper
    method MAP
        H <-- Associative array
        for all term t in document:
            H{t} = H{t} + 1
        for all term t in H do
            EMIT(term t, count H{t})
combiner:
class Mapper
    method INITIALIZE
        H <-- Associative array
    method MAP
        for all term t in document:
            H{t} = H{t} + 1
    method CLOSE
        for all term t in H do
            EMIT(term t, count H{t})
So the second method is how a combiner is implemented.
But the first one seems much simpler?
What do I gain by using a combiner instead of local aggregation?
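
To make the comparison concrete, here is a minimal sketch of the two mapper variants above in plain Python, with no Hadoop involved. The names (map_per_document, InMapperCombiner, total) and the whitespace tokenization are assumptions made for illustration only. The only difference between the variants is the lifetime of the table H: one table per MAP call versus one table for the whole mapper.

```python
from collections import Counter

def map_per_document(document):
    """First variant: a fresh table for each map call, emitted when that call ends."""
    counts = Counter(document.split())
    return list(counts.items())

class InMapperCombiner:
    """Second variant: one table shared by all map calls, emitted once in close."""

    def __init__(self):
        # Plays the role of INITIALIZE in the pseudocode.
        self.counts = Counter()

    def map(self, document):
        # Nothing is emitted here; counts only accumulate.
        self.counts.update(document.split())

    def close(self):
        # Plays the role of CLOSE: emit every (term, count) pair once.
        return list(self.counts.items())

def total(pairs):
    """Sum the emitted pairs per term, as a reducer would."""
    out = Counter()
    for term, n in pairs:
        out[term] += n
    return dict(out)

docs = ["a b a", "b c", "a c c"]

# Variant 1: one batch of pairs per document.
emitted_v1 = [pair for d in docs for pair in map_per_document(d)]

# Variant 2: a single batch of pairs for all documents seen by this mapper.
mapper = InMapperCombiner()
for d in docs:
    mapper.map(d)
emitted_v2 = mapper.close()

print(len(emitted_v1), len(emitted_v2))
print(total(emitted_v1) == total(emitted_v2))
```

Both variants lead to the same final counts, but the second emits fewer pairs because repeats are merged across documents. The price is that the table has to hold every distinct term the mapper sees until close is called.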
Re: Difference between combiner and aggregator
Posted by Jens Scheidtmann <je...@gmail.com>.
Dear jamal sasha,
The usual example goes like this:
class Mapper
    method MAP(Line l)
        document <- split l into Terms t
        for all Terms t in document
            EMIT(Term t, one)
class Combiner
    method REDUCE(Term t, List of Counts lc)
        cnt <- sum lc
        EMIT(Term t, Count cnt)
class Reducer
    method REDUCE(Term t, List of Counts lc)
        cnt <- sum lc
        EMIT(Term t, Count cnt)
The combiner runs node-local on the mapper output (before the shuffle). Its
output is used as input to the reducers (after the shuffle). A combiner is
an I/O optimization. The framework gives no guarantee that a combiner will
be called at all; it may run zero, one, or several times on the output.
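
A small simulation in plain Python (not Hadoop code) can illustrate why that lack of guarantee is harmless for word count. The helper names (map_line, sum_counts, group, run_job) and the two input splits are hypothetical, and running the combiner over a whole split at once is a simplification. Because summing is associative and commutative, the same function can serve as combiner and reducer, and the result is the same whether the combiner runs zero, one, or two times.

```python
from collections import defaultdict

def map_line(line):
    # Mapper: emit (term, 1) for every term in the line.
    return [(term, 1) for term in line.split()]

def sum_counts(term, counts):
    # Used as both combiner and reducer, as in the pseudocode above.
    return (term, sum(counts))

def group(pairs):
    # Collect all counts belonging to the same term.
    grouped = defaultdict(list)
    for term, n in pairs:
        grouped[term].append(n)
    return grouped

def run_job(splits, combiner_runs):
    """Return (final counts, number of records sent through the shuffle)."""
    shuffled = []
    for lines in splits:
        # Output of one mapper for its input split.
        out = [pair for line in lines for pair in map_line(line)]
        # Apply the combiner locally, zero or more times.
        for _ in range(combiner_runs):
            out = [sum_counts(t, c) for t, c in group(out).items()]
        shuffled.extend(out)
    # Reduce phase: sum everything that arrived for each term.
    result = dict(sum_counts(t, c) for t, c in group(shuffled).items())
    return result, len(shuffled)

splits = [["a b a", "b c"], ["a c c"]]
results = {k: run_job(splits, k) for k in (0, 1, 2)}

for k, (counts, records) in results.items():
    print(k, counts, records)
```

In this sketch the final counts never change; only the number of records crossing the shuffle does, which is the I/O saving the combiner is there for.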
Best regards,
Jens