You are viewing a plain text version of this content. The canonical link for it is here.

Posted to mapreduce-user@hadoop.apache.org by Pratap M <mc...@gmail.com> on 2014/03/14 10:34:54 UTC

Partitioner Can any one please clarify my question?

Hi,

I understand that the mapper produces 1 partition per reducer. How does the
reducer know which partition to copy? Lets say there are 2 nodes running
mapper for word count program and there are 2 reducers configured. If each
map node produces 2 partitions, with the possibility of partitions in both
the nodes containing same word as key, how will the reducer work correctly?

For ex:

If node 1 produces partition 1 and partition 2, and partition 1 contains a
key named "WHO".

If node 2 produces partition 3 and partition 4, and partition 3 contains a
key named "WHO".

If Partition 1 and Partition 4 went to reducer 1 (and remaining to reducer
2), how does the reducer 1 compute the correct word count?

If this is not a possibility, and partition 1 and 3 would be made to go to
reducer 1, how Hadoop does this? Does it make sure a given key-value pair
from different nodes always go to a same reducer? If so, how it does this?

Re: Partitioner Can any one please clarify my question?

Posted by Bertrand Dechoux <de...@gmail.com>.

By using the default partitioner the key "who" will go to the partitions
with the same number and the reducer will collect all partition for a given
number. Reducer 0 all partitions 0 and so on. But yes, you can mess up with
the partitioner logic for good or evil, if that's your question. With great
power, comes great responsibility.

Regards

Bertrand

Bertrand Dechoux


On Fri, Mar 14, 2014 at 10:34 AM, Pratap M <mc...@gmail.com> wrote:

> Hi,
>
> I understand that the mapper produces 1 partition per reducer. How does
> the reducer know which partition to copy? Lets say there are 2 nodes
> running mapper for word count program and there are 2 reducers configured.
> If each map node produces 2 partitions, with the possibility of partitions
> in both the nodes containing same word as key, how will the reducer work
> correctly?
>
> For ex:
>
> If node 1 produces partition 1 and partition 2, and partition 1 contains a
> key named "WHO".
>
> If node 2 produces partition 3 and partition 4, and partition 3 contains a
> key named "WHO".
>
> If Partition 1 and Partition 4 went to reducer 1 (and remaining to reducer
> 2), how does the reducer 1 compute the correct word count?
>
> If this is not a possibility, and partition 1 and 3 would be made to go to
> reducer 1, how Hadoop does this? Does it make sure a given key-value pair
> from different nodes always go to a same reducer? If so, how it does this?
>

Re: Partitioner Can any one please clarify my question?

Posted by Bertrand Dechoux <de...@gmail.com>.

By using the default partitioner the key "who" will go to the partitions
with the same number and the reducer will collect all partition for a given
number. Reducer 0 all partitions 0 and so on. But yes, you can mess up with
the partitioner logic for good or evil, if that's your question. With great
power, comes great responsibility.

Regards

Bertrand

Bertrand Dechoux


On Fri, Mar 14, 2014 at 10:34 AM, Pratap M <mc...@gmail.com> wrote:

> Hi,
>
> I understand that the mapper produces 1 partition per reducer. How does
> the reducer know which partition to copy? Lets say there are 2 nodes
> running mapper for word count program and there are 2 reducers configured.
> If each map node produces 2 partitions, with the possibility of partitions
> in both the nodes containing same word as key, how will the reducer work
> correctly?
>
> For ex:
>
> If node 1 produces partition 1 and partition 2, and partition 1 contains a
> key named "WHO".
>
> If node 2 produces partition 3 and partition 4, and partition 3 contains a
> key named "WHO".
>
> If Partition 1 and Partition 4 went to reducer 1 (and remaining to reducer
> 2), how does the reducer 1 compute the correct word count?
>
> If this is not a possibility, and partition 1 and 3 would be made to go to
> reducer 1, how Hadoop does this? Does it make sure a given key-value pair
> from different nodes always go to a same reducer? If so, how it does this?
>

Re: Partitioner Can any one please clarify my question?

Posted by Bertrand Dechoux <de...@gmail.com>.

By using the default partitioner the key "who" will go to the partitions
with the same number and the reducer will collect all partition for a given
number. Reducer 0 all partitions 0 and so on. But yes, you can mess up with
the partitioner logic for good or evil, if that's your question. With great
power, comes great responsibility.

Regards

Bertrand

Bertrand Dechoux


On Fri, Mar 14, 2014 at 10:34 AM, Pratap M <mc...@gmail.com> wrote:

> Hi,
>
> I understand that the mapper produces 1 partition per reducer. How does
> the reducer know which partition to copy? Lets say there are 2 nodes
> running mapper for word count program and there are 2 reducers configured.
> If each map node produces 2 partitions, with the possibility of partitions
> in both the nodes containing same word as key, how will the reducer work
> correctly?
>
> For ex:
>
> If node 1 produces partition 1 and partition 2, and partition 1 contains a
> key named "WHO".
>
> If node 2 produces partition 3 and partition 4, and partition 3 contains a
> key named "WHO".
>
> If Partition 1 and Partition 4 went to reducer 1 (and remaining to reducer
> 2), how does the reducer 1 compute the correct word count?
>
> If this is not a possibility, and partition 1 and 3 would be made to go to
> reducer 1, how Hadoop does this? Does it make sure a given key-value pair
> from different nodes always go to a same reducer? If so, how it does this?
>

Re: Partitioner Can any one please clarify my question?

Posted by Bertrand Dechoux <de...@gmail.com>.

By using the default partitioner the key "who" will go to the partitions
with the same number and the reducer will collect all partition for a given
number. Reducer 0 all partitions 0 and so on. But yes, you can mess up with
the partitioner logic for good or evil, if that's your question. With great
power, comes great responsibility.

Regards

Bertrand

Bertrand Dechoux


On Fri, Mar 14, 2014 at 10:34 AM, Pratap M <mc...@gmail.com> wrote:

> Hi,
>
> I understand that the mapper produces 1 partition per reducer. How does
> the reducer know which partition to copy? Lets say there are 2 nodes
> running mapper for word count program and there are 2 reducers configured.
> If each map node produces 2 partitions, with the possibility of partitions
> in both the nodes containing same word as key, how will the reducer work
> correctly?
>
> For ex:
>
> If node 1 produces partition 1 and partition 2, and partition 1 contains a
> key named "WHO".
>
> If node 2 produces partition 3 and partition 4, and partition 3 contains a
> key named "WHO".
>
> If Partition 1 and Partition 4 went to reducer 1 (and remaining to reducer
> 2), how does the reducer 1 compute the correct word count?
>
> If this is not a possibility, and partition 1 and 3 would be made to go to
> reducer 1, how Hadoop does this? Does it make sure a given key-value pair
> from different nodes always go to a same reducer? If so, how it does this?
>