Posted to mapreduce-user@hadoop.apache.org by 静行 <xi...@taobao.com> on 2012/07/05 06:23:19 UTC

Re: How To Distribute One Map Data To All Reduce Tasks?

Thanks!
But what I really want to know is how I can distribute one map's output to every reduce task, not just to one of the reduce tasks.
Do you have some ideas?

From: Devaraj k [mailto:devaraj.k@huawei.com]
Sent: Thursday, July 05, 2012 12:12 PM
To: mapreduce-user@hadoop.apache.org
Subject: RE: How To Distribute One Map Data To All Reduce Tasks?


You can distribute the map output to the reduce tasks using a Partitioner. By default, the job uses HashPartitioner; you can write a custom Partitioner according to your needs.



http://hadoop.apache.org/common/docs/current/api/org/apache/hadoop/mapreduce/Partitioner.html
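As a sketch of the idea (written Hadoop-free so the routing rule is easy to see and test; in a real job this logic would live in a subclass of org.apache.hadoop.mapreduce.Partitioner, overriding getPartition, and be wired in with job.setPartitionerClass — the "tag|key" format here is an illustrative assumption, not a Hadoop convention):

```java
// Sketch of custom partition logic: route by an integer tag prepended to
// the key, e.g. "3|someKey" goes to reducer 3. Untagged keys fall back to
// hash partitioning, like Hadoop's default HashPartitioner.
public class TagPartitioner {

    public static int getPartition(String key, int numPartitions) {
        int sep = key.indexOf('|');
        if (sep > 0) {
            try {
                int tag = Integer.parseInt(key.substring(0, sep));
                return tag % numPartitions;
            } catch (NumberFormatException ignored) {
                // not a tagged key; fall through to hashing
            }
        }
        // mask the sign bit so the result is always non-negative
        return (key.hashCode() & Integer.MAX_VALUE) % numPartitions;
    }

    public static void main(String[] args) {
        System.out.println(getPartition("2|user42", 4)); // tagged: reducer 2
        System.out.println(getPartition("user42", 4));   // untagged: hashed
    }
}
```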



Thanks

Devaraj

________________________________
From: 静行 [xiaoyong.dengxy@taobao.com]
Sent: Thursday, July 05, 2012 9:00 AM
To: mapreduce-user@hadoop.apache.org
Subject: How To Distribute One Map Data To All Reduce Tasks?
Hi all:
         How can I distribute one map's data to all reduce tasks?

________________________________

This email (including any attachments) is confidential and may be legally privileged. If you received this email in error, please delete it immediately and do not copy it or use it for any purpose or disclose its contents to any other person. Thank you.


Re: How To Distribute One Map Data To All Reduce Tasks?

Posted by Karthik Kambatla <ka...@cloudera.com>.
One way to achieve this would be to:

   1. Emit the same value multiple times, each time with a different key.
   2. Use these different keys, in conjunction with the partitioner, to
   achieve the desired distribution.
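The two steps above can be sketched as follows (a Hadoop-free simulation so it stands alone; the "tag|key" format, the helper names, and the reducer count are illustrative assumptions — in a real job the emit would happen in Mapper.map and the routing in a custom Partitioner):

```java
import java.util.ArrayList;
import java.util.List;

// Simulates the recipe outside Hadoop: emit the same value once per
// reducer, tagging each copy with a distinct key, then partition by the
// tag so every reducer receives one copy.
public class BroadcastToReducers {
    static final int NUM_REDUCERS = 4;

    // Step 1: the map side emits ("<tag>|<origKey>", value) for every tag.
    static List<String[]> mapEmit(String key, String value) {
        List<String[]> out = new ArrayList<>();
        for (int tag = 0; tag < NUM_REDUCERS; tag++) {
            out.add(new String[] { tag + "|" + key, value });
        }
        return out;
    }

    // Step 2: the partitioner routes by the numeric tag in front of the key.
    static int partition(String taggedKey) {
        int tag = Integer.parseInt(taggedKey.substring(0, taggedKey.indexOf('|')));
        return tag % NUM_REDUCERS;
    }

    public static void main(String[] args) {
        // Every reducer 0..3 gets exactly one copy of the broadcast record.
        for (String[] kv : mapEmit("dim_table_row", "v1")) {
            System.out.println("reducer " + partition(kv[0]) + " <- " + kv[1]);
        }
    }
}
```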

Hope that helps!

Karthik

On Thu, Jul 5, 2012 at 12:19 AM, 静行 <xi...@taobao.com> wrote:

>  I have different key values to join two tables, but only a few keys have
> large amounts of data to join and take most of the time, so I want to
> distribute those keys to every reducer for the join.

Re: How To Distribute One Map Data To All Reduce Tasks?

Posted by 静行 <xi...@taobao.com>.
I have different key values to join two tables, but only a few keys have large amounts of data to join and take most of the time, so I want to distribute those keys to every reducer for the join.
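One common way to handle this kind of skewed join (a sketch under assumptions — the method names, hot-key set, and reducer count are illustrative, not from the thread) is to replicate the smaller table's rows for the hot keys to every reducer, while scattering the larger table's rows for those keys across reducers; each reducer then holds the small side it needs to join its slice of the big side:

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Random;
import java.util.Set;

// Sketch of the skewed-join idea: for the few hot keys, broadcast the
// smaller table's rows to all reducers and scatter the big table's rows,
// so the heavy join work is spread instead of landing on one reducer.
public class SkewedJoinEmit {
    static final int NUM_REDUCERS = 4;
    static final Random RNG = new Random(42);

    // Rows of the SMALL table with a hot key go to every reducer;
    // everything else is hash-partitioned as usual.
    static List<Integer> routeSmall(String key, Set<String> hotKeys) {
        List<Integer> targets = new ArrayList<>();
        if (hotKeys.contains(key)) {
            for (int r = 0; r < NUM_REDUCERS; r++) targets.add(r); // replicate
        } else {
            targets.add((key.hashCode() & Integer.MAX_VALUE) % NUM_REDUCERS);
        }
        return targets;
    }

    // Rows of the BIG table with a hot key go to ONE random reducer; every
    // reducer already holds the matching small-table rows, so the join works.
    static int routeBig(String key, Set<String> hotKeys) {
        if (hotKeys.contains(key)) return RNG.nextInt(NUM_REDUCERS);
        return (key.hashCode() & Integer.MAX_VALUE) % NUM_REDUCERS;
    }

    public static void main(String[] args) {
        Set<String> hot = Set.of("hotK");
        System.out.println("small hotK -> reducers " + routeSmall("hotK", hot));
        System.out.println("big   hotK -> reducer  " + routeBig("hotK", hot));
    }
}
```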

From: Devaraj k [mailto:devaraj.k@huawei.com]
Sent: Thursday, July 05, 2012 2:06 PM
To: mapreduce-user@hadoop.apache.org
Subject: RE: How To Distribute One Map Data To All Reduce Tasks?


Can you explain your use case in some more detail?



Thanks

Devaraj


RE: How To Distribute One Map Data To All Reduce Tasks?

Posted by Devaraj k <de...@huawei.com>.
Can you explain your use case in some more detail?



Thanks

Devaraj

________________________________
From: 静行 [xiaoyong.dengxy@taobao.com]
Sent: Thursday, July 05, 2012 9:53 AM
To: mapreduce-user@hadoop.apache.org
Subject: Re: How To Distribute One Map Data To All Reduce Tasks?

Thanks!
But what I really want to know is how I can distribute one map's output to every reduce task, not just to one of the reduce tasks.
Do you have some ideas?
