Posted to common-user@hadoop.apache.org by abc xyz <fa...@yahoo.com> on 2010/08/09 17:30:11 UTC
Total order partitioner [Modified]
1) The input splits are sampled when we use the total order partitioner provided
in Hadoop 0.19. I want to know how and when this sampling is done. Is it done
before the master allocates tasks to the nodes, since the sampling file also has
to be added to the distributed cache? If so, is the sampling carried out at the
master node, and does the master then have to access the input splits to get the
samples?
2) Also, does the total order partitioner allow ranges in which a key can belong
to more than one range? I mean something like this: A, C, D, D, H, Y, where
keys from A to C are sent to one partition, keys from C to D are sent to the 2nd
partition, keys with value D can be sent randomly to either the 2nd or the 3rd
partition, and so on. Or are these ranges mutually exclusive?
Re: Total order partitioner [Modified]
Posted by Gang Luo <lg...@yahoo.com.cn>.
The sampling is done at the master node by accessing the splits before the job
is submitted. By default, the partitioner should send each key to exactly one
partition, unless you modify it.
-Gang
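
To illustrate the two points above, here is a minimal sketch in plain Python of
how sampled keys can be turned into split points and how a key is then mapped to
exactly one partition by binary search. This is not Hadoop code: the function
names pick_split_points and get_partition are hypothetical, and the logic is
only a simplified approximation of what the sampler and the total order
partitioner are understood to do. The sample keys are the ones from the
question.

```python
import bisect


def pick_split_points(samples, num_partitions):
    """Choose up to num_partitions - 1 split points from sampled keys.

    Hypothetical helper: sorts the samples and takes evenly spaced keys,
    skipping over a key that equals the previously chosen split point so
    that the same split point is not emitted twice.
    """
    ordered = sorted(samples)
    step = len(ordered) / num_partitions
    points = []
    for i in range(1, num_partitions):
        k = int(step * i + 0.5)  # round half up
        # Skip duplicates of the last chosen split point.
        while points and k < len(ordered) and ordered[k] == points[-1]:
            k += 1
        if k < len(ordered):
            points.append(ordered[k])
    return points


def get_partition(key, split_points):
    """Map a key to exactly one partition index.

    Keys below the first split point go to partition 0. A key equal to
    split point i goes to partition i + 1, so the ranges never overlap.
    """
    return bisect.bisect_right(split_points, key)


if __name__ == "__main__":
    samples = ["A", "C", "D", "D", "H", "Y"]
    splits = pick_split_points(samples, 3)
    print("split points:", splits)
    for key in ["A", "C", "D", "E", "H", "Y"]:
        print(key, "-> partition", get_partition(key, splits))
```

In this sketch every occurrence of the key D lands in the same partition, which
matches the answer that the ranges are mutually exclusive by default. Sending D
to either of two partitions would require a custom partitioner.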