Posted to user@hadoop.apache.org by xeonmailinglist-gmail <xe...@gmail.com> on 2015/03/11 11:28:53 UTC

Prune out data to a specific reduce task

Hi,

I have a job with 3 map tasks and 2 reduce tasks, but I want to exclude
the data that would go to reduce task 2. That is, only reducer 1 would
produce output, and the other one would be empty or would not even
execute.

How can I do this in MapReduce?

Example Job Execution


Thanks,


Re: Prune out data to a specific reduce task

Posted by Fei Hu <hu...@gmail.com>.

In your Reducer class, you could skip the records that you want to exclude, based on the key or the value.
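
A minimal sketch of that idea, written as plain Java rather than against the Hadoop API so that it runs on its own. The class ReducerSideFilter, its reduce() helper and the "tmp_" exclusion rule are illustrative names, not part of Hadoop. In a real Reducer the same check would sit at the top of reduce(), which would return without calling context.write() for an excluded key. Note that the excluded records are still shuffled to the reducer, so this trims the output but not the data transfer.

```java
import java.util.Arrays;
import java.util.List;
import java.util.Optional;
import java.util.function.Predicate;

public class ReducerSideFilter {

    // Stand-in for reduce(key, values, context): sums the values for a key,
    // or produces nothing when the key matches the exclusion rule.
    public static Optional<Integer> reduce(String key, List<Integer> values,
                                           Predicate<String> excluded) {
        if (excluded.test(key)) {
            // A real reducer would return here without calling context.write().
            return Optional.empty();
        }
        int sum = 0;
        for (int v : values) {
            sum += v;
        }
        return Optional.of(sum);
    }

    public static void main(String[] args) {
        // Hypothetical rule: drop every key that starts with "tmp_".
        Predicate<String> excluded = k -> k.startsWith("tmp_");

        System.out.println(reduce("clicks", Arrays.asList(1, 2, 3), excluded));  // Optional[6]
        System.out.println(reduce("tmp_debug", Arrays.asList(4, 5), excluded));  // Optional.empty
    }
}
```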


> On Mar 12, 2015, at 12:47 PM, xeonmailinglist-gmail <xe...@gmail.com> wrote:
> 
> If I use the partitioner, I must be able to tell MapReduce not to process the values destined for a certain reduce task.
> 
> The method public int getPartition(K key, V value, int numReduceTasks) must always return a partition. I can’t return -1. Thus, I don’t know how to tell MapReduce not to process the data of a partition. Any suggestions?
> 
> ———— Forwarded Message ————
> 
> Subject: Re: Prune out data to a specific reduce task
> 
> Date: Thu, 12 Mar 2015 12:40:04 -0400
> 
> From: Fei Hu <hufei68@gmail.com>
> Reply-To: user@hadoop.apache.org
> To: user@hadoop.apache.org
> 
> Maybe you could use Partitioner.class to solve your problem.
> 
> 
> 
>> On Mar 11, 2015, at 6:28 AM, xeonmailinglist-gmail <xeonmailinglist@gmail.com> wrote:
>> 
>> Hi,
>> 
>> I have a job with 3 map tasks and 2 reduce tasks, but I want to exclude the data that would go to reduce task 2. That is, only reducer 1 would produce output, and the other one would be empty or would not even execute.
>> 
>> How can I do this in MapReduce?
>> 
>> <ExampleJobExecution.png>
>> 
>> 
>> Thanks,
>> 

Re: Prune out data to a specific reduce task

Posted by Azuryy Yu <az...@gmail.com>.

Hi,
Can you configure just one reduce task? Why do you want to set up two
reduce tasks and have only one of them do any work?


On Mon, Mar 16, 2015 at 9:04 AM, Drake민영근 <dr...@nexr.com> wrote:

> Hi,
>
> If you have written a custom partitioner, just call it from the mapper to
> check which partition a key maps to.
>
> You can get the number of reducers from mapcontext.getNumReduceTasks().
> Then get the reducer number from Partitioner.getPartition(key, value,
> numReduceTasks). Finally, write only the wanted records to the reducers.
>
> Caution: this approach largely defeats the parallelism of the MapReduce
> programming model. If you cut the records for Reducer 2, the task still
> starts up but has nothing to do.
>
> Thanks.
>
> Drake 민영근 Ph.D
> kt NexR
>
> On Fri, Mar 13, 2015 at 11:47 PM, xeonmailinglist-gmail <
> xeonmailinglist@gmail.com> wrote:
>
>>  Hi,
>>
>> The only obstacle is knowing which partition the map output would go to.
>> 1 ~ From the map method, how can I know which partition the output goes to?
>> 2 ~ Can I call getPartition(K key, V value, int numReduceTasks) from the
>> map function?
>>
>> Thanks,
>>
>>
>>
>>
>>
>> On 13-03-2015 03:25, Naganarasimha G R (Naga) wrote:
>>
>> I think Drake's comment
>> "In the map method, records would be ignored with no output.collect() or
>> context.write()."
>> is the most valid way to do it, as it avoids further processing downstream
>> and hence consumes fewer resources, since unwanted records are pruned
>> at the source itself.
>> Is there any obstacle to doing this in your map method?
>>
>>  Regards,
>> Naga
>>  ------------------------------
>> *From:* xeonmailinglist-gmail [xeonmailinglist@gmail.com]
>> *Sent:* Thursday, March 12, 2015 22:17
>> *To:* user@hadoop.apache.org
>> *Subject:* Fwd: Re: Prune out data to a specific reduce task
>>
>> If I use the partitioner, I must be able to tell MapReduce not to
>> process the values destined for a certain reduce task.
>>
>> The method public int getPartition(K key, V value, int numReduceTasks)
>> must always return a partition. I can’t return -1. Thus, I don’t know how
>> to tell MapReduce not to process the data of a partition. Any suggestions?
>>
>> ———— Forwarded Message ————
>>
>> Subject: Re: Prune out data to a specific reduce task
>>
>> Date: Thu, 12 Mar 2015 12:40:04 -0400
>>
>> From: Fei Hu <hufei68@gmail.com>
>>
>> Reply-To: user@hadoop.apache.org
>>
>> To: user@hadoop.apache.org
>>
>> Maybe you could use Partitioner.class to solve your problem.
>>
>> On Mar 11, 2015, at 6:28 AM, xeonmailinglist-gmail <
>> xeonmailinglist@gmail.com> wrote:
>>
>>  Hi,
>>
>> I have a job with 3 map tasks and 2 reduce tasks, but I want to exclude
>> the data that would go to reduce task 2. That is, only reducer 1 would
>> produce output, and the other one would be empty or would not even
>> execute.
>>
>> How can I do this in MapReduce?
>>
>> <ExampleJobExecution.png>
>>
>>
>> Thanks,
>>

Re: Prune out data to a specific reduce task

Posted by Drake민영근 <dr...@nexr.com>.

Hi,

If you have written a custom partitioner, just call it from the mapper to
check which partition a key maps to.

You can get the number of reducers from mapcontext.getNumReduceTasks().
Then get the reducer number from Partitioner.getPartition(key, value,
numReduceTasks). Finally, write only the wanted records to the reducers.
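
The steps above can be sketched in plain Java, without Hadoop on the classpath. The class MapSidePruning and its methods are illustrative names. partitionFor() uses the same formula as Hadoop's default HashPartitioner, and a custom partitioner would be called at the same spot. String keys are an assumption made to keep the sketch self-contained: Hadoop's Text computes hashCode() differently, so real partition numbers will differ. In a real mapper, numReduceTasks would come from context.getNumReduceTasks(), and kept records would be emitted with context.write().

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

public class MapSidePruning {

    // Same formula as Hadoop's default HashPartitioner.getPartition().
    // Partition numbers are zero-based: 0 .. numReduceTasks - 1.
    public static int partitionFor(Object key, int numReduceTasks) {
        return (key.hashCode() & Integer.MAX_VALUE) % numReduceTasks;
    }

    // Stand-in for the body of map(): keep a key only if it is NOT
    // destined for the partition that should receive no data.
    public static List<String> keepRecords(List<String> keys,
                                           int numReduceTasks,
                                           int prunedPartition) {
        List<String> kept = new ArrayList<>();
        for (String key : keys) {
            if (partitionFor(key, numReduceTasks) != prunedPartition) {
                kept.add(key); // in a real mapper: context.write(key, value)
            }
        }
        return kept;
    }

    public static void main(String[] args) {
        List<String> keys = Arrays.asList("a", "b", "c", "d");
        // Two reduce tasks; prune everything headed for partition 1.
        System.out.println(keepRecords(keys, 2, 1)); // [b, d]
    }
}
```

With two reduce tasks and partition 1 pruned, "a" and "c" hash to partition 1 and are never emitted, so only "b" and "d" survive.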

Caution: this approach largely defeats the parallelism of the MapReduce
programming model. If you cut the records for Reducer 2, the task still
starts up but has nothing to do.

Thanks.

Drake 민영근 Ph.D
kt NexR

On Fri, Mar 13, 2015 at 11:47 PM, xeonmailinglist-gmail <
xeonmailinglist@gmail.com> wrote:

>  Hi,
>
> The only obstacle is knowing which partition the map output would go to.
> 1 ~ From the map method, how can I know which partition the output goes to?
> 2 ~ Can I call getPartition(K key, V value, int numReduceTasks) from the
> map function?
>
> Thanks,
>
>
>
>
>
> On 13-03-2015 03:25, Naganarasimha G R (Naga) wrote:
>
> I think Drake's comment
> "In the map method, records would be ignored with no output.collect() or
> context.write()."
> is the most valid way to do it, as it avoids further processing downstream
> and hence consumes fewer resources, since unwanted records are pruned
> at the source itself.
> Is there any obstacle to doing this in your map method?
>
>  Regards,
> Naga
>  ------------------------------
> *From:* xeonmailinglist-gmail [xeonmailinglist@gmail.com]
> *Sent:* Thursday, March 12, 2015 22:17
> *To:* user@hadoop.apache.org
> *Subject:* Fwd: Re: Prune out data to a specific reduce task
>
> If I use the partitioner, I must be able to tell MapReduce not to
> process the values destined for a certain reduce task.
>
> The method public int getPartition(K key, V value, int numReduceTasks)
> must always return a partition. I can’t return -1. Thus, I don’t know how
> to tell MapReduce not to process the data of a partition. Any suggestions?
>
> ———— Forwarded Message ————
>
> Subject: Re: Prune out data to a specific reduce task
>
> Date: Thu, 12 Mar 2015 12:40:04 -0400
>
> From: Fei Hu <hufei68@gmail.com>
>
> Reply-To: user@hadoop.apache.org
>
> To: user@hadoop.apache.org
>
> Maybe you could use Partitioner.class to solve your problem.
>
> On Mar 11, 2015, at 6:28 AM, xeonmailinglist-gmail <
> xeonmailinglist@gmail.com> wrote:
>
>  Hi,
>
> I have a job with 3 map tasks and 2 reduce tasks, but I want to exclude
> the data that would go to reduce task 2. That is, only reducer 1 would
> produce output, and the other one would be empty or would not even
> execute.
>
> How can I do this in MapReduce?
>
> <ExampleJobExecution.png>
>
>
> Thanks,
>

Re: Prune out data to a specific reduce task

Posted by xeonmailinglist-gmail <xe...@gmail.com>.

Hi,

The only obstacle is knowing which partition the map output would go to.
1 ~ From the map method, how can I know which partition the output goes to?
2 ~ Can I call |getPartition(K key, V value, int numReduceTasks)| from
the map function?

Thanks,




On 13-03-2015 03:25, Naganarasimha G R (Naga) wrote:
> I think Drake's comment
> "In the map method, records would be ignored with no output.collect() 
> or context.write()."
> is the most valid way to do it, as it avoids further processing
> downstream and hence consumes fewer resources, since unwanted
> records are pruned at the source itself.
> Is there any obstacle to doing this in your map method?
>
> Regards,
> Naga
> ------------------------------------------------------------------------
> *From:* xeonmailinglist-gmail [xeonmailinglist@gmail.com]
> *Sent:* Thursday, March 12, 2015 22:17
> *To:* user@hadoop.apache.org
> *Subject:* Fwd: Re: Prune out data to a specific reduce task
>
> If I use the partitioner, I must be able to tell MapReduce not to
> process the values destined for a certain reduce task.
>
> The method |public int getPartition(K key, V value, int
> numReduceTasks)| must always return a partition. I can’t return -1.
> Thus, I don’t know how to tell MapReduce not to process the data of a
> partition. Any suggestions?
>
> ———— Forwarded Message ————
>
> Subject: Re: Prune out data to a specific reduce task
>
> Date: Thu, 12 Mar 2015 12:40:04 -0400
>
> From: Fei Hu <hufei68@gmail.com>
>
> Reply-To: user@hadoop.apache.org
>
> To: user@hadoop.apache.org
>
> Maybe you could use Partitioner.class to solve your problem.
>
>> On Mar 11, 2015, at 6:28 AM, xeonmailinglist-gmail
>> <xeonmailinglist@gmail.com> wrote:
>>
>> Hi,
>>
>> I have a job with 3 map tasks and 2 reduce tasks, but I want to
>> exclude the data that would go to reduce task 2. That is, only
>> reducer 1 would produce output, and the other one would be empty or
>> would not even execute.
>>
>> How can I do this in MapReduce?
>>
>> <ExampleJobExecution.png>
>>
>>
>> Thanks,
>>

Re: Prune out data to a specific reduce task

Posted by xeonmailinglist-gmail <xe...@gmail.com>.

Hi,

The only obstacle is to know to which partition the map output would go.
1 ~ From the map method, how can I know to which partition the output go?
2 ~ Can I call |getPartition(K key, V value, int numReduceTasks)| from 
the map function?

Thanks,




On 13-03-2015 03:25, Naganarasimha G R (Naga) wrote:
> I think Drake's comment
> "In the map method, records would be ignored with no output.collect() 
> or context.write()."
> is most valid way to do it as it will avoid further processing 
> downstream and hence less resources would be consumed, as unwanted 
> records are pruned at the source itself.
> Is there any obstacle from doing this in your map method ?
>
> Regards,
> Naga
> ------------------------------------------------------------------------
> *From:* xeonmailinglist-gmail [xeonmailinglist@gmail.com]
> *Sent:* Thursday, March 12, 2015 22:17
> *To:* user@hadoop.apache.org
> *Subject:* Fwd: Re: Prune out data to a specific reduce task
>
> If I use the partitioner, I must be able to tell map reduce to not 
> execute values from a certain reduce tasks.
>
> The method |public int getPartition(K key, V value, int 
> numReduceTasks)| must always return a partition. I can’t return -1. 
> Thus, I don’ t know how to tell Mapreduce to not execute data from a 
> partition. Any suggestion?
>
> ———— Forwarded Message ————
>
> Subject: Re: Prune out data to a specific reduce task
>
> Date: Thu, 12 Mar 2015 12:40:04 -0400
>
> From: Fei Hu hufei68@gmail.com <ht...@gmail.com>
>
> Reply-To: user@hadoop.apache.org
>
> To: user@hadoop.apache.org
>
> Maybe you could use Partitioner.class to solve your problem.
>
>> On Mar 11, 2015, at 6:28 AM, xeonmailinglist-gmail 
>> <xeonmailinglist@gmail.com <ma...@gmail.com>> wrote:
>>
>> Hi,
>>
>> I have this job that has 3 map tasks and 2 reduce tasks. But, I want 
>> to excludes data that will go to the reduce task 2. This means that, 
>> only reducer 1 will produce data, and the other one will be empty, or 
>> even it doesn't execute.
>>
>> How can I do this in MapReduce?
>>
>> <ExampleJobExecution.png>
>>
>>
>> Thanks,
>>
>> -- 
>> --
> 

-- 
--

Re: Prune out data to a specific reduce task

Posted by xeonmailinglist-gmail <xe...@gmail.com>.

Hi,

The only obstacle is to know to which partition the map output would go.
1 ~ From the map method, how can I know to which partition the output go?
2 ~ Can I call |getPartition(K key, V value, int numReduceTasks)| from 
the map function?

Thanks,




On 13-03-2015 03:25, Naganarasimha G R (Naga) wrote:
> I think Drake's comment
> "In the map method, records would be ignored with no output.collect() 
> or context.write()."
> is most valid way to do it as it will avoid further processing 
> downstream and hence less resources would be consumed, as unwanted 
> records are pruned at the source itself.
> Is there any obstacle from doing this in your map method ?
>
> Regards,
> Naga
> ------------------------------------------------------------------------
> *From:* xeonmailinglist-gmail [xeonmailinglist@gmail.com]
> *Sent:* Thursday, March 12, 2015 22:17
> *To:* user@hadoop.apache.org
> *Subject:* Fwd: Re: Prune out data to a specific reduce task
>
> If I use the partitioner, I must be able to tell map reduce to not 
> execute values from a certain reduce tasks.
>
> The method |public int getPartition(K key, V value, int 
> numReduceTasks)| must always return a partition. I can’t return -1. 
> Thus, I don’ t know how to tell Mapreduce to not execute data from a 
> partition. Any suggestion?
>
> ———— Forwarded Message ————
>
> Subject: Re: Prune out data to a specific reduce task
>
> Date: Thu, 12 Mar 2015 12:40:04 -0400
>
> From: Fei Hu hufei68@gmail.com <ht...@gmail.com>
>
> Reply-To: user@hadoop.apache.org
>
> To: user@hadoop.apache.org
>
> Maybe you could use Partitioner.class to solve your problem.
>
>> On Mar 11, 2015, at 6:28 AM, xeonmailinglist-gmail 
>> <xeonmailinglist@gmail.com <ma...@gmail.com>> wrote:
>>
>> Hi,
>>
>> I have this job that has 3 map tasks and 2 reduce tasks. But, I want 
>> to excludes data that will go to the reduce task 2. This means that, 
>> only reducer 1 will produce data, and the other one will be empty, or 
>> even it doesn't execute.
>>
>> How can I do this in MapReduce?
>>
>> <ExampleJobExecution.png>
>>
>>
>> Thanks,
>>
>> -- 
>> --
> 

-- 
--

Re: Prune out data to a specific reduce task

Posted by xeonmailinglist-gmail <xe...@gmail.com>.

Hi,

The only obstacle is knowing which partition the map output will go to.
1 ~ From the map method, how can I know which partition the output goes to?
2 ~ Can I call |getPartition(K key, V value, int numReduceTasks)| from 
the map function?
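
A minimal sketch of what question 2 amounts to, assuming the job uses the default HashPartitioner. It is plain Java with no Hadoop classes so that it runs on its own; PartitionProbe, partitionFor and shouldEmit are hypothetical names. The arithmetic is the one HashPartitioner.getPartition() applies to the key's hashCode(). In a real mapper the reducer count would come from context.getNumReduceTasks(), and the key would be a Writable such as Text, whose hashCode() differs from String's, so the partition numbers printed here are only illustrative.

```java
// Plain-Java sketch: no Hadoop classes are used, so it runs on its own.
final class PartitionProbe {
    private PartitionProbe() {}

    // Same arithmetic as Hadoop's default HashPartitioner.getPartition():
    // clear the sign bit of the hash, then take the remainder.
    static int partitionFor(Object key, int numReduceTasks) {
        return (key.hashCode() & Integer.MAX_VALUE) % numReduceTasks;
    }

    // Hypothetical helper: true when the record should still be written,
    // i.e. it is not headed for the partition that should stay empty.
    static boolean shouldEmit(Object key, int numReduceTasks, int prunedPartition) {
        return partitionFor(key, numReduceTasks) != prunedPartition;
    }

    public static void main(String[] args) {
        int numReduceTasks = 2;   // as in the job described in this thread
        int prunedPartition = 1;  // assumption: the second reducer is partition 1
        for (String key : new String[] {"a", "b", "c", "d"}) {
            System.out.println(key + " -> partition " + partitionFor(key, numReduceTasks)
                    + ", emit=" + shouldEmit(key, numReduceTasks, prunedPartition));
        }
    }
}
```

Inside map(), the equivalent would be to call context.write() only when shouldEmit(...) is true. This only matches the real routing if the mapper uses exactly the same partitioning logic as the job.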

Thanks,




On 13-03-2015 03:25, Naganarasimha G R (Naga) wrote:
> I think Drake's comment
> "In the map method, records would be ignored with no output.collect() 
> or context.write()."
> is most valid way to do it as it will avoid further processing 
> downstream and hence less resources would be consumed, as unwanted 
> records are pruned at the source itself.
> Is there any obstacle from doing this in your map method ?
>
> Regards,
> Naga
> ------------------------------------------------------------------------
> *From:* xeonmailinglist-gmail [xeonmailinglist@gmail.com]
> *Sent:* Thursday, March 12, 2015 22:17
> *To:* user@hadoop.apache.org
> *Subject:* Fwd: Re: Prune out data to a specific reduce task
>
> If I use the partitioner, I must be able to tell map reduce to not 
> execute values from a certain reduce tasks.
>
> The method |public int getPartition(K key, V value, int 
> numReduceTasks)| must always return a partition. I can’t return -1. 
> Thus, I don’ t know how to tell Mapreduce to not execute data from a 
> partition. Any suggestion?
>
> ———— Forwarded Message ————
>
> Subject: Re: Prune out data to a specific reduce task
>
> Date: Thu, 12 Mar 2015 12:40:04 -0400
>
> From: Fei Hu hufei68@gmail.com <ht...@gmail.com>
>
> Reply-To: user@hadoop.apache.org
>
> To: user@hadoop.apache.org
>
> Maybe you could use Partitioner.class to solve your problem.
>
>> On Mar 11, 2015, at 6:28 AM, xeonmailinglist-gmail 
>> <xeonmailinglist@gmail.com <ma...@gmail.com>> wrote:
>>
>> Hi,
>>
>> I have this job that has 3 map tasks and 2 reduce tasks. But, I want 
>> to excludes data that will go to the reduce task 2. This means that, 
>> only reducer 1 will produce data, and the other one will be empty, or 
>> even it doesn't execute.
>>
>> How can I do this in MapReduce?
>>
>> <ExampleJobExecution.png>
>>
>>
>> Thanks,
>>
>> -- 
>> --
> 

-- 
--

RE: Re: Prune out data to a specific reduce task

Posted by "Naganarasimha G R (Naga)" <ga...@huawei.com>.

I think Drake's comment
"In the map method, records would be ignored with no output.collect() or context.write()."
is the most valid way to do it: it avoids further processing downstream, so fewer resources are consumed, because unwanted records are pruned at the source itself.
Is there any obstacle to doing this in your map method?

Regards,
Naga
________________________________
From: xeonmailinglist-gmail [xeonmailinglist@gmail.com]
Sent: Thursday, March 12, 2015 22:17
To: user@hadoop.apache.org
Subject: Fwd: Re: Prune out data to a specific reduce task


If I use the partitioner, I must be able to tell map reduce to not execute values from a certain reduce tasks.

The method public int getPartition(K key, V value, int numReduceTasks) must always return a partition. I can’t return -1. Thus, I don’ t know how to tell Mapreduce to not execute data from a partition. Any suggestion?

———— Forwarded Message ————

Subject: Re: Prune out data to a specific reduce task

Date: Thu, 12 Mar 2015 12:40:04 -0400

From: Fei Hu hufei68@gmail.com<ht...@gmail.com>

Reply-To: user@hadoop.apache.org<ma...@hadoop.apache.org>

To: user@hadoop.apache.org<ma...@hadoop.apache.org>

Maybe you could use Partitioner.class to solve your problem.

On Mar 11, 2015, at 6:28 AM, xeonmailinglist-gmail <xe...@gmail.com>> wrote:

Hi,

I have this job that has 3 map tasks and 2 reduce tasks. But, I want to excludes data that will go to the reduce task 2. This means that, only reducer 1 will produce data, and the other one will be empty, or even it doesn't execute.

How can I do this in MapReduce?

<ExampleJobExecution.png>


Thanks,


--
--

Re: Prune out data to a specific reduce task

Posted by Fei Hu <hu...@gmail.com>.

In the Reducer.class, you could ignore the data that you want to exclude based on the key or value.
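
A rough sketch of that reducer-side filtering, written as plain Java so it runs without Hadoop. FilteringReduce, the excludedKeys set and the emit callback are hypothetical stand-ins; in a real Reducer the callback would be context.write(). Note that the excluded records are still shuffled to the reducer in this variant; they are only dropped at the last step.

```java
import java.util.Arrays;
import java.util.Collections;
import java.util.Set;
import java.util.function.BiConsumer;

final class FilteringReduce {
    private FilteringReduce() {}

    // Models reduce(key, values, context): sums the values of a key, but
    // returns without emitting anything when the key is on the exclude list.
    static void reduce(String key, Iterable<Integer> values,
                       Set<String> excludedKeys, BiConsumer<String, Integer> emit) {
        if (excludedKeys.contains(key)) {
            return; // no output is produced for this key
        }
        int sum = 0;
        for (int v : values) {
            sum += v;
        }
        emit.accept(key, sum);
    }

    public static void main(String[] args) {
        Set<String> excluded = Collections.singleton("drop"); // hypothetical rule
        BiConsumer<String, Integer> print = (k, v) -> System.out.println(k + "\t" + v);
        reduce("keep", Arrays.asList(1, 2, 3), excluded, print);
        reduce("drop", Arrays.asList(4, 5), excluded, print);
    }
}
```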


> On Mar 12, 2015, at 12:47 PM, xeonmailinglist-gmail <xe...@gmail.com> wrote:
> 
> If I use the partitioner, I must be able to tell map reduce to not execute values from a certain reduce tasks.
> 
> The method public int
>           getPartition(K key, V value, int numReduceTasks) must always return a partition. I can’t return -1. Thus, I don’ t know how to tell Mapreduce to not execute data from a partition. Any suggestion?
> 
> ———— Forwarded Message ————
> 
> Subject: Re: Prune out data to a specific reduce task
> 
> Date: Thu, 12 Mar 2015 12:40:04 -0400
> 
> From: Fei Hu hufei68@gmail.com <http://mailto:hufei68@gmail.com/>
> Reply-To: user@hadoop.apache.org <ma...@hadoop.apache.org>
> To: user@hadoop.apache.org <ma...@hadoop.apache.org>
> Maybe you could use Partitioner.class to solve your problem.
> 
> 
> 
>> On Mar 11, 2015, at 6:28 AM, xeonmailinglist-gmail <xeonmailinglist@gmail.com <ma...@gmail.com>> wrote:
>> 
>> Hi,
>> 
>> I have this job that has 3 map tasks and 2 reduce tasks. But, I want to excludes data that will go to the reduce task 2. This means that, only reducer 1 will produce data, and the other one will be empty, or even it doesn't execute.
>> 
>> How can I do this in MapReduce?
>> 
>> <ExampleJobExecution.png>
>> 
>> 
>> Thanks,
>> 
>> -- 
>> --
> 
>

Fwd: Re: Prune out data to a specific reduce task

Posted by xeonmailinglist-gmail <xe...@gmail.com>.

If I use the partitioner, I must be able to tell MapReduce not to 
process the values destined for a certain reduce task.

The method |public int getPartition(K key, V value, int numReduceTasks)| 
must always return a valid partition; I can't return -1. Thus, I don't know 
how to tell MapReduce not to process the data of a partition. Any suggestions?

———— Forwarded Message ————

Subject: Re: Prune out data to a specific reduce task

Date: Thu, 12 Mar 2015 12:40:04 -0400

From: Fei Hu hufei68@gmail.com <ht...@gmail.com>

Reply-To: user@hadoop.apache.org

To: user@hadoop.apache.org

Maybe you could use Partitioner.class to solve your problem.

> On Mar 11, 2015, at 6:28 AM, xeonmailinglist-gmail 
> <xeonmailinglist@gmail.com <ma...@gmail.com>> wrote:
>
> Hi,
>
> I have this job that has 3 map tasks and 2 reduce tasks. But, I want 
> to excludes data that will go to the reduce task 2. This means that, 
> only reducer 1 will produce data, and the other one will be empty, or 
> even it doesn't execute.
>
> How can I do this in MapReduce?
>
> <ExampleJobExecution.png>
>
>
> Thanks,
>
> -- 
> --

Re: Prune out data to a specific reduce task

Posted by Fei Hu <hu...@gmail.com>.

Maybe you could use Partitioner.class to solve your problem.
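
One way this could look, as a sketch only: a partitioner cannot drop records, because getPartition() has to return an index in [0, numReduceTasks), but it can isolate the unwanted keys in one partition so that the other output stays clean. The code below is plain Java that models the method; IsolatingPartitioner and the unwanted predicate are hypothetical names, and the choice of the last partition as the discard target is an assumption.

```java
import java.util.function.Predicate;

final class IsolatingPartitioner {
    private IsolatingPartitioner() {}

    // Models getPartition(key, value, numReduceTasks). Unwanted keys all go to
    // the last partition; every other key is hashed over the remaining ones.
    static int getPartition(String key, int numReduceTasks, Predicate<String> unwanted) {
        int discardPartition = numReduceTasks - 1;
        if (numReduceTasks == 1 || unwanted.test(key)) {
            return discardPartition;
        }
        return (key.hashCode() & Integer.MAX_VALUE) % (numReduceTasks - 1);
    }

    public static void main(String[] args) {
        Predicate<String> unwanted = k -> k.startsWith("x"); // hypothetical rule
        for (String key : new String[] {"apple", "xray", "banana"}) {
            System.out.println(key + " -> partition " + getPartition(key, 2, unwanted));
        }
    }
}
```

With two reduce tasks, every wanted key lands in partition 0 and every unwanted key in partition 1, whose output can then be ignored. The reducer for partition 1 still runs and still receives the data.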

> On Mar 11, 2015, at 6:28 AM, xeonmailinglist-gmail <xeonmailinglist@gmail.com <ma...@gmail.com>> wrote:
> 
> Hi,
> 
> I have this job that has 3 map tasks and 2 reduce tasks. But, I want to excludes data that will go to the reduce task 2. This means that, only reducer 1 will produce data, and the other one will be empty, or even it doesn't execute.
> 
> How can I do this in MapReduce?
> 
> <ExampleJobExecution.png>
> 
> 
> Thanks,
> 
> -- 
> --

Re: Prune out data to a specific reduce task

Posted by Drake민영근 <dr...@nexr.com>.

In the map method, a record is dropped simply by not calling
output.collect() or context.write() for it.

Or you can just delete the output file of reducer 2 at the end of the job.
Partitions are numbered from 0, so with two reducers the second reducer's
result file is "part-r-00001".
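
A small sketch of how the file name to delete could be derived, assuming the new-API default naming (base name "part", "r" for reduce output, five-digit partition index) and partitions numbered from 0, so that the second of two reducers writes part-r-00001. PartFiles and reduceOutputName are hypothetical names. The deletion itself is not shown because it needs a Hadoop classpath; it would typically go through FileSystem.delete(Path, boolean) on the job's output directory once the job has finished.

```java
import java.util.Locale;

final class PartFiles {
    private PartFiles() {}

    // Builds the default reduce-output file name for a partition index,
    // e.g. partition 1 -> "part-r-00001".
    static String reduceOutputName(int partition) {
        return String.format(Locale.ROOT, "part-r-%05d", partition);
    }

    public static void main(String[] args) {
        // With two reducers the partitions are 0 and 1.
        System.out.println(reduceOutputName(0));
        System.out.println(reduceOutputName(1));
    }
}
```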

Drake 민영근 Ph.D
kt NexR

On Wed, Mar 11, 2015 at 9:43 PM, Fabio C. <an...@gmail.com> wrote:

> As far as I know the code running in each reducer is the same you specify
> in your reduce function, so if you know in advance the features of the data
> you want to ignore you can just instruct reducers to do so.
> If you are able to tell whether or not to keep an entry at the beginning,
> you can filter them out within the map function.
> I could think of a wordcount example where we tell the map phase to ignore
> all the words starting with a specific letter...
> What kind of data are you processing and what is the filtering condition?
> Anyway I'm sorry I can't help with the actual code, but I'm not really
> into this right now.
>
> On Wed, Mar 11, 2015 at 12:13 PM, xeonmailinglist-gmail <
> xeonmailinglist@gmail.com> wrote:
>
>>  Maybe the correct question is, how can I filter data in mapreduce in
>> Java?
>>
>>
>>
>> On 11-03-2015 10:36, xeonmailinglist-gmail wrote:
>>
>> To exclude data to a specific reducer, should I build a partitioner that
>> do this? Should I have a map function that checks to which reduce task the
>> output goes?
>>
>> Can anyone give me some suggestion?
>>
>> And by the way, I really want to exclude data to a reduce task. So, I
>> will run more than 1 reducer, even if one of them does not get input data.
>>
>>
>> On 11-03-2015 10:28, xeonmailinglist-gmail wrote:
>>
>> Hi,
>>
>> I have this job that has 3 map tasks and 2 reduce tasks. But, I want to
>> excludes data that will go to the reduce task 2. This means that, only
>> reducer 1 will produce data, and the other one will be empty, or even it
>> doesn't execute.
>>
>> How can I do this in MapReduce?
>>
>> [image: Example Job Execution]
>>
>>
>> Thanks,
>>
>> --
>> --
>>
>>
>> --
>> --
>>
>>
>> --
>> --
>>
>>
>

Re: Prune out data to a specific reduce task

Posted by "Fabio C." <an...@gmail.com>.

As far as I know, the code running in each reducer is the same one you specify
in your reduce function, so if you know in advance the features of the data
you want to ignore, you can just instruct the reducers to do so.
If you are able to tell at the beginning whether or not to keep an entry,
you can filter it out within the map function.
I could think of a wordcount example where we tell the map phase to ignore
all the words starting with a specific letter...
What kind of data are you processing, and what is the filtering condition?
Anyway, I'm sorry I can't help with the actual code, but I'm not really into
this right now.
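
The wordcount idea could be sketched roughly as follows. This is plain Java that models the map step without Hadoop; FilteringWordCountMap, skippedLetter and the emit callback are hypothetical stand-ins, and in a real Mapper the callback would be context.write(word, one). Words starting with the chosen letter are never emitted, so they never reach the shuffle or any reducer.

```java
import java.util.function.BiConsumer;

final class FilteringWordCountMap {
    private FilteringWordCountMap() {}

    // Models map(offset, line, context): emits (word, 1) for every word except
    // those starting with skippedLetter, which produce no output at all.
    static void map(String line, char skippedLetter, BiConsumer<String, Integer> emit) {
        for (String word : line.trim().split("\\s+")) {
            if (word.isEmpty() || word.charAt(0) == skippedLetter) {
                continue; // pruned at the source: nothing is written
            }
            emit.accept(word, 1);
        }
    }

    public static void main(String[] args) {
        map("apple banana avocado cherry", 'a',
                (word, one) -> System.out.println(word + "\t" + one));
    }
}
```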

On Wed, Mar 11, 2015 at 12:13 PM, xeonmailinglist-gmail <
xeonmailinglist@gmail.com> wrote:

>  Maybe the correct question is, how can I filter data in mapreduce in Java?
>
>
>
> On 11-03-2015 10:36, xeonmailinglist-gmail wrote:
>
> To exclude data to a specific reducer, should I build a partitioner that
> do this? Should I have a map function that checks to which reduce task the
> output goes?
>
> Can anyone give me some suggestion?
>
> And by the way, I really want to exclude data to a reduce task. So, I will
> run more than 1 reducer, even if one of them does not get input data.
>
>
> On 11-03-2015 10:28, xeonmailinglist-gmail wrote:
>
> Hi,
>
> I have this job that has 3 map tasks and 2 reduce tasks. But, I want to
> excludes data that will go to the reduce task 2. This means that, only
> reducer 1 will produce data, and the other one will be empty, or even it
> doesn't execute.
>
> How can I do this in MapReduce?
>
> [image: Example Job Execution]
>
>
> Thanks,
>
> --
> --
>
>
> --
> --
>
>
> --
> --
>
>

Re: Prune out data to a specific reduce task

Posted by "Fabio C." <an...@gmail.com>.

As far as I know the code running in each reducer is the same you specify
in your reduce function, so if you know in advance the features of the data
you want to ignore you can just instruct reducers to do so.
If you are able to tell whether or not to keep an entry at the beginning,
you can filter them out within the map function.
I could think of a wordcount example where we tell the map phase to ignore
all the words starting with a specific letter...
What kind of data are you processing and what is the filtering condition?
Anyway I'm sorry I can't help with the actual code, but I'm not really into
this right now.

On Wed, Mar 11, 2015 at 12:13 PM, xeonmailinglist-gmail <
xeonmailinglist@gmail.com> wrote:

>  Maybe the correct question is, how can I filter data in mapreduce in Java?
>
>
>
> On 11-03-2015 10:36, xeonmailinglist-gmail wrote:
>
> To exclude data to a specific reducer, should I build a partitioner that
> do this? Should I have a map function that checks to which reduce task the
> output goes?
>
> Can anyone give me some suggestion?
>
> And by the way, I really want to exclude data to a reduce task. So, I will
> run more than 1 reducer, even if one of them does not get input data.
>
>
> On 11-03-2015 10:28, xeonmailinglist-gmail wrote:
>
> Hi,
>
> I have this job that has 3 map tasks and 2 reduce tasks. But, I want to
> excludes data that will go to the reduce task 2. This means that, only
> reducer 1 will produce data, and the other one will be empty, or even it
> doesn't execute.
>
> How can I do this in MapReduce?
>
> [image: Example Job Execution]
>
>
> Thanks,
>
> --
> --
>
>
> --
> --
>
>
> --
> --
>
>

Re: Prune out data to a specific reduce task

Posted by "Fabio C." <an...@gmail.com>.

As far as I know the code running in each reducer is the same you specify
in your reduce function, so if you know in advance the features of the data
you want to ignore you can just instruct reducers to do so.
If you are able to tell whether or not to keep an entry at the beginning,
you can filter them out within the map function.
I could think of a wordcount example where we tell the map phase to ignore
all the words starting with a specific letter...
What kind of data are you processing and what is the filtering condition?
Anyway I'm sorry I can't help with the actual code, but I'm not really into
this right now.

On Wed, Mar 11, 2015 at 12:13 PM, xeonmailinglist-gmail <
xeonmailinglist@gmail.com> wrote:

>  Maybe the correct question is, how can I filter data in mapreduce in Java?
>
>
>
> On 11-03-2015 10:36, xeonmailinglist-gmail wrote:
>
> To exclude data to a specific reducer, should I build a partitioner that
> do this? Should I have a map function that checks to which reduce task the
> output goes?
>
> Can anyone give me some suggestion?
>
> And by the way, I really want to exclude data to a reduce task. So, I will
> run more than 1 reducer, even if one of them does not get input data.
>
>
> On 11-03-2015 10:28, xeonmailinglist-gmail wrote:
>
> Hi,
>
> I have a job with 3 map tasks and 2 reduce tasks, but I want to exclude
> the data that would go to reduce task 2. This means that only reducer 1
> will produce data, and the other one will be empty or will not even
> execute.
>
> How can I do this in MapReduce?
>
> [image: Example Job Execution]
>
>
> Thanks,

Re: Prune out data to a specific reduce task

Posted by xeonmailinglist-gmail <xe...@gmail.com>.

Maybe the correct question is: how can I filter data in MapReduce in Java?


On 11-03-2015 10:36, xeonmailinglist-gmail wrote:
> To exclude the data going to a specific reducer, should I build a
> partitioner that does this? Should I have a map function that checks
> which reduce task the output goes to?
>
> Can anyone give me some suggestions?
>
> And by the way, I really do want to exclude the data sent to one reduce
> task, so I will run more than 1 reducer, even if one of them gets no input.
>
>
> On 11-03-2015 10:28, xeonmailinglist-gmail wrote:
>> Hi,
>>
>> I have a job with 3 map tasks and 2 reduce tasks, but I want to exclude
>> the data that would go to reduce task 2. This means that only reducer 1
>> will produce data, and the other one will be empty or will not even
>> execute.
>>
>> How can I do this in MapReduce?
>>
>> Example Job Execution
>>
>>
>> Thanks,

Re: Prune out data to a specific reduce task

Posted by xeonmailinglist-gmail <xe...@gmail.com>.

To exclude the data going to a specific reducer, should I build a
partitioner that does this? Should I have a map function that checks
which reduce task the output goes to?

Can anyone give me some suggestions?

And by the way, I really do want to exclude the data sent to one reduce
task, so I will run more than 1 reducer, even if one of them gets no input.

On 11-03-2015 10:28, xeonmailinglist-gmail wrote:
> Hi,
>
> I have a job with 3 map tasks and 2 reduce tasks, but I want to exclude
> the data that would go to reduce task 2. This means that only reducer 1
> will produce data, and the other one will be empty or will not even
> execute.
>
> How can I do this in MapReduce?
>
> Example Job Execution
>
>
> Thanks,

Re: Prune out data to a specific reduce task

Posted by Fei Hu <hu...@gmail.com>.

Maybe you could use Partitioner.class to solve your problem.
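
For context, a Partitioner only chooses which reduce task receives a record:
getPartition(key, value, numReduceTasks) has to return a valid partition
index, so it cannot drop data by itself. One possible workaround, sketched
here under assumptions, is to compute the target partition on the map side
with the same arithmetic as Hadoop's default HashPartitioner and skip any
record bound for the unwanted partition. The sketch is plain Java with no
Hadoop classes; PartitionPruningSketch, partitionFor() and shouldEmit() are
hypothetical names, and partition index 1 is assumed to stand for "reduce
task 2" under zero-based numbering.

```java
// Plain-Java sketch; no Hadoop classes are used here.
class PartitionPruningSketch {
    // Same arithmetic as Hadoop's default HashPartitioner.getPartition():
    // mask off the sign bit, then take the remainder.
    static int partitionFor(Object key, int numReduceTasks) {
        return (key.hashCode() & Integer.MAX_VALUE) % numReduceTasks;
    }

    // Hypothetical helper: true when the record would NOT land on the
    // partition we want to prune, i.e. the mapper should emit it.
    static boolean shouldEmit(Object key, int numReduceTasks, int excludedPartition) {
        return partitionFor(key, numReduceTasks) != excludedPartition;
    }

    public static void main(String[] args) {
        int numReduceTasks = 2;
        int excludedPartition = 1; // assumed zero-based index of "reduce task 2"
        for (String key : new String[] {"a", "b", "c", "d"}) {
            String verdict = shouldEmit(key, numReduceTasks, excludedPartition)
                    ? "emit" : "drop";
            System.out.println(key + " -> partition "
                    + partitionFor(key, numReduceTasks) + " (" + verdict + ")");
        }
    }
}
```

In an actual job the key would be a Writable such as Text, whose hashCode()
may differ from String's, and the prediction only holds if the job really
uses the partitioner whose arithmetic the map side copies. The pruned reduce
task still runs; it simply receives no input.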

> On Mar 11, 2015, at 6:28 AM, xeonmailinglist-gmail <xeonmailinglist@gmail.com> wrote:
> 
> Hi,
> 
> I have a job with 3 map tasks and 2 reduce tasks, but I want to exclude the data that would go to reduce task 2. This means that only reducer 1 will produce data, and the other one will be empty or will not even execute.
> 
> How can I do this in MapReduce?
> 
> <ExampleJobExecution.png>
> 
> 
> Thanks,
