Posted to common-user@hadoop.apache.org by Aseem Anand <as...@gmail.com> on 2012/09/14 13:06:25 UTC

Ignore keys while scheduling reduce jobs

Hi,
Is there any way I can ignore all keys except one particular key (determined
after the map stage), so that only 1 reduce job is started, using a
partitioner? If so, could someone suggest such a method?

Regards,
Aseem

Re: Ignore keys while scheduling reduce jobs

Posted by Hemanth Yamijala <yh...@thoughtworks.com>.

Hi,

Does the mapper know which point is the 1st point in the data set, and the
cluster id corresponding to it? I don't know much about the k-means
algorithm, so I may be wrong.

If the mappers have this information, then each map task can check against
the clusters data whether a record's cluster id is that of the first point,
and emit the record only if this condition is true, ignoring all other records.

Then you can set up your job to have only one reducer that will get all
values for the single cluster id and process it.
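
A minimal sketch of this idea, simulated in plain Python rather than with the
Hadoop API. It assumes every mapper can be given the cluster id of the first
point up front (for example, computed once in the driver from the first record
and the current centroids). The names nearest_cluster, map_phase, reduce_phase
and the sample data are hypothetical, chosen only for illustration. In a real
job, the single reducer would be requested with Job.setNumReduceTasks(1).

```python
from collections import defaultdict
from math import dist


def nearest_cluster(point, centroids):
    """Return the id of the centroid closest to `point` (Euclidean distance)."""
    return min(centroids, key=lambda cid: dist(point, centroids[cid]))


def map_phase(points, centroids, target_id):
    """Map side: emit (cluster_id, point) only when the point falls in the
    target cluster; every other record is dropped before the shuffle."""
    for point in points:
        cid = nearest_cluster(point, centroids)
        if cid == target_id:
            yield cid, point


def reduce_phase(pairs):
    """Single reducer: group by key and average the points per cluster id."""
    grouped = defaultdict(list)
    for cid, point in pairs:
        grouped[cid].append(point)
    return {
        cid: tuple(sum(axis) / len(pts) for axis in zip(*pts))
        for cid, pts in grouped.items()
    }


# Hypothetical sample data: two centroids and four 2-D points.
centroids = {0: (0.0, 0.0), 1: (10.0, 10.0)}
points = [(1.0, 1.0), (9.0, 9.0), (3.0, 1.0), (11.0, 12.0)]

# Assumption: the driver determines the first point's cluster id before the
# map phase starts and passes it to every mapper.
target_id = nearest_cluster(points[0], centroids)
new_centroids = reduce_phase(map_phase(points, centroids, target_id))
print(target_id, new_centroids)
```

Because the filtering happens on the map side, no partitioner is needed: only
one key ever reaches the shuffle, so a single reducer sees all of its values.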

Thanks
Hemanth

On Fri, Sep 14, 2012 at 4:56 PM, Aseem Anand <as...@gmail.com> wrote:

> Hi,
> Consider it to be a single iteration Kmeans clustering job such that I
> only wish to schedule reduce jobs for the clusterId(the key for a Kmeans)
> of the cluster corresponding to the 1st point in the dataset.
> I wish to check the clusterId of the first point in the input file and get
> reduce jobs only for that specific clusterId.
>
> I think we shall have to wait for all mappers to end.
>
> Thanks,
> Aseem
>
>
> On Fri, Sep 14, 2012 at 4:43 PM, Hemanth Yamijala <
> yhemanth@thoughtworks.com> wrote:
>
>> Hi,
>>
>> When do you know the keys to ignore ? You mentioned "after the map stage"
>> .. is this at the end of each map task, or at the end of all map tasks ?
>>
>> Thanks
>> hemanth
>>
>>
>> On Fri, Sep 14, 2012 at 4:36 PM, Aseem Anand <as...@gmail.com>wrote:
>>
>>> Hi,
>>> Is there anyway I can ignore all keys except a certain key ( determined
>>> after the map stage) to start only 1 reduce job using a partitioner? If so
>>> could someone suggest such a method.
>>>
>>> Regards,
>>> Aseem
>>>
>>>
>>
>

Re: Ignore keys while scheduling reduce jobs

Posted by Aseem Anand <as...@gmail.com>.

Hi,
Consider it to be a single-iteration k-means clustering job in which I only
wish to schedule reduce jobs for the clusterId (the key in k-means) of the
cluster that the 1st point in the dataset belongs to.
I want to check the clusterId of the first point in the input file and run
reduce jobs only for that specific clusterId.

I think we will have to wait for all mappers to finish.

Thanks,
Aseem

On Fri, Sep 14, 2012 at 4:43 PM, Hemanth Yamijala <yhemanth@thoughtworks.com> wrote:

> Hi,
>
> When do you know the keys to ignore ? You mentioned "after the map stage"
> .. is this at the end of each map task, or at the end of all map tasks ?
>
> Thanks
> hemanth
>
>
> On Fri, Sep 14, 2012 at 4:36 PM, Aseem Anand <as...@gmail.com>wrote:
>
>> Hi,
>> Is there anyway I can ignore all keys except a certain key ( determined
>> after the map stage) to start only 1 reduce job using a partitioner? If so
>> could someone suggest such a method.
>>
>> Regards,
>> Aseem
>>
>>
>

Re: Ignore keys while scheduling reduce jobs

Posted by Hemanth Yamijala <yh...@thoughtworks.com>.

Hi,

When do you know which keys to ignore? You mentioned "after the map stage":
is this at the end of each map task, or at the end of all map tasks?

Thanks
hemanth

On Fri, Sep 14, 2012 at 4:36 PM, Aseem Anand <as...@gmail.com> wrote:

> Hi,
> Is there anyway I can ignore all keys except a certain key ( determined
> after the map stage) to start only 1 reduce job using a partitioner? If so
> could someone suggest such a method.
>
> Regards,
> Aseem
>
>
