Posted to common-user@hadoop.apache.org by Craig Macdonald <cr...@dcs.gla.ac.uk> on 2009/04/03 14:36:54 UTC
best practice: mapred.local vs dfs drives
Hello all,
Following recent hardware discussions, I thought I'd ask a related
question. Our cluster nodes have 3 drives: 1x 160GB system/scratch and
2x 500GB DFS drives.
The 160GB system drive is partitioned so that 100GB is set aside for
mapred.local space. However, we find that for our application, the free
space available in mapred.local for map output is the limiting factor
on the number of reducers we can have (our application prefers fewer
reducers).
How do people normally divide space between DFS and mapred.local? Do you
(a) share the DFS drives with the task tracker temporary files, or do you
(b) keep them on separate partitions or drives?
We originally went with (b) because it prevented a runaway job from
eating all the DFS space on the machine. However, I'm beginning to
realise the disadvantages.
Any comments?
Thanks
Craig
Re: best practice: mapred.local vs dfs drives
Posted by Craig Macdonald <cr...@dcs.gla.ac.uk>.
Thanks for the heads-up.
C
Owen O'Malley wrote:
> We always share the drives.
>
> -- Owen
Re: best practice: mapred.local vs dfs drives
Posted by Owen O'Malley <ow...@gmail.com>.
We always share the drives.
-- Owen
On Apr 5, 2009, at 0:52, zsongbo <zs...@gmail.com> wrote:
> I usually set mapred.local.dir to share the disk space with DFS,
> since some MapReduce jobs need a lot of temporary space.
Re: best practice: mapred.local vs dfs drives
Posted by zsongbo <zs...@gmail.com>.
I usually set mapred.local.dir to share the disk space with DFS, since some
MapReduce jobs need a lot of temporary space.
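
For anyone wanting to try the shared layout described above, here is a
rough sketch of how the relevant properties might be generated. It is
not taken from anyone's actual cluster: the mount points (/data1,
/data2), the subdirectory names and the byte values are all made up for
illustration. The property names (dfs.data.dir, mapred.local.dir,
dfs.datanode.du.reserved, mapred.local.dir.minspacestart) are the ones
I believe the Hadoop defaults files of this era use, so please check
them against your own version before relying on them.

```python
from xml.sax.saxutils import escape


def shared_drive_properties(mounts, reserved_bytes, min_space_start_bytes):
    """Build properties that place DFS blocks and MapReduce temp files
    on the same set of drives (option (a) in the original question).

    `mounts` is a list of hypothetical mount points, one per data drive.
    """
    return {
        # One DFS data directory per drive, as a comma-separated list.
        "dfs.data.dir": ",".join(f"{m}/dfs/data" for m in mounts),
        # One MapReduce local directory per drive, on the same drives.
        "mapred.local.dir": ",".join(f"{m}/mapred/local" for m in mounts),
        # Space per volume that the datanode should leave for non-DFS use.
        "dfs.datanode.du.reserved": reserved_bytes,
        # Below this much free local space, the task tracker should not
        # ask for more tasks.
        "mapred.local.dir.minspacestart": min_space_start_bytes,
    }


def render_properties(props):
    """Render a dict of property names and values as <property> elements."""
    lines = []
    for name, value in props.items():
        lines.append("<property>")
        lines.append(f"  <name>{escape(name)}</name>")
        lines.append(f"  <value>{escape(str(value))}</value>")
        lines.append("</property>")
    return "\n".join(lines)


if __name__ == "__main__":
    # Illustrative values only: two data drives, 50GB reserved per volume
    # for non-DFS use, and a 10GB free-space floor for starting new tasks.
    props = shared_drive_properties(
        ["/data1", "/data2"], 50 * 1024**3, 10 * 1024**3
    )
    print(render_properties(props))
```

The printed elements would go inside the <configuration> block of the
site configuration file. Note that the reserved-space setting only
keeps DFS from filling the drives; it does not by itself stop a runaway
job from using up space that DFS would otherwise have had, which was
the original reason for choosing separate partitions.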