Posted to common-user@hadoop.apache.org by Jimmy Wan <ji...@indeed.com> on 2008/03/06 18:56:08 UTC

Does Hadoop Honor Reserved Space?

I've got 2 datanodes set up with the following configuration parameter:
	<property>
	  <name>dfs.datanode.du.reserved</name>
	  <value>429496729600</value>
	  <description>Reserved space in bytes per volume. Always leave this much
	  space free for non dfs use.
	  </description>
	</property>

Both are housed on 800GB volumes, so I thought this would keep about half  
the volume free for non-HDFS usage.
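(429496729600 bytes is 400 x 2^30 bytes, i.e. 400 GiB - roughly 430 GB in
decimal units, or just over half of each 800GB volume.)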

After some long-running jobs last night, both disk volumes were completely
filled. The bulk of the data was in:
${my.hadoop.tmp.dir}/hadoop-hadoop/dfs/data

This is running as the user hadoop.

Am I interpreting these parameters incorrectly?

I noticed this issue, but it is marked as closed:  
http://issues.apache.org/jira/browse/HADOOP-2549

-- 
Jimmy

Re: Does Hadoop Honor Reserved Space?

Posted by Eric Baldeschwieler <er...@yahoo-inc.com>.
Hi Pete, Joydeep,

These sound like thoughts that could lead to excellent suggestions  
with a little more investment of your time.

We'd love it if you could invest some effort into contributing to the
release process! Hadoop is open source, and becoming an active
contributor is the best possible way to address shortcomings that
impact your organization.

Thanks for your help!

E14


Re: Does Hadoop Honor Reserved Space?

Posted by Pete Wyckoff <pw...@facebook.com>.
+1

(obviously :))


RE: Does Hadoop Honor Reserved Space?

Posted by Joydeep Sen Sarma <js...@facebook.com>.
I have left some comments behind on the jira.

We could argue over what's the right thing to do (and we will on the
Jira) - but the higher-level problem is that this is another case where
backwards compatibility with the existing semantics of this option was
not carried over. Nor was there any notification to admins about the
change. The change notes just do not convey the import of this change to
existing deployments (incidentally, 1463 was classified as 'Bug Fix' -
not that putting it under 'Incompatible Fix' would have helped, imho).

I would request the board/committers to consider setting up something
along the lines of:

1. Something better than Change Notes to convey interface changes.
2. A field in the JIRA that marks an issue as important from an
interface-change point of view (with notes on what's changing). This
could be used to auto-populate #1.
3. Some way of auto-subscribing to bugs that cause interface changes
(even an email filter on the jira mails would do).

As the Hadoop user base keeps growing - and gets used for 'production'
tasks - I think it's absolutely essential that users/admins can keep in
tune with changes that affect their deployments. Otherwise, any
organization other than Yahoo would have a tough time upgrading.

(I am new to open-source - but surely this has been solved before?)

Joydeep


Re: Does Hadoop Honor Reserved Space?

Posted by Hairong Kuang <ha...@yahoo-inc.com>.
I think you have a misunderstanding of the reserved parameter. As I
commented on HADOOP-1463, remember that dfs.du.reserved is the space for
non-dfs usage, including the space for map/reduce, other applications, fs
metadata, etc. In your case, since /usr already takes 45GB, it far exceeds
the reserved limit of 1GB. You should set the reserved space to 50GB.
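In other words, roughly (this is a sketch of the semantics just described,
not the exact datanode code):

	space DFS may use on a volume = total capacity - reserved

So if non-dfs files already occupy more than the reserved amount, the volume
can still fill up completely; the reserved value has to cover all the non-dfs
usage you expect on that volume.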

Hairong


RE: Does Hadoop Honor Reserved Space?

Posted by Joydeep Sen Sarma <js...@facebook.com>.
Filed https://issues.apache.org/jira/browse/HADOOP-2991


RE: Does Hadoop Honor Reserved Space?

Posted by Joydeep Sen Sarma <js...@facebook.com>.
folks - Jimmy is right - we have unfortunately hit it as well:

https://issues.apache.org/jira/browse/HADOOP-1463 caused a regression. We have left some comments on the bug - but can't reopen it.

This is going to affect all 0.15 and 0.16 deployments!


Re: Does Hadoop Honor Reserved Space?

Posted by Jimmy Wan <ji...@indeed.com>.
Unfortunately, I had to clean up my HDFS in order to get some work done,
but I was running Hadoop 0.16.0 on a Linux box. My configuration is
two machines: one runs the JobTracker, NameNode, and a TaskTracker
instance; the other is just running a TaskTracker.
Replication was set to 2 for both the default and the max.
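That is, something like the following in hadoop-site.xml (the property
names here are from memory, so treat this as illustrative):

	<property>
	  <name>dfs.replication</name>
	  <value>2</value>
	</property>
	<property>
	  <name>dfs.replication.max</name>
	  <value>2</value>
	</property>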

-- 
Jimmy

Re: Does Hadoop Honor Reserved Space?

Posted by Hairong Kuang <ha...@yahoo-inc.com>.
In addition to the version, could you please send us a copy of the datanode
report by running the command bin/hadoop dfsadmin -report?

Thanks,
Hairong


RE: Does Hadoop Honor Reserved Space?

Posted by Joydeep Sen Sarma <js...@facebook.com>.
but intermediate data is stored in a different directory from dfs/data (something like mapred/local by default, I think).

what version are you running?
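if memory serves, hadoop-default.xml ties that directory to hadoop.tmp.dir,
something like:

	<property>
	  <name>mapred.local.dir</name>
	  <value>${hadoop.tmp.dir}/mapred/local</value>
	</property>

so if dfs/data and mapred/local sit on the same volume, they end up
competing for the same space.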


RE: Does Hadoop Honor Reserved Space?

Posted by ah...@yahoo.com.
I've run into a similar issue in the past. From what I understand, this
parameter only controls the HDFS space usage. However, the intermediate data in
the map reduce job is stored on the local file system (not HDFS) and is not
subject to this configuration.

In the past I have used mapred.local.dir.minspacekill and
mapred.local.dir.minspacestart to control the amount of space this
temporary data is allowed to use.
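For example, something like this (the values are just illustrative - both
are in bytes):

	<property>
	  <name>mapred.local.dir.minspacestart</name>
	  <value>10737418240</value>
	</property>
	<property>
	  <name>mapred.local.dir.minspacekill</name>
	  <value>5368709120</value>
	</property>

As I remember it, the tasktracker stops accepting new tasks once free space
in mapred.local.dir drops below minspacestart (10GB here), and starts killing
running tasks to reclaim space once it drops below minspacekill (5GB here).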

Not sure if that is the best approach though, so I'd love to hear what other
people have done. In your case, you have a map-red job that consumes too
much space (without a limit in place, the job simply needed more disk
capacity than you had), so looking at mapred.output.compress and
mapred.compress.map.output might be useful to decrease the job's disk
requirements.
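E.g., something like:

	<property>
	  <name>mapred.compress.map.output</name>
	  <value>true</value>
	</property>
	<property>
	  <name>mapred.output.compress</name>
	  <value>true</value>
	</property>

The first compresses the intermediate map output (the local-disk data in
question here); the second compresses the final job output written to HDFS.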

--Ash
