Posted to common-dev@hadoop.apache.org by "Joydeep Sen Sarma (JIRA)" <ji...@apache.org> on 2008/03/11 00:50:49 UTC

[jira] Created: (HADOOP-2991) dfs.du.reserved not honored in 0.15/16 (regression from 0.14+patch for 2549)

dfs.du.reserved not honored in 0.15/16 (regression from 0.14+patch for 2549)
----------------------------------------------------------------------------

                 Key: HADOOP-2991
                 URL: https://issues.apache.org/jira/browse/HADOOP-2991
             Project: Hadoop Core
          Issue Type: Bug
          Components: dfs
    Affects Versions: 0.16.0, 0.15.3, 0.15.2, 0.15.1, 0.15.0
            Reporter: Joydeep Sen Sarma
            Priority: Critical


changes for https://issues.apache.org/jira/browse/HADOOP-1463

have caused a regression. earlier:

- we could set dfs.du.reserved to 1G and be *sure* that 1G would not be used.

now this is no longer true. I am quoting Pete Wyckoff's example:

<example>
Let's look at an example. 100 GB disk and /usr using 45 GB and dfs using 50 GBs now

Df -kh shows:

Capacity = 100 GB
Available = 1 GB (remember ~4 GB chopped out for metadata and stuff)
Used = 95 GBs   

remaining = 100 GB - 50 GB - 1GB = 49 GB 

Min(remaining, available) = 1 GB

98% of which is usable for DFS apparently - 

So, we're at the limit, but are free to use 98% of the remaining 1GB.
</example>

this is broken. based on the discussion on 1463 - it seems like the notion of 'capacity' as being the first field of 'df' is problematic. For example - here's what our df output looks like:

Filesystem            Size  Used Avail Use% Mounted on
/dev/sda3             130G  123G   49M 100% /


as you can see - 'Size' is a misnomer - that much space is not available. Rather the actual usable space is 123G+49M ~ 123G. (not entirely sure what the discrepancy is due to - but have heard this may be due to space reserved for file system metadata). Because of this discrepancy - we end up in a situation where the file system is out of space.


-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.

[jira] Issue Comment Edited: (HADOOP-2991) dfs.du.reserved not honored in 0.15/16 (regression from 0.14+patch for 2549)

Posted by "Hairong Kuang (JIRA)" <ji...@apache.org>.

    [ https://issues.apache.org/jira/browse/HADOOP-2991?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12577314#action_12577314 ] 

hairong edited comment on HADOOP-2991 at 3/11/08 8:39 AM:
----------------------------------------------------------------

I think we should discuss the meaning of all the terms before we discuss the changes. I see that Pete and Joydeep kept on saying "off".

This is what we defined in hadoop-1463
dfs capacity = the total disk space of the volumes where the data directories are located. This does not mean the total dfs usable space.
dfs used space = the space that dfs used
reserved space = the space reserved for non-dfs usage
remaining = the total free disk space available to dfs, which is equal to MIN(dfs capacity-reserved space-dfs used space, disk available space)*du.pct. 

Block placement is based on the remaining space. So dfs should never use more than (dfs capacity-reserved space) space.

Of course dfs capacity != dfs used space + remaining since disks are shared by dfs and non-dfs applications. This is similar to disk capacity != used space + available space because disks are shared by user applications and O.S.
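
To make the definition of "remaining" above concrete, here is a minimal sketch that plugs the numbers from Pete's example into the formula. The class and method names are hypothetical (this is not the actual datanode code), du.pct is taken as 0.98 from the "98%" in the example, and the clamp at zero is an assumption of the sketch.

```java
// Hypothetical sketch of the "remaining" formula as defined in this comment.
// Names are illustrative; this is not the actual datanode code.
final class RemainingSketch {
    static final long GB = 1024L * 1024L * 1024L;

    // remaining = MIN(capacity - reserved - dfsUsed, available) * duPct
    static long remaining(long capacity, long reserved, long dfsUsed,
                          long available, double duPct) {
        long budget = capacity - reserved - dfsUsed;
        long free = Math.min(budget, available);
        // Clamping at zero is an assumption of this sketch.
        return (long) (Math.max(free, 0L) * duPct);
    }

    public static void main(String[] args) {
        // Pete's example: 100 GB disk, 50 GB used by dfs, 1 GB reserved,
        // and df reporting only 1 GB available.
        long r = remaining(100 * GB, 1 * GB, 50 * GB, 1 * GB, 0.98);
        System.out.println("remaining bytes = " + r);
    }
}
```

With these inputs the result is just under 1 GB, which is the situation the report complains about: the only space left on the disk is the gigabyte that was meant to be reserved, and dfs still counts most of it as usable.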

      was (Author: hairong):
    I think we should discuss the meaning of all the terms before we discuss the changes. I see that Pete and Joydeep kept on saying "off".

This is what we defined in hadoop-1463
dfs capacity = the total disk space that data directories are located. This does not mean the total dfs usable space.
dfs used space = the space that dfs used
reserved space = the space reserved for non-dfs usage
remaining = the total free disk space available to dfs, which is equal to MIN(dfs capacity-reserved space, disk available space)*du.pct. 

Block placement is based on the remaining space. So dfs should never use more than (dfs capacity-reserved space) space.

Of course dfs capacity != dfs used space + remaining since disks are shared by dfs and non-dfs applications. This is similar to disk capacity != used space + available space because disks are shared by user applications and O.S.
  

[jira] Commented: (HADOOP-2991) dfs.du.reserved not honored in 0.15/16 (regression from 0.14+patch for 2549)

Posted by "Joydeep Sen Sarma (JIRA)" <ji...@apache.org>.

    [ https://issues.apache.org/jira/browse/HADOOP-2991?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12577976#action_12577976 ] 

Joydeep Sen Sarma commented on HADOOP-2991:
-------------------------------------------

2150 does seem like the right solution. if we can specify at machine and directory granularity, the space allocated to DFS - that would be perfect.

given that it would be a new feature - and not sure when it would make its way in - i don't know what message we have for people who are using or want to control dfs space usage currently. looks like in the worst case (based on 'df' behavior) - different patches are required for different installs.

So one way out of this morass could be to make a set of patches available on this jira (apply this if you trust 'available' etc.)


[jira] Commented: (HADOOP-2991) dfs.du.reserved not honored in 0.15/16 (regression from 0.14+patch for 2549)

Posted by "Pete Wyckoff (JIRA)" <ji...@apache.org>.

    [ https://issues.apache.org/jira/browse/HADOOP-2991?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12577546#action_12577546 ] 

Pete Wyckoff commented on HADOOP-2991:
--------------------------------------


Can we just first look at fixing DF.getCapacity() - leave it if you like, but add a method DF.getTotalUsableSpace() which returns DF.getUsed() + DF.getAvailable(). And stop using DF.getCapacity() !

And in the meantime, Joy, I think it does make some sense to poll users and see (a) who even knows about the new semantics and (b) who thinks they are useful and (c) who thinks they are usable.

--- pete
 
ps the example would be really, really helpful.

ps again Hairong, all your comments apply only if the setting for reserved = Everything (including completely unusable by ANYBODY space that cannot be used on the drive + all other space over the lifetime of the drive and machine).
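
For illustration, the DF.getTotalUsableSpace() idea from this comment could look roughly like the sketch below. DfSketch is a hypothetical stand-in that only holds the three values df reports; it does not run df and is not the real DF class. The sample numbers are the df line from the issue description (130G size, 123G used, 49M available).

```java
// Hypothetical stand-in for the DF class discussed in this comment.
// It only models the three values df reports; it does not invoke df.
final class DfSketch {
    static final long MB = 1024L * 1024L;
    static final long GB = 1024L * MB;

    private final long capacity;   // df "Size"
    private final long used;       // df "Used"
    private final long available;  // df "Avail"

    DfSketch(long capacity, long used, long available) {
        this.capacity = capacity;
        this.used = used;
        this.available = available;
    }

    long getCapacity() { return capacity; }
    long getUsed() { return used; }
    long getAvailable() { return available; }

    // Proposed addition: the space that can actually hold data.
    long getTotalUsableSpace() { return used + available; }

    public static void main(String[] args) {
        DfSketch df = new DfSketch(130 * GB, 123 * GB, 49 * MB);
        long gap = df.getCapacity() - df.getTotalUsableSpace();
        System.out.println("bytes counted in Size but not usable = " + gap);
    }
}
```

For that df line the gap is a bit under 7 GB, which is the space that a capacity-based calculation would wrongly treat as available.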



[jira] Commented: (HADOOP-2991) dfs.du.reserved not honored in 0.15/16 (regression from 0.14+patch for 2549)

Posted by "Pete Wyckoff (JIRA)" <ji...@apache.org>.

    [ https://issues.apache.org/jira/browse/HADOOP-2991?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12577537#action_12577537 ] 

Pete Wyckoff commented on HADOOP-2991:
--------------------------------------

Raghu,

We're not talking a couple of bytes here:

Filesystem            Size  Used Avail Use% Mounted on
/dev/foo             459G  105M  436G   1% /mnt/bar

That's a 23 GB difference! i.e., 459 - 436 = 23

-- pete





[jira] Issue Comment Edited: (HADOOP-2991) dfs.du.reserved not honored in 0.15/16 (regression from 0.14+patch for 2549)

Posted by "Raghu Angadi (JIRA)" <ji...@apache.org>.

    [ https://issues.apache.org/jira/browse/HADOOP-2991?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12577277#action_12577277 ] 

rangadi edited comment on HADOOP-2991 at 3/10/08 7:02 PM:
---------------------------------------------------------------

I would rather provide another variable ("dfs.datanode.du.freespace") with default set to zero and modify FSVolume.getAvailable() to effectively return:  {{min(current_getAvailable(), usage.available() - dfs.datanode.du.freespace);}}. Would this work? This will also help the vast majority of users who may not want to extend DataNode implementations.
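
A rough sketch of what that expression would compute, assuming all three inputs are byte counts. The class below is hypothetical and is not the real FSVolume; the clamp at zero is an extra assumption that is not part of the proposal as written.

```java
// Hypothetical sketch of the proposed getAvailable() change; not FSVolume itself.
final class FreeSpaceSketch {
    static final long GB = 1024L * 1024L * 1024L;

    // currentAvailable: what getAvailable() returns today
    // dfAvailable:      "Avail" as reported by df (usage.available())
    // freeSpace:        value of the proposed dfs.datanode.du.freespace (default 0)
    static long getAvailable(long currentAvailable, long dfAvailable, long freeSpace) {
        long capped = Math.min(currentAvailable, dfAvailable - freeSpace);
        // Clamping at zero is an assumption of this sketch.
        return Math.max(capped, 0L);
    }

    public static void main(String[] args) {
        // df reports 1 GB available and the admin wants 1 GB kept free.
        System.out.println(getAvailable(1 * GB, 1 * GB, 1 * GB));
        // With the default of zero the current behavior is unchanged.
        System.out.println(getAvailable(1 * GB, 1 * GB, 0L));
    }
}
```

With freespace set to 1 GB, the volume from Pete's example would report nothing available, and with the default of zero nothing changes for existing installs.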


[jira] Commented: (HADOOP-2991) dfs.du.reserved not honored in 0.15/16 (regression from 0.14+patch for 2549)

Posted by "Pete Wyckoff (JIRA)" <ji...@apache.org>.

    [ https://issues.apache.org/jira/browse/HADOOP-2991?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12577539#action_12577539 ] 

Pete Wyckoff commented on HADOOP-2991:
--------------------------------------

Also,

I am still interested in a real life example where this is actually useful - this would really help people understand the motivation. And please, not in the theoretical sense of someone wanting to set aside 50 GB for a mysql db, but rather some person who is actually using this in an installation of 10s of machines to set aside 50 GB on EVERY node for a mysql db.

Thanks,
-- pete




[jira] Issue Comment Edited: (HADOOP-2991) dfs.du.reserved not honored in 0.15/16 (regression from 0.14+patch for 2549)

Posted by "Allen Wittenauer (JIRA)" <ji...@apache.org>.

    [ https://issues.apache.org/jira/browse/HADOOP-2991?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12577719#action_12577719 ] 

aw edited comment on HADOOP-2991 at 3/11/08 11:07 PM:
--------------------------------------------------------------------


Ahh file systems.  Can't live with them, can't live with them.

First off: I'm not a big fan of percentages when dealing with file systems.

Back in the day, UFS would reserve 10% of the system for root's usage.  So on a 10G disk, it would save 1G for itself.  Not a big deal, and when the file system had issues, that worked out well.  But on a 100G disk, that reserve turns into 10G.  Ugh.  Not cool.  Go even bigger and the amounts get insane.  So many implementations changed this to a sliding scale rather than a single percentage.  Some food for thought.

Secondly, df.  A great source of cross platform trouble... Let me throw out one of my favorite real world examples, this time from one of my home machines:

Filesystem             size   used  avail capacity  Mounted on
int                    165G    28K    21G     1%    /int
int/home               165G    68G    21G    77%    /export/home
int/mii2u              165G  1014K    21G     1%    /int/mii2u
int/squid-cache        5.0G   4.4G   591M    89%    /int/squid-cache
int/local              165G   289M    21G     2%    /usr/local

Stop.  Go back and look carefully at those numbers.  

In case you haven't guessed, this is a (partial) output of a df -h from a Solaris machine utilizing ZFS.  It is pretty clear that with the exception of the file system using a hard quota (int/squid-cache), size != used+available.  Instead, size=(all fs used)+available.  Using "used" in any sort of capacity isn't going to tell you anything about how much space is actually available.  This type of output is fairly common for any pool-based storage system.

Then there are file system quotas, which depending upon the OS, may or may not show up in df output.  The same thing with the aforementioned percentages with reserved space.

Anyway, what does this all mean?

Well, in my mind, it means that all of the above suggestions in the JIRA just really don't work out well... and that's just on UNIX.  Heck, even a heterogeneous UNIX environment makes me shudder.  How does one work with pooled storage *and* traditional file systems if you want to have a single config?

Quite frankly, you can't.  As much as I hate to say it, I suspect the answer (as unpopular as it might be) is probably to set a hard limit on how much space the HDFS will use rather than trying to second guess what the operating system is doing.  Does this suck?  Yes.  Does this suck less than all of the gymnastics around trying to figure this out dynamically?  I think so.

Let's face it, in order to make an app like Hadoop not eat more space than what you want vs. what is configured in the file system, you are essentially looking at partitioning it.  At that point, you might as well just configure it in the app and be done with it. In the end, this basically means that HDFS needs to keep track of how much space it is using at all times and not go over that limit.  This likely also means that it must implement high and low water marks such that if the low water mark is hit, writes to the filesystem get deferred/deprioritized and high water marks basically mean to start rebalancing the blocks or saying the file system is full.
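
As a rough sketch of the bookkeeping described above: a configured hard limit plus low and high water marks, with the app tracking its own usage instead of asking df. Everything here is hypothetical (class name, state names, and treating the hard limit as a separate value above the high water mark), and the sketch ignores thread safety.

```java
// Hypothetical sketch of a hard space limit with water marks.
// Not thread-safe; names and states are illustrative only.
final class SpaceBudgetSketch {
    enum State { OK, DEPRIORITIZE_WRITES, REBALANCE_OR_FULL }

    private final long maxBytes;       // hard limit, never exceeded
    private final long lowWaterMark;   // above this, writes get deprioritized
    private final long highWaterMark;  // above this, rebalance or report full
    private long usedBytes;

    SpaceBudgetSketch(long maxBytes, long lowWaterMark, long highWaterMark) {
        if (lowWaterMark < 0 || lowWaterMark > highWaterMark || highWaterMark > maxBytes) {
            throw new IllegalArgumentException("need 0 <= low <= high <= max");
        }
        this.maxBytes = maxBytes;
        this.lowWaterMark = lowWaterMark;
        this.highWaterMark = highWaterMark;
    }

    // Accounts for a write; refuses it if it would exceed the hard limit.
    boolean tryReserve(long bytes) {
        if (bytes < 0 || bytes > maxBytes - usedBytes) {
            return false;
        }
        usedBytes += bytes;
        return true;
    }

    // Gives space back, e.g. after a block is deleted or moved away.
    void release(long bytes) {
        usedBytes = Math.max(0L, usedBytes - Math.max(0L, bytes));
    }

    State state() {
        if (usedBytes >= highWaterMark) return State.REBALANCE_OR_FULL;
        if (usedBytes >= lowWaterMark) return State.DEPRIORITIZE_WRITES;
        return State.OK;
    }

    long getUsedBytes() { return usedBytes; }
}
```

The point of the design is that none of these decisions depend on what the operating system reports, so the same config would behave the same way on pooled storage and on traditional file systems.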

Now, I know that it might be difficult to calculate what the max space should be.  On reflection though, I'm not really sure that's true.  If I know what size my slice is and I have an idea of how much of that I want to give to HDFS, then I can calculate that max value.  If an admin gets in trouble with the space being allocated, they have the ability to lower the high and low water marks, which should trigger a rebalance, thus freeing space.  This is essentially how apps like squid work. It works quite well.  [Interestingly enough, the file system structure on disk is quite similar to how the data node stores its blocks.... Hmm... ]

One thing to point out with this solution:  if the admin overcommits the space on the drive then, quite frankly, they hung themselves.  They know how much space they gave HDFS.  If they go over it, oh well.  I'd much rather have MapRed blow up than HDFS blow up, since it is much easier to pick up the pieces of a broken job than it is of the file system, especially in the case where there are under-replicated blocks.

Again, I totally admit that this solution is likely to be unpopular.  But I can't see a way out of this mess that works with the multiple types of storage systems in use.

P.S., while I'm here, let me throw more of my own personal prejudices into this:  putting something like hadoop in / or some other file system  (but not necessarily device) that is used by the OS is just *begging* for trouble.  That's just a bad practice for a real, production system.  If someone does that, they rightly deserve any pain that it causes.

      was (Author: aw):
    
https://issues.apache.org/jira/browse/HADOOP-2991

Ahh file systems.  Can't live with them, can't live with them.

First off: I'm not a big fan of percentages when dealing with file systems.

Back in the day, UFS would reserve 10% of the system for root's usage.  So on a 10G disk, it would save 1G for itself.  Not a big deal and when the file system had issues, that worked out well.  100G would turn into 10G.  Ugh.  Not cool.  Go even bigger and the amounts get insane.  So many implementations changed this to a sliding scale rather than a single percentage.  Some food for thought.

Secondly, df.  A great source of cross platform trouble... Let me throw out one of my favorite real world examples, this time from one of my home machines:

Filesystem             size   used  avail capacity  Mounted on
int                    165G    28K    21G     1%    /int
int/home               165G    68G    21G    77%    /export/home
int/mii2u              165G  1014K    21G     1%    /int/mii2u
int/squid-cache        5.0G   4.4G   591M    89%    /int/squid-cache
int/local              165G   289M    21G     2%    /usr/local

Stop.  Go back and look carefully at those numbers.  

In case you haven't guessed, this is a (partial) output of a df -h from a Solaris machine utilizing ZFS.  It is pretty clear that with the exception of the file system using a hard quota (int/squid-cache), size != used+available.  Instead, size=(all fs used)+available.  Using "used" in any sort of capacity isn't going to tell you anything about how much space is actually available.  This type of output is fairly common for any pool-based storage system.

Then there are file system quotas, which depending upon the OS, may or may not show up in df output.  The same thing with the aforementioned percentages with reserved space.

Anyway, what does this all mean?

Well, in my mind, that all of the above suggestions in the JIRA just really don't work out well... and that's just on UNIX.  Heck, even a heterogeneous UNIX environment makes me shudder.  How does one work with pooled storage *and* traditional file systems if you want to have a single config?

Quite frankly, you can't.  As much as I hate to say it, I suspect the answer (as unpopular as it might be) is probably to set a hard limit on how much space the HDFS will use rather than trying to second guess what the operating system is doing.  Does this suck?  Yes.  Does this suck less than all of the gymnastics around trying to figure this out dynamically?  I think so.

Let's face it, in order to make an app like Hadoop not eat more space than what you want vs. what is configured in the file system, you are essentially looking at partitioning it.  At that point, you might as well just configure it in the app and be done with it. In the end, this basically means that HDFS needs to keep track of how much space it is using at all times and not go over that limit.  This likely also means that it must implement high and low water marks such that if the low water mark is hit, writes to the filesystem get deferred/deprioritized and high water marks basically mean to start rebalancing the blocks or saying the file system is full.

Now, I know that it might be difficult to calculate what the max space should be.  On reflection though, I'm not really sure that's true.  If I know what size my slice is and I have an idea of how much of that I want to give to HDFS, then I can calculate that max value.  If an admin gets in trouble with the space being allocated, they have the ability to lower the high and low water marks, which should trigger a rebalance, thus freeing space.  This is essentially how apps like squid work. It works quite well.  [Interestingly enough, the file system structure on disk is quite similar to how the data node stores its blocks.... Hmm... ]
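
A minimal sketch of the fixed-limit idea described above, assuming the data node tracks its own byte count against two configured marks. The class name SpaceBudget, its states, and its methods are hypothetical and not part of Hadoop; the mapping of low mark to deferred writes and high mark to "full" follows the description in this comment.

```java
/** Hypothetical sketch: a fixed byte budget for HDFS with low and high water marks. */
final class SpaceBudget {
    enum State { OK, DEFER_WRITES, FULL }

    private final long lowWaterMark;   // bytes; at or above this, writes get deprioritized
    private final long highWaterMark;  // bytes; at or above this, the volume is treated as full
    private long used;                 // bytes the data node believes it has written

    SpaceBudget(long lowWaterMark, long highWaterMark) {
        if (lowWaterMark < 0 || lowWaterMark > highWaterMark) {
            throw new IllegalArgumentException("need 0 <= low <= high");
        }
        this.lowWaterMark = lowWaterMark;
        this.highWaterMark = highWaterMark;
    }

    /** Classifies current usage against the two marks. */
    State state() {
        if (used >= highWaterMark) return State.FULL;
        if (used >= lowWaterMark) return State.DEFER_WRITES;
        return State.OK;
    }

    /** Records a write if it fits under the high water mark; returns false otherwise. */
    boolean tryReserve(long bytes) {
        if (bytes < 0 || bytes > highWaterMark - used) return false;
        used += bytes;
        return true;
    }

    /** Records freed space, for example after blocks are deleted or rebalanced away. */
    void release(long bytes) {
        used = Math.max(0L, used - bytes);
    }

    long used() {
        return used;
    }
}
```

The point of the design is that nothing here consults df at all: the limit is whatever the admin configured, so pooled storage, quotas, and reserved percentages do not enter into it.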

One thing to point out with this solution:  if the admin overcommits the space on the drive then, quite frankly, they hung themselves.  They know how much space they gave HDFS.  If they go over it, oh well.  I'd much rather have MapRed blow up than HDFS blow up, since it is much easier to pick up the pieces of a broken job than it is of the file system, especially in the case where there are under-replicated blocks.

Again, I totally admit that this solution is likely to be unpopular.  But I can't see a way out of this mess that works with the multiple types of storage systems in use.

P.S., while I'm here, let me throw more of my own personal prejudices into this:  putting something like hadoop in / or some other file system  (but not necessarily device) that is used by the OS is just *begging* for trouble.  That's just a bad practice for a real, production system.  If someone does that, they rightly deserve any pain that it causes.
  

[jira] Issue Comment Edited: (HADOOP-2991) dfs.du.reserved not honored in 0.15/16 (regression from 0.14+patch for 2549)

Posted by "Raghu Angadi (JIRA)" <ji...@apache.org>.

    [ https://issues.apache.org/jira/browse/HADOOP-2991?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12577265#action_12577265 ] 

rangadi edited comment on HADOOP-2991 at 3/10/08 6:07 PM:
---------------------------------------------------------------

from HADOOP-1463 :
> folks - the implementation does not agree with the semantics discussed in the jira (or the semantics in 0.14)

Joydeep, what is the exact discrepancy between the implementation and the new semantics (an example would help)? Yes, 'capacity' is not accurate. But FSDataset.FSVolume.getAvailable() does take that into account, note that it never returns more than what is in 'available' column of 'df'.

I agree with the latter part: it is an incompatible change and it changed the meaning of 'reserved'. Users might need better guidance.

      was (Author: rangadi):
    from HADOOP-1463 :
> folks - the implementation does not agree with the semantics discussed in the jira (or the semantics in 0.14)

Joydeep, what is the exact discrepancy between the implementation and the new semantics (an example would help)? Yes, 'capacity' is not accurate. But it FSDataset.FSVolume.getAvailable() does take into account, note that it never returns more than what is in 'available' column of 'df'.

I agree with the later part : it is an incompatible change and it changed the meaning of 'reserved'. Users might need better guidance.
  

[jira] Issue Comment Edited: (HADOOP-2991) dfs.du.reserved not honored in 0.15/16 (regression from 0.14+patch for 2549)

Posted by "Raghu Angadi (JIRA)" <ji...@apache.org>.

    [ https://issues.apache.org/jira/browse/HADOOP-2991?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12577543#action_12577543 ] 

rangadi edited comment on HADOOP-2991 at 3/11/08 11:21 AM:
----------------------------------------------------------------

Pete, yes, I agree completely! And I did earlier too... Also this was one of many many many incompatible changes that happened over the last year.

      was (Author: rangadi):
    Yes, I agree completely! An I did earlier too... Also this was one of many many many incompatible changes that happened over last year.
  

[jira] Commented: (HADOOP-2991) dfs.du.reserved not honored in 0.15/16 (regression from 0.14+patch for 2549)

Posted by "Allen Wittenauer (JIRA)" <ji...@apache.org>.

    [ https://issues.apache.org/jira/browse/HADOOP-2991?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12578500#action_12578500 ] 

Allen Wittenauer commented on HADOOP-2991:
------------------------------------------

Someone asked me privately how changing from dfs=capacity-reserved to dfs=fixed value impacts map reduce output, its storage usage, etc.

The quick answer is to say that it doesn't, but that's not a very complete answer. :)

I view the world this way:

I can limit HDFS if something like 2150 and what we've talked about here is implemented.  I can equally limit MR output by implementing file system quotas for the user(s)/group(s) that are running the JT/TT/tasks at the file system level.  [And remember everyone: you do not want the HDFS and the tasks running as the same user!]  This guarantees that both HDFS and MR can be fenced in and not take the blame for eating all the drive space.  Any file system fulls will either be a fault with how the system was configured (admins and rope can be dangerous combinations, but a necessary one) or with some less than polite process.

This type of system would actually work much better than, say, partitioning specific drives, given that this gives the admin some flexibility to reconfigure based upon workload.

pete: as to a plug-in... later.  My JIRA came first. :p




[jira] Commented: (HADOOP-2991) dfs.du.reserved not honored in 0.15/16 (regression from 0.14+patch for 2549)

Posted by "Joydeep Sen Sarma (JIRA)" <ji...@apache.org>.

    [ https://issues.apache.org/jira/browse/HADOOP-2991?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12577493#action_12577493 ] 

Joydeep Sen Sarma commented on HADOOP-2991:
-------------------------------------------

Raghu - the bug is that 

DF.java: parseExecResult()
   this.capacity = Long.parseLong(tokens.nextToken()) * 1024;

is not correct. the code treats this as 'usable' space in computing getAvailable(). But as we are pointing out - this is *not* usable space - but merely the capacity of the drive. Usable space = this.available+this.used (applied in the context of the same file).

(Note again - the notion of *usable* and *capacity* are different in most file systems)
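
To make the distinction concrete, here is a small sketch that derives usable space from the Used and Avail columns instead of the Size column. The class DfFields is hypothetical and is not Hadoop's DF class; the figures in the usage note are the rounded values from the df output quoted in the issue, so they are approximate.

```java
/** Hypothetical holder for the three numeric df columns, all in bytes. */
final class DfFields {
    final long size;       // df "Size" column: raw capacity of the file system
    final long used;       // df "Used" column
    final long available;  // df "Avail" column

    DfFields(long size, long used, long available) {
        this.size = size;
        this.used = used;
        this.available = available;
    }

    /** Space that can actually hold data: what is used plus what is still available. */
    long usable() {
        return used + available;
    }

    /** Part of "Size" that is neither used nor available (metadata, reserved blocks, ...). */
    long overhead() {
        return size - usable();
    }
}
```

For the /dev/sda3 line in the issue (130G size, 123G used, 49M available), usable() comes out at roughly 123G and overhead() at roughly 7G, which is the gap that makes a capacity-based calculation overestimate what DFS can still write.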

---

as a matter of philosophy (and *outside* of this bug) - i completely and whole heartedly disagree with the notion that:

reserved space = the space reserved for non-dfs usage

that administrators can ever figure out non-dfs usage precisely. in any case - such usage can differ from disk to disk (root partitions consume lot more non-dfs stuff than other partitions) - and it will be a nightmare to start adding disk level configuration to hadoop. It is *completely* counter-intuitive. It is *much* easier for admins to understand that:

reserved space = last N bytes that DFS will not use.

this is a uniform parameter that can be easily understood across all drives and lends to easy planning. Normally, one images the system, installs Hadoop and that's it. One wants to leave some extra space for adding libraries and such - but this is typically small amount of data. It's very easy for an admin following this standard procedure to budget some reserve space for this purpose that dfs will not touch.

please put urself in an admin's shoe. please! 

---

Do you think we could take a poll on the dev/users lists on what controls admins want? 














[jira] Commented: (HADOOP-2991) dfs.du.reserved not honored in 0.15/16 (regression from 0.14+patch for 2549)

Posted by "Joydeep Sen Sarma (JIRA)" <ji...@apache.org>.

    [ https://issues.apache.org/jira/browse/HADOOP-2991?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12577259#action_12577259 ] 

Joydeep Sen Sarma commented on HADOOP-2991:
-------------------------------------------

Hairong - there are two issues here (please please do not mix them up):

1. the size column of the DF output does not give usable space. This is regardless of 1463. But 1463 makes this worse - because earlier the 'capacity' field never really mattered in edge cases - now it becomes paramount.

2. the argument over what 'reserved' means. I didn't raise this point - but earlier 'reserve' meant that ' please don't touch the last N bytes'. The xml documentation still says so:

  <description>Reserved space in bytes. Always leave this much space free for non dfs use  </description>

now - we have no way of making sure that DFS does not use the last N bytes. As an administrator - i hate this. Earlier i could sleep in peace knowing that DFS would never cause file system full. Now i can't. It is _very_ hard for me to estimate up front all the non DFS usage. It's much easier for me to say 'please do not use last N bytes'.

---

this is another case where interface semantics were changed:
a) no backwards compatibility with old semantics (of 0.14)
b) no clear information to admins about changes to existing semantics (I went through the change notes when i was struggling with the compression problems - and this never caught my eye).

---

Please consider the two issues raised here separately. We have, of course, patched this already in our environment. But the general user community will face this problem. There has already been another reported instance that u saw where someone has felt this was not working as expected.


[jira] Commented: (HADOOP-2991) dfs.du.reserved not honored in 0.15/16 (regression from 0.14+patch for 2549)

Posted by "Pete Wyckoff (JIRA)" <ji...@apache.org>.

    [ https://issues.apache.org/jira/browse/HADOOP-2991?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12577274#action_12577274 ] 

Pete Wyckoff commented on HADOOP-2991:
--------------------------------------


Raghu/Hairong,

Since this is contentious and I remember it being this way for a long time, why not define an interface and a default implementation and let people put in their own impl?

Maybe

public interface CalcDFSFreeSpaceInterface {
  long getFreeSpace();
}

and impl = 

public class CalcDFSFreeSpaceImpl implements CalcDFSFreeSpaceInterface {
  CalcDFSFreeSpaceImpl(Conf c, Partition p) { blah blah }
  public long getFreeSpace() { blah blah }
}


Then keep the current implementation + the bug fix for DF.getCapacity() and others can implement it the way they want.

Where Partition is whatever tells it which partition it should be looking at.
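
As an illustration of what one such plug-in might look like, here is a sketch of the older "never touch the last N bytes" semantics behind the proposed interface. The interface is repeated so the example stands alone; LastNBytesFreeSpace and forDirectory are hypothetical names, and java.io.File.getUsableSpace() (available since Java 6) stands in for parsing df output.

```java
import java.io.File;
import java.util.function.LongSupplier;

interface CalcDFSFreeSpaceInterface {
    long getFreeSpace();
}

/** Hypothetical plug-in: DFS may use everything except the last reservedBytes. */
final class LastNBytesFreeSpace implements CalcDFSFreeSpaceInterface {
    private final LongSupplier availableBytes; // what the OS says is still writable
    private final long reservedBytes;          // the "last N bytes" DFS must leave alone

    LastNBytesFreeSpace(LongSupplier availableBytes, long reservedBytes) {
        this.availableBytes = availableBytes;
        this.reservedBytes = reservedBytes;
    }

    /** Convenience factory that asks the JVM for the usable space under dir. */
    static LastNBytesFreeSpace forDirectory(File dir, long reservedBytes) {
        return new LastNBytesFreeSpace(dir::getUsableSpace, reservedBytes);
    }

    @Override
    public long getFreeSpace() {
        // Never report negative space once the reserve has been eaten into.
        return Math.max(0L, availableBytes.getAsLong() - reservedBytes);
    }
}
```

Passing the available-space source as a LongSupplier is only a testing convenience; an implementation following the proposal above would take the configuration and the partition in its constructor instead.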

-- pete



[jira] Commented: (HADOOP-2991) dfs.du.reserved not honored in 0.15/16 (regression from 0.14+patch for 2549)

Posted by "Hairong Kuang (JIRA)" <ji...@apache.org>.

    [ https://issues.apache.org/jira/browse/HADOOP-2991?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12577258#action_12577258 ] 

Hairong Kuang commented on HADOOP-2991:
---------------------------------------

I think you have a misunderstanding of the reserved parameter. As I commented on HADOOP-1463, remember that dfs.du.reserve is the space for non-dfs usage, including the space for map/reduce, other applications, fs meta-data etc. In your case since /usr already takes 45GB, it far exceeds the reserved limit 1G. You should set the reserved space to be 50G.
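
The arithmetic behind this advice can be sketched as follows. This follows the calculation shown in the issue's example (remaining = capacity - dfs used - reserved, then the minimum of that and df's Avail column); it is not copied from the Hadoop source, it ignores the percentage factor, and the class and method names are hypothetical.

```java
/** Hypothetical sketch of the "reserved = space for non-dfs usage" calculation. */
final class ReservedAsNonDfs {
    /** All arguments share one unit (GB in the usage note); the result is floored at 0. */
    static long available(long capacity, long dfsUsed, long reserved, long dfAvailable) {
        long remaining = capacity - dfsUsed - reserved; // what DFS may still claim on paper
        return Math.max(0L, Math.min(remaining, dfAvailable));
    }
}
```

With the example's numbers in GB, available(100, 50, 1, 1) yields 1, so DFS keeps writing into the last gigabyte, while available(100, 50, 50, 1) yields 0, which is the behaviour the suggested 50G setting is meant to produce.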


[jira] Commented: (HADOOP-2991) dfs.du.reserved not honored in 0.15/16 (regression from 0.14+patch for 2549)

Posted by "Pete Wyckoff (JIRA)" <ji...@apache.org>.

    [ https://issues.apache.org/jira/browse/HADOOP-2991?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12577266#action_12577266 ] 

Pete Wyckoff commented on HADOOP-2991:
--------------------------------------


The formula should be:

min(((DF.Capacity - Conf.Reserved) * Conf.dfs.du.pct) - DU.dfsSpace(), DF.available() - Conf.reserved);

Now, as Joy says, DF.Capacity is actually not all usable - some of it is used for meta info for the filesystem.

So capacity should be DF.available() + DF.used()
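
A sketch of that formula with the capacity correction applied, assuming all quantities are in the same unit. The class and parameter names are hypothetical, and the floor at zero is an addition that the formula above does not state.

```java
/** Hypothetical sketch of the proposed free-space formula. */
final class ProposedFreeSpace {
    static long freeSpace(long dfUsed, long dfAvailable, long dfsUsed, long reserved, double pct) {
        long capacity = dfUsed + dfAvailable;                          // usable space, not df's Size column
        long byQuota = (long) ((capacity - reserved) * pct) - dfsUsed; // DFS share not yet consumed
        long byDisk = dfAvailable - reserved;                          // keep the last 'reserved' units free
        return Math.max(0L, Math.min(byQuota, byDisk));                // floor at 0 (added assumption)
    }
}
```

For the example in the issue (95 GB used, 1 GB available, 50 GB of it DFS, 1 GB reserved, 98%), byDisk is 0, so the result is 0 and the last gigabyte is left alone.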

Also, Hairong, I understand the new meaning for reserved, but it really is too hard to use. We'd have to figure out on every machine on every drive what this amount of space is.  The older semantics are much easier to use and helps a lot. Now that I'm a user :) I can see that being able to say to Hadoop (dfs and mapred), never use the last 1 GB or last .5 GB or whatever for safety reasons is helpful.

Yes, this means that the amount of space for DFS will fluctuate, but so what. When there's not enough space due to other things on the drive, the drive isn't used but when there's space, it is.

-- pete













[jira] Commented: (HADOOP-2991) dfs.du.reserved not honored in 0.15/16 (regression from 0.14+patch for 2549)

Posted by "Pete Wyckoff (JIRA)" <ji...@apache.org>.

    [ https://issues.apache.org/jira/browse/HADOOP-2991?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12577273#action_12577273 ] 

Pete Wyckoff commented on HADOOP-2991:
--------------------------------------

If reserved = something like: DU.everythingOtherThanDFS() + DF.getUnusableSpace() [the metadata space for the FS itself]
then everything is ok in the new code.

So, since the DF.getCapacity() is off, this is a bug. 

Also, the above formula is basically uncomputable.

Am I to calculate that for every machine for every disk? What happens if someone installs a new version of python on one of those disks? Do I have to re-calculate everything? And how would I even know to do that.

Raghu, what is the real life motivation and example of how this change of the semantics is useful?


thanks, pete

ps for us, this means we would have to set reserved to 20 GB today. I have no idea tomorrow if it would be 21 or 19. And it means that on our non-'/' partitions, we waste about 19 GB * 3 drives * 100 = 6 TB

> dfs.du.reserved not honored in 0.15/16 (regression from 0.14+patch for 2549)
> ----------------------------------------------------------------------------
>
>                 Key: HADOOP-2991
>                 URL: https://issues.apache.org/jira/browse/HADOOP-2991
>             Project: Hadoop Core
>          Issue Type: Bug
>          Components: dfs
>    Affects Versions: 0.15.0, 0.15.1, 0.15.2, 0.15.3, 0.16.0
>            Reporter: Joydeep Sen Sarma
>            Priority: Critical
>
> changes for https://issues.apache.org/jira/browse/HADOOP-1463
> have caused a regression. earlier:
> - we could set dfs.du.reserve to 1G and be *sure* that 1G would not be used.
> now this is no longer true. I am quoting Pete Wyckoff's example:
> <example>
> Let's look at an example. 100 GB disk and /usr using 45 GB and dfs using 50 GBs now
> Df -kh shows:
> Capacity = 100 GB
> Available = 1 GB (remember ~4 GB chopped out for metadata and stuff)
> Used = 95 GBs   
> remaining = 100 GB - 50 GB - 1GB = 49 GB 
> Min(remaining, available) = 1 GB
> 98% of which is usable for DFS apparently - 
> So, we're at the limit, but are free to use 98% of the remaining 1GB.
> </example>
> this is broke. based on the discussion on 1463 - it seems like the notion of 'capacity' as being the first field of 'df' is problematic. For example - here's what our df output looks like:
> Filesystem            Size  Used Avail Use% Mounted on
> /dev/sda3             130G  123G   49M 100% /
> as u can see - 'Size' is a misnomer - that much space is not available. Rather the actual usable space is 123G+49M ~ 123G. (not entirely sure what the discrepancy is due to - but have heard this may be due to space reserved for file system metadata). Because of this discrepancy - we end up in a situation where file system is out of space.


[jira] Commented: (HADOOP-2991) dfs.du.reserved not honored in 0.15/16 (regression from 0.14+patch for 2549)

Posted by "Pete Wyckoff (JIRA)" <ji...@apache.org>.

    [ https://issues.apache.org/jira/browse/HADOOP-2991?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12577268#action_12577268 ] 

Pete Wyckoff commented on HADOOP-2991:
--------------------------------------

Raghu,

Look at the example Joy gave: if, instead of /usr being 50 GB, the discrepancy in Capacity were that much, you can see your formula breaks down.

If reserved doesn't include everything on the drive other than DFS + the discrepancy of DF + TheAmountYouReallyWantToReserve, it's kinda screwed up.

I can see the motivation for your changing reserved to what else is on the drive, but it's basically not useful in practice on any '/' partition or other drive that is used for more than just DFS. No one can tell ahead of time what that # should be, and you want me to calculate it for every machine in my cluster?

And there should be a check that the amount of space used outside of DFS is < Reserved and a warning that DFS is gonna fill the entire partition because of this.
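
A check of the kind suggested here could look roughly like the Python sketch below. The helper name and message text are hypothetical, and how non-DFS usage is measured is deliberately left as an input, since the thread itself argues that this number is hard to obtain.

```python
GB = 1024 ** 3


def reserved_warning(non_dfs_used, reserved):
    """Return a warning if non-DFS usage is not below the reserve, else None.

    Both arguments are byte counts supplied by the caller.
    """
    if non_dfs_used < reserved:
        return None
    return (
        "non-DFS usage is %.1f GB but reserved is only %.1f GB; "
        "DFS may fill the rest of the partition"
        % (non_dfs_used / GB, reserved / GB)
    )


# Figures from the issue description: /usr uses 45 GB, reserved is set to 1 GB.
print(reserved_warning(45 * GB, 1 * GB))
```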

The new semantics of reserved would be nice if they could work, but I don't see how they could.

-- pete




[jira] Commented: (HADOOP-2991) dfs.du.reserved not honored in 0.15/16 (regression from 0.14+patch for 2549)

Posted by "Pete Wyckoff (JIRA)" <ji...@apache.org>.

    [ https://issues.apache.org/jira/browse/HADOOP-2991?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12577994#action_12577994 ] 

Pete Wyckoff commented on HADOOP-2991:
--------------------------------------

Allen as usual makes some excellent points. But I do think this reinforces my point that this should be an interface with a default/reference implementation of 2150, which others are free to implement themselves, specifying their class in the config.

As Allen points out, different hardware has different rules and although 2150 solves it for all hardware, some people may want something more customized to them.

Maybe I want to run an external script to figure out DFS usage? Maybe I want to do 2150, maybe I want to do the current solution, maybe the pre-0.15 solution....
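
The pluggable-interface idea can be sketched in Python as follows. Everything named here is hypothetical: the class names, the registry, and the "dfs.space.policy" key are invented for illustration and are not Hadoop APIs or settings. The two implementations are one reading of the "leave this much free" behaviour described in the issue and of the MIN(capacity - reserved, available) definition quoted from HADOOP-1463 in this thread, with du.pct left out for brevity.

```python
from abc import ABC, abstractmethod

GB = 1024 ** 3


class SpacePolicy(ABC):
    """Decides how many bytes DFS may still write to one volume."""

    @abstractmethod
    def available(self, capacity, df_available):
        """Return the bytes DFS may still use on this volume."""


class KeepFreePolicy(SpacePolicy):
    """Always leave `reserved` bytes of df's available space untouched."""

    def __init__(self, reserved):
        self.reserved = reserved

    def available(self, capacity, df_available):
        return max(0, df_available - self.reserved)


class CapacityMinusReservedPolicy(SpacePolicy):
    """MIN(capacity - reserved, df available), ignoring du.pct."""

    def __init__(self, reserved):
        self.reserved = reserved

    def available(self, capacity, df_available):
        return max(0, min(capacity - self.reserved, df_available))


# Hypothetical registry; a Java version would load a class name from the config.
POLICIES = {
    "keep-free": KeepFreePolicy,
    "capacity-minus-reserved": CapacityMinusReservedPolicy,
}


def load_policy(config):
    name = config.get("dfs.space.policy", "capacity-minus-reserved")  # made-up key
    return POLICIES[name](config["dfs.du.reserved"])


config = {"dfs.space.policy": "keep-free", "dfs.du.reserved": 1 * GB}
policy = load_policy(config)
print(policy.available(100 * GB, 1 * GB))
```

With 1 GB available and 1 GB reserved, the keep-free policy reports nothing left for DFS, while the default policy would still report the full 1 GB.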

-- pete



[jira] Commented: (HADOOP-2991) dfs.du.reserved not honored in 0.15/16 (regression from 0.14+patch for 2549)

Posted by "Raghu Angadi (JIRA)" <ji...@apache.org>.

    [ https://issues.apache.org/jira/browse/HADOOP-2991?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12577310#action_12577310 ] 

Raghu Angadi commented on HADOOP-2991:
--------------------------------------

Sure. Please submit a patch that includes a parameter that does what you want it to do.


[jira] Commented: (HADOOP-2991) dfs.du.reserved not honored in 0.15/16 (regression from 0.14+patch for 2549)

Posted by "Raghu Angadi (JIRA)" <ji...@apache.org>.

    [ https://issues.apache.org/jira/browse/HADOOP-2991?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12577528#action_12577528 ] 

Raghu Angadi commented on HADOOP-2991:
--------------------------------------


> is not correct. the code treats this as 'usable' space in computing getAvailable(). But as we are pointing out - this is not usable space - but merely the capacity of the drive. Usable space = this.available+this.used (applied in the context of the same file).

I don't want to beat this too much, but I think the difference between 'capacity' and 'usable' space is well noted. The intention was *not* to be accurate to the byte. It simply cannot be done. 'capacity' makes sense from the administrator's point of view. Yes, it is not accurate. It is like memory for a machine: one would just say 512MB and might set some config based on that; usually we don't correct it to 498.32 MB.

By 'bug' I meant: does getAvailable() return a non-zero number when there is no space, in violation of the definition of 'reserved'? Does it result in the datanode failing to complete writing a block when it should not? If your definition of bug is that the DataNode does not use the 15MB left on a 320 GB disk, then yes, this calculation is not correct.


[jira] Commented: (HADOOP-2991) dfs.du.reserved not honored in 0.15/16 (regression from 0.14+patch for 2549)

Posted by "Raghu Angadi (JIRA)" <ji...@apache.org>.

    [ https://issues.apache.org/jira/browse/HADOOP-2991?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12577543#action_12577543 ] 

Raghu Angadi commented on HADOOP-2991:
--------------------------------------

Yes, I agree completely! And I did earlier too... Also, this was one of many many many incompatible changes that happened over the last year.


[jira] Commented: (HADOOP-2991) dfs.du.reserved not honored in 0.15/16 (regression from 0.14+patch for 2549)

Posted by "Hairong Kuang (JIRA)" <ji...@apache.org>.

    [ https://issues.apache.org/jira/browse/HADOOP-2991?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12577314#action_12577314 ] 

Hairong Kuang commented on HADOOP-2991:
---------------------------------------

I think we should discuss the meaning of all the terms before we discuss the changes. I see that Pete and Joydeep kept on saying "off".

This is what we defined in hadoop-1463
dfs capacity = the total size of the disks on which the data directories are located. This does not mean the total dfs usable space.
dfs used space = the space that dfs used
reserved space = the space reserved for non-dfs usage
remaining = the total free disk space available to dfs, which is equal to MIN(dfs capacity-reserved space, disk available space)*du.pct. 

Block placement is based on the remaining space. So dfs should never use more than (dfs capacity-reserved space) space.

Of course dfs capacity != dfs used space + remaining since disks are shared by dfs and non-dfs applications. This is similar to disk capacity != used space + available space because disks are shared by user applications and O.S.
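
As a worked instance of the 'remaining' definition above, the following Python sketch plugs in the figures from the issue description (100 GB capacity, 1 GB reserved, 1 GB shown as available by df, du.pct of 0.98). The function name is made up; the formula is the one written in this comment, taken at face value.

```python
def remaining_gb(capacity_gb, reserved_gb, df_available_gb, du_pct):
    """MIN(dfs capacity - reserved space, disk available space) * du.pct, in GB."""
    return min(capacity_gb - reserved_gb, df_available_gb) * du_pct


# capacity - reserved = 99 GB, but df reports only 1 GB available,
# so the second term wins and DFS is offered 98% of that last 1 GB.
print(remaining_gb(100, 1, 1, 0.98))
```

Under these inputs the formula still offers DFS 0.98 GB, even though only 1 GB is actually free and 1 GB was meant to be reserved, which is the situation the reporters object to.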


[jira] Commented: (HADOOP-2991) dfs.du.reserved not honored in 0.15/16 (regression from 0.14+patch for 2549)

Posted by "Raghu Angadi (JIRA)" <ji...@apache.org>.

    [ https://issues.apache.org/jira/browse/HADOOP-2991?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12577265#action_12577265 ] 

Raghu Angadi commented on HADOOP-2991:
--------------------------------------

from HADOOP-1463 :
> folks - the implementation does not agree with the semantics discussed in the jira (or the semantics in 0.14)

Joydeep, what is the exact discrepancy between the implementation and the new semantics (an example would help)? Yes, 'capacity' is not accurate. But FSDataset.FSVolume.getAvailable() does take that into account: note that it never returns more than what is in the 'available' column of 'df'.

I agree with the latter part: it is an incompatible change and it changed the meaning of 'reserved'. Users might need better guidance.


[jira] Commented: (HADOOP-2991) dfs.du.reserved not honored in 0.15/16 (regression from 0.14+patch for 2549)

Posted by "Allen Wittenauer (JIRA)" <ji...@apache.org>.

    [ https://issues.apache.org/jira/browse/HADOOP-2991?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12577719#action_12577719 ] 

Allen Wittenauer commented on HADOOP-2991:
------------------------------------------


https://issues.apache.org/jira/browse/HADOOP-2991

Ahh file systems.  Can't live with them, can't live with them.

First off: I'm not a big fan of percentages when dealing with file systems.

Back in the day, UFS would reserve 10% of the system for root's usage.  So on a 10G disk, it would save 1G for itself.  Not a big deal and when the file system had issues, that worked out well.  100G would turn into 10G.  Ugh.  Not cool.  Go even bigger and the amounts get insane.  So many implementations changed this to a sliding scale rather than a single percentage.  Some food for thought.

Secondly, df.  A great source of cross platform trouble... Let me throw out one of my favorite real world examples, this time from one of my home machines:

Filesystem             size   used  avail capacity  Mounted on
int                    165G    28K    21G     1%    /int
int/home               165G    68G    21G    77%    /export/home
int/mii2u              165G  1014K    21G     1%    /int/mii2u
int/squid-cache        5.0G   4.4G   591M    89%    /int/squid-cache
int/local              165G   289M    21G     2%    /usr/local

Stop.  Go back and look carefully at those numbers.  

In case you haven't guessed, this is a (partial) output of a df -h from a Solaris machine utilizing ZFS.  It is pretty clear that with the exception of the file system using a hard quota (int/squid-cache), size != used+available.  Instead, size=(all fs used)+available.  Using "used" in any sort of capacity isn't going to tell you anything about how much space is actually available.  This type of output is fairly common for any pool-based storage system.
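
The observation about the df listing can be checked mechanically. The Python sketch below re-enters the five rows shown above and tests whether size is close to used + available. It assumes the K/M/G suffixes are 1024-based, and the 5% tolerance is an arbitrary allowance for df's rounding; both helper names are made up.

```python
UNITS = {"K": 1024, "M": 1024 ** 2, "G": 1024 ** 3}


def to_bytes(text):
    """Convert a df -h style figure such as '4.4G' or '591M' to bytes."""
    return float(text[:-1]) * UNITS[text[-1]]


def size_matches_used_plus_avail(size, used, avail, tolerance=0.05):
    """True if size is within `tolerance` of used + avail."""
    total = to_bytes(used) + to_bytes(avail)
    return abs(to_bytes(size) - total) <= tolerance * to_bytes(size)


rows = [
    ("int", "165G", "28K", "21G"),
    ("int/home", "165G", "68G", "21G"),
    ("int/mii2u", "165G", "1014K", "21G"),
    ("int/squid-cache", "5.0G", "4.4G", "591M"),
    ("int/local", "165G", "289M", "21G"),
]
for name, size, used, avail in rows:
    print(name, size_matches_used_plus_avail(size, used, avail))
```

Only the quota-limited int/squid-cache row passes the check; the pooled file systems all fail it.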

Then there are file system quotas, which depending upon the OS, may or may not show up in df output.  The same thing with the aforementioned percentages with reserved space.

Anyway, what does this all mean?

Well, in my mind, all of the above suggestions in the JIRA just really don't work out well... and that's just on UNIX.  Heck, even a heterogeneous UNIX environment makes me shudder.  How does one work with pooled storage *and* traditional file systems if you want to have a single config?

Quite frankly, you can't.  As much as I hate to say it, I suspect the answer (as unpopular as it might be) is probably to set a hard limit on how much space the HDFS will use rather than trying to second guess what the operating system is doing.  Does this suck?  Yes.  Does this suck less than all of the gymnastics around trying to figure this out dynamically?  I think so.

Let's face it, in order to make an app like Hadoop not eat more space than what you want vs. what is configured in the file system, you are essentially looking at partitioning it.  At that point, you might as well just configure it in the app and be done with it. In the end, this basically means that HDFS needs to keep track of how much space it is using at all times and not go over that limit.  This likely also means that it must implement high and low water marks such that if the low water mark is hit, writes to the filesystem get deferred/deprioritized and high water marks basically mean to start rebalancing the blocks or saying the file system is full.

Now, I know that it might be difficult to calculate what the max space should be.  On reflection though, I'm not really sure that's true.  If I know what size my slice is and I have an idea of how much of that I want to give to HDFS, then I can calculate that max value.  If an admin gets in trouble with the space being allocated, they have the ability to lower the high and low water marks, which should trigger a rebalance, thus freeing space.  This is essentially how apps like squid work. It works quite well.  [Interestingly enough, the file system structure on disk is quite similar to how the data node stores its blocks.... Hmm... ]
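
A hard limit with low and high water marks, as described in the last two paragraphs, might be sketched like this in Python. The class name, the state labels, and the default marks of 80% and 95% are all assumptions made for the example; the comment does not propose specific values.

```python
GB = 1024 ** 3


class SpaceBudget:
    """Hard cap on DFS usage for one volume, with low and high water marks."""

    def __init__(self, max_bytes, low_pct=80, high_pct=95):
        if not 0 < low_pct < high_pct <= 100:
            raise ValueError("need 0 < low_pct < high_pct <= 100")
        self.max_bytes = max_bytes
        self.low_bytes = max_bytes * low_pct // 100
        self.high_bytes = max_bytes * high_pct // 100

    def state(self, used_bytes):
        """Classify current usage against the two marks."""
        if used_bytes >= self.high_bytes:
            return "full"   # report the volume as full or start rebalancing
        if used_bytes >= self.low_bytes:
            return "defer"  # defer or deprioritize new writes to this volume
        return "ok"

    def can_store(self, used_bytes, block_bytes):
        """Never let tracked usage exceed the configured maximum."""
        return used_bytes + block_bytes <= self.max_bytes


budget = SpaceBudget(50 * GB)
for used in (10 * GB, 42 * GB, 49 * GB):
    print(used // GB, "GB used:", budget.state(used))
```

Lowering max_bytes or the marks moves a volume into the "defer" or "full" state sooner, which is the lever the comment suggests for an admin who has overcommitted a disk.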

One thing to point out with this solution:  if the admin overcommits the space on the drive then, quite frankly, they hung themselves.  They know how much space they gave HDFS.  If they go over it, oh well.  I'd much rather have MapRed blow up than HDFS blow up, since it is much easier to pick up the pieces of a broken job than it is of the file system, especially in the case where there are under-replicated blocks.

Again, I totally admit that this solution is likely to be unpopular.  But I can't see a way out of this mess that works with the multiple types of storage systems in use.

P.S., while I'm here, let me throw more of my own personal prejudices into this:  putting something like hadoop in / or some other file system  (but not necessarily device) that is used by the OS is just *begging* for trouble.  That's just a bad practice for a real, production system.  If someone does that, they rightly deserve any pain that it causes.


[jira] Commented: (HADOOP-2991) dfs.du.reserved not honored in 0.15/16 (regression from 0.14+patch for 2549)

Posted by "Hairong Kuang (JIRA)" <ji...@apache.org>.

    [ https://issues.apache.org/jira/browse/HADOOP-2991?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12577541#action_12577541 ] 

Hairong Kuang commented on HADOOP-2991:
---------------------------------------

Just to make it clear to the users, dfs capacity is not the dfs usable space. Instead, dfs guarantees that dfs does not use more than dfs capacity - reserved space, which leaves enough space for the use of map/reduce.

And it is OK that non-dfs usage is bigger than the reserved space, dfs uses what's available on the disk. If du.pct is less than 1.0, dfs does not use all the available space but a fraction of it.

I feel that instead of using the du.pct parameter, it makes sense to use an absolute value, as Raghu suggested. So dfs will leave this much space free. It should make most of the users happy.
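
To see why an absolute value behaves differently from du.pct, the Python sketch below compares how much space each approach leaves untouched as the disk fills. It is a simplification under stated assumptions: it looks only at the space df reports as available, ignores the capacity - reserved term, expresses du.pct as a whole-number percentage (98) to keep the arithmetic in integers, and uses made-up function names.

```python
MB_PER_GB = 1024


def left_free_with_pct(available_mb, du_pct):
    """MB left untouched when DFS may use du_pct percent of the available space."""
    return available_mb * (100 - du_pct) // 100


def left_free_with_absolute(available_mb, keep_free_mb):
    """MB left untouched when DFS must always leave keep_free_mb free."""
    return min(available_mb, keep_free_mb)


for available_gb in (100, 10, 1):
    available_mb = available_gb * MB_PER_GB
    print(
        available_gb, "GB available:",
        left_free_with_pct(available_mb, 98), "MB left by pct,",
        left_free_with_absolute(available_mb, 1 * MB_PER_GB), "MB left by absolute",
    )
```

The percentage leaves less and less headroom as the disk fills (about 2 GB, then 204 MB, then 20 MB), while the absolute setting keeps the same 1 GB free throughout.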


[jira] Commented: (HADOOP-2991) dfs.du.reserved not honored in 0.15/16 (regression from 0.14+patch for 2549)

Posted by "Allen Wittenauer (JIRA)" <ji...@apache.org>.

    [ https://issues.apache.org/jira/browse/HADOOP-2991?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12577853#action_12577853 ] 

Allen Wittenauer commented on HADOOP-2991:
------------------------------------------

Available is *not* trustable.  It doesn't always take into consideration quotas, and reserved space.  [Oh how many times I've heard users complain with the "df says there is still room but I can't write to my home dir" statement...]  Oh, and inode counts.  I completely forgot about that little gem.

Anyway, yes, I completely agree--this needs to be settable on a per dir basis.  That's actually something I've wanted for a while, since we store logs on the same dir as the data node.  I want more space available on the node with the logs than the others.  I might even have a JIRA on this somewhere.... ahh yes, here it is:

https://issues.apache.org/jira/browse/HADOOP-2150

[jira] Commented: (HADOOP-2991) dfs.du.reserved not honored in 0.15/16 (regression from 0.14+patch for 2549)

Posted by "Raghu Angadi (JIRA)" <ji...@apache.org>.

    [ https://issues.apache.org/jira/browse/HADOOP-2991?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12577269#action_12577269 ] 

Raghu Angadi commented on HADOOP-2991:
--------------------------------------

Pete,
> min(((DF.Capacity - Conf.Reserved) * Conf.dfs.du.pct) - DU.dfsSpace(), (DF.available() - Conf.reserved));

What is dfsSpace()?

Does this match the new semantics or does this match the semantics you prefer? or both?

Note that I am only asking about a disconnect between the implementation and the semantics. Could you give an example where the implementation is wrong?
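
For reference, the quoted expression can be sketched as a small function. This is only an illustration in Python, not the datanode code: the function and parameter names are hypothetical, du.pct is taken as a whole-number percentage, and dfsSpace() is assumed to mean the bytes DFS currently occupies on the volume (which is exactly the point being asked about above).

```python
GB = 1024 ** 3


def remaining_for_dfs(capacity, available, dfs_used, reserved, du_pct):
    """Evaluate the quoted min(...) expression, all sizes in bytes.

    Assumption: dfs_used stands in for DU.dfsSpace(), i.e. bytes held by DFS.
    du_pct is an integer percentage (98 means 98%) to avoid float rounding.
    """
    # Share of the non-reserved capacity that DFS may occupy, minus what it holds.
    by_capacity = (capacity - reserved) * du_pct // 100 - dfs_used
    # What the filesystem reports as free, minus the reserve.
    by_available = available - reserved
    return min(by_capacity, by_available)


# Numbers from the example in the issue description: 100 GB disk, DFS holds
# 50 GB, df reports 1 GB available. A 1 GB reserve and du.pct of 98 are assumed.
pete_example = remaining_for_dfs(100 * GB, 1 * GB, 50 * GB, 1 * GB, 98)
print(pete_example)
```

With these inputs the second term is zero, so the expression leaves no room for new blocks. The expression can also go negative; a caller would presumably clamp that to zero.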


[jira] Commented: (HADOOP-2991) dfs.du.reserved not honored in 0.15/16 (regression from 0.14+patch for 2549)

Posted by "Hairong Kuang (JIRA)" <ji...@apache.org>.

    [ https://issues.apache.org/jira/browse/HADOOP-2991?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12577276#action_12577276 ] 

Hairong Kuang commented on HADOOP-2991:
---------------------------------------

> So, since the DF.getCapacity() is off, this is a bug.
I did not get this. What do you mean by "off"?

Pete, I understand what you want. But when mapred and dfs share the same disk, the empty disk space is normally large, so the cluster admin does not need to change the parameter frequently. I think that you have a different use case. If you need a new feature, you should file it as a new feature.

[jira] Commented: (HADOOP-2991) dfs.du.reserved not honored in 0.15/16 (regression from 0.14+patch for 2549)

Posted by "Joydeep Sen Sarma (JIRA)" <ji...@apache.org>.

    [ https://issues.apache.org/jira/browse/HADOOP-2991?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12577726#action_12577726 ] 

Joydeep Sen Sarma commented on HADOOP-2991:
-------------------------------------------

Smirk. I didn't even think about heterogeneous machines. Smirk. Smirk.

[jira] Commented: (HADOOP-2991) dfs.du.reserved not honored in 0.15/16 (regression from 0.14+patch for 2549)

Posted by "Pete Wyckoff (JIRA)" <ji...@apache.org>.

    [ https://issues.apache.org/jira/browse/HADOOP-2991?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12577279#action_12577279 ] 

Pete Wyckoff commented on HADOOP-2991:
--------------------------------------

Raghu,

Your proposal would work for the free space, but the % of space dfs uses would still be off if I don't set the reserved param properly.

How about if dfs.reserved is 0 (or -1 or not set), you calculate things to include what's actually on disk and we add your new param.

-- pete



[jira] Commented: (HADOOP-2991) dfs.du.reserved not honored in 0.15/16 (regression from 0.14+patch for 2549)

Posted by "Joydeep Sen Sarma (JIRA)" <ji...@apache.org>.

    [ https://issues.apache.org/jira/browse/HADOOP-2991?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12577540#action_12577540 ] 

Joydeep Sen Sarma commented on HADOOP-2991:
-------------------------------------------

I will quote the notes from the initial bug report. Example df -kh output:

Filesystem Size Used Avail Use% Mounted on
/dev/sda3 130G 123G 49M 100% /

The difference between capacity and usable space is 7G. That's 7000000000 bytes to be absolutely clear.

Is that significant enough?
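
The arithmetic behind that figure can be checked with a few lines of Python. This is a rough sketch: the helper names are made up for illustration, and it assumes the df -h suffixes are powers of 1024 and ignores the rounding df applies to each column.

```python
UNITS = {"K": 1024, "M": 1024 ** 2, "G": 1024 ** 3, "T": 1024 ** 4}


def parse_size(text):
    """Convert a df -h style figure such as '130G' or '49M' to bytes."""
    return int(float(text[:-1]) * UNITS[text[-1]])


def unaccounted_bytes(df_line):
    """Return Size - (Used + Avail) for one df -h data line."""
    fields = df_line.split()
    # Columns: Filesystem, Size, Used, Avail, Use%, Mounted on
    size, used, avail = (parse_size(f) for f in fields[1:4])
    return size - used - avail


gap = unaccounted_bytes("/dev/sda3 130G 123G 49M 100% /")
print(gap / 1024 ** 3)  # gap expressed in GB
```

The result is a little under 7 GB, in line with the figure quoted above: space that counts toward Size but is neither Used nor Avail.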


[jira] Commented: (HADOOP-2991) dfs.du.reserved not honored in 0.15/16 (regression from 0.14+patch for 2549)

Posted by "Joydeep Sen Sarma (JIRA)" <ji...@apache.org>.

    [ https://issues.apache.org/jira/browse/HADOOP-2991?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12577724#action_12577724 ] 

Joydeep Sen Sarma commented on HADOOP-2991:
-------------------------------------------

Allen - do you think the 'available' field of df output is trustable?

If so - what downside do you see to instructing hadoop to not use last N bytes of available space? (Ignoring capacity/size altogether). 

I agree that percentages don't make any sense whatsoever. If we kill it (modulo backwards compatibility for a release - although it's questionable anyone really understands how the hell hadoop manages space anymore) - that would remove a lot of unnecessary confusion.

The problem with your suggestion of configuring Hadoop to use a fixed amount of space is that it would only be workable if it were settable on a per directory basis. You can be prejudiced against people mixing OS with app on the same filesystem - but surely you can't be prejudiced against people using drives of different sizes! If DFS max usage could be configured on a per directory basis - that would seem like the ideal solution and would be a welcome feature imho.
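
To make the "don't use the last N bytes of available space" idea concrete, here is a minimal sketch in Python with a per-directory reserve. Everything in it is hypothetical (function name, mount points, reserve sizes); it is not an existing Hadoop setting or API, only an illustration of the rule being discussed.

```python
def usable_per_dir(available_by_dir, reserved_by_dir, default_reserved=0):
    """Bytes DFS may still write in each data directory.

    Rule sketched here: never touch the last N bytes that df reports as
    available, where N can differ per directory. Capacity/Size is ignored.
    """
    usable = {}
    for path, available in available_by_dir.items():
        reserved = reserved_by_dir.get(path, default_reserved)
        # Clamp at zero: a directory already inside its reserve gets nothing.
        usable[path] = max(0, available - reserved)
    return usable


GB = 1024 ** 3
MB = 1024 ** 2
example = usable_per_dir(
    {"/data1": 200 * GB, "/": 49 * MB},  # hypothetical mount points
    {"/": 1 * GB},                       # larger reserve on the OS disk
    default_reserved=100 * MB,
)
print(example)
```

Because the rule looks only at the available figure, the gap between Size and Used+Avail never enters the calculation, and drives of different sizes can each carry their own reserve.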

[jira] Commented: (HADOOP-2991) dfs.du.reserved not honored in 0.15/16 (regression from 0.14+patch for 2549)

Posted by "Raghu Angadi (JIRA)" <ji...@apache.org>.

    [ https://issues.apache.org/jira/browse/HADOOP-2991?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12577270#action_12577270 ] 

Raghu Angadi commented on HADOOP-2991:
--------------------------------------

> The new semantics of reserved would be nice if they could work, but I don't see how it could.

So you want to change the meaning of 'reserved', which is ok. In fact, if you read the whole of HADOOP-1463, I didn't prefer the change in the meaning of 'reserved' either.

Let's separate the following two:

# a bug in the implementation (I can't see that there is one).
# a new 'feature' to change the meaning of 'reserved', or probably better, to add a new variable that does what 'reserved' was meant for before HADOOP-1463.
