Posted to common-user@hadoop.apache.org by Martin Serrano <ma...@attivio.com> on 2015/12/22 03:21:25 UTC

diagnosing the difference between dfs 'du' and 'df'

Hi,

I have an application that is writing data rapidly directly to HDFS
(creates and appends) as well as to HBase (10-15 tables).  df for the
filesystem reports that a large percentage of the space is in use:

$ hdfs dfs -df -h /
Filesystem     Size     Used  Available  Use%
hdfs://ha   882.6 G  472.6 G    409.9 G   54%

Yet when I try to figure out where the disk space is going,
dfs -du reports:

$ hdfs dfs -du -h /
0        /app-logs
7.6 G    /apps
382.2 M  /hdp
0        /mapred
0        /mr-history
8.5 K    /tmp
3.8 G    /user

A dfsadmin -report during the same time frame is below.  I'm trying to
figure out where all of this space is going to.  When my application is
killed or quiescent, the df and dfsadmin reports fall in line with what
I would expect.  I'm running HDP 2.3 with a default configuration as set
up by Ambari.  I'm looking for hints or suggestions on how I can
investigate this issue.  It seems crazy that ingesting 12 GB or so of
data can temporarily consume (reserve?) ~300 GB of HDFS.
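As a back-of-the-envelope check (assuming the stock replication factor of 3, which is not confirmed for this cluster), the du totals fall far short of what df reports:

```python
# Rough arithmetic on the numbers above: `du` reports pre-replication file
# sizes, while `df` and dfsadmin report raw bytes across all replicas.
REPLICATION = 3  # assumed default; see dfs.replication

# Sizes reported by `hdfs dfs -du -h /`, in GB (approximate)
du_gb = {
    "/apps": 7.6,
    "/hdp": 0.373,   # 382.2 MB
    "/tmp": 0.0,     # 8.5 KB, negligible
    "/user": 3.8,
}

logical_gb = sum(du_gb.values())             # ~11.8 GB of file data
expected_raw_gb = logical_gb * REPLICATION   # ~35 GB of raw disk expected

df_used_gb = 472.6                           # what `df` actually reports
unexplained_gb = df_used_gb - expected_raw_gb

print(f"du total: {logical_gb:.1f} GB -> ~{expected_raw_gb:.0f} GB raw")
print(f"df used:  {df_used_gb} GB, unexplained: ~{unexplained_gb:.0f} GB")
```

Even after tripling for replication, roughly 400 GB of the reported usage is unaccounted for by visible files.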

Thanks,
Martin

Configured Capacity: 947644268544 (882.56 GB)
Present Capacity: 947064596261 (882.02 GB)
DFS Remaining: 490046627240 (456.39 GB)
DFS Used: 457017969021 (425.63 GB)
DFS Used%: 48.26%
Under replicated blocks: 0
Blocks with corrupt replicas: 0
Missing blocks: 0
Missing blocks (with replication factor 1): 0

-------------------------------------------------
Live datanodes (3):

Name: *.*.*.*:50010 (**********.com)
Hostname: **********.com
Decommission Status : Normal
Configured Capacity: 315881422848 (294.19 GB)
DFS Used: 218955099179 (203.92 GB)
Non DFS Used: 168255175 (160.46 MB)
DFS Remaining: 96758068494 (90.11 GB)
DFS Used%: 69.32%
DFS Remaining%: 30.63%
Configured Cache Capacity: 0 (0 B)
Cache Used: 0 (0 B)
Cache Remaining: 0 (0 B)
Cache Used%: 100.00%
Cache Remaining%: 0.00%
Xceivers: 15
Last contact: Mon Dec 21 17:17:38 EST 2015


Name: *.*.*.*:50010 (**********.com)
Hostname: **********.com
Decommission Status : Normal
Configured Capacity: 315881422848 (294.19 GB)
DFS Used: 218873337575 (203.84 GB)
Non DFS Used: 151608508 (144.59 MB)
DFS Remaining: 96856476765 (90.20 GB)
DFS Used%: 69.29%
DFS Remaining%: 30.66%
Configured Cache Capacity: 0 (0 B)
Cache Used: 0 (0 B)
Cache Remaining: 0 (0 B)
Cache Used%: 100.00%
Cache Remaining%: 0.00%
Xceivers: 16
Last contact: Mon Dec 21 17:17:38 EST 2015


Name: *.*.*.*:50010 (*************.com)
Hostname: ***********.com
Decommission Status : Normal
Configured Capacity: 315881422848 (294.19 GB)
DFS Used: 19189532267 (17.87 GB)
Non DFS Used: 259808600 (247.77 MB)
DFS Remaining: 296432081981 (276.07 GB)
DFS Used%: 6.07%
DFS Remaining%: 93.84%
Configured Cache Capacity: 0 (0 B)
Cache Used: 0 (0 B)
Cache Remaining: 0 (0 B)
Cache Used%: 100.00%
Cache Remaining%: 0.00%
Xceivers: 16
Last contact: Mon Dec 21 17:17:39 EST 2015



---------------------------------------------------------------------
To unsubscribe, e-mail: user-unsubscribe@hadoop.apache.org
For additional commands, e-mail: user-help@hadoop.apache.org


Re: diagnosing the difference between dfs 'du' and 'df'

Posted by Martin Serrano <ma...@attivio.com>.
Thanks; I did run these commands as the hdfs user.

On 12/21/2015 09:25 PM, Namikaze Minato wrote:
> This may be a wrong lead, but try running your "du" command as the
> hdfs user, so we can be sure we aren't missing any read-protected
> folders.




Re: diagnosing the difference between dfs 'du' and 'df'

Posted by Namikaze Minato <ll...@gmail.com>.
This may be a wrong lead, but try running your "du" command as the
hdfs user, so we can be sure we aren't missing any read-protected
folders.

Regards,
LLoyd

On 22 December 2015 at 03:21, Martin Serrano <ma...@attivio.com> wrote:
> [snip]


Re: diagnosing the difference between dfs 'du' and 'df'

Posted by Martin Serrano <ma...@attivio.com>.
I was able to resolve this issue.  By looking at hdfs-audit.log we
noticed a large number of appends to the same file occurring in a very
short time frame.  My guess is that each append reserves a full block
(128 MB in our configuration), leading to temporary disk "utilization"
until the appends are resolved into a single file.  We eliminated the
issue by turning these appends into a single continuous write.
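If that guess is right, the arithmetic works out: each in-flight append pins a full block on every replica until the write is finalized. A rough sketch of the model (block size and replication factor are assumptions, not confirmed settings for this cluster):

```python
# Hypothetical model of the guess above: every append in flight reserves a
# full block on each replica until the write pipeline is finalized.
BLOCK_MB = 128      # assumed dfs.blocksize of 128 MB
REPLICATION = 3     # assumed default replication factor

def reserved_gb(open_appends: int) -> float:
    """Raw disk temporarily reserved by `open_appends` concurrent appends."""
    return open_appends * BLOCK_MB * REPLICATION / 1024

# ~800 concurrent short appends would pin roughly 300 GB of raw space,
# even though the data actually written is tiny.
print(reserved_gb(800))   # 300.0
```

Under this model, a burst of a few hundred rapid appends is enough to account for the ~300 GB of transient "usage" seen in df.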

-Martin

On 12/22/2015 12:59 PM, Anu Engineer wrote:
> Just a guess, but could you check what your dfs.replication is set to?
>
> You should be able to find that setting in hdfs-site.xml or in core-site.xml
>
> Thanks
> Anu
>  
>
> On 12/21/15, 6:21 PM, "Martin Serrano" <ma...@attivio.com> wrote:
>
>> [snip]






Re: diagnosing the difference between dfs 'du' and 'df'

Posted by Anu Engineer <ae...@hortonworks.com>.
Just a guess, but could you check what your dfs.replication is set to?

You should be able to find that setting in hdfs-site.xml or in core-site.xml
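For reference, the property lives in hdfs-site.xml and looks like this (the value 3 is the stock default, not necessarily what this cluster uses):

```xml
<!-- hdfs-site.xml: default replication factor for new files -->
<property>
  <name>dfs.replication</name>
  <value>3</value>
</property>
```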

Thanks
Anu
 

On 12/21/15, 6:21 PM, "Martin Serrano" <ma...@attivio.com> wrote:

>[snip]


Posted by Anu Engineer <ae...@hortonworks.com>.
Just a guess, but could you please check what your dfs.replication is set to?

You should be able to find that setting in hdfs-site.xml (or core-site.xml).

Thanks
Anu
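
A quick way to read the effective value on a node with the hdfs client
installed is `hdfs getconf -confKey dfs.replication`. The arithmetic
below (purely hypothetical numbers, a sketch rather than a diagnosis)
shows two effects that can make df run well ahead of du on a
write-heavy cluster:

```shell
# 1) Replication: du reports logical file sizes, while df counts raw
#    bytes across every replica.
logical_gb=12
replication=3                 # stock dfs.replication
echo $((logical_gb * replication))   # prints 36 (raw GB for 12 GB logical)

# 2) Blocks under construction: since Hadoop 2.6 (HDFS-6898) a datanode
#    reserves a full block of disk space per replica being written, so
#    many concurrently open files (HBase WALs, appends) can tie up space
#    in the datanodes' accounting before the bytes actually land.
block_mb=128                  # stock dfs.blocksize
open_files=250                # hypothetical count of in-flight blocks
echo $((block_mb * open_files * replication / 1024))   # prints 93 (GB reserved)
```

Once the writers close their files the reserved space is released,
which would match the observation that df falls back in line when the
application is killed or quiescent.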
 

On 12/21/15, 6:21 PM, "Martin Serrano" <ma...@attivio.com> wrote:

>Hi,
>
>I have an application that is writing data rapidly directly to HDFS
>(creates and appends) as well as to HBase (10-15 tables).  The disk free
>for the filesystem will report that a large percentage of the system is
>in use:
>
>$ hdfs dfs -df -h /
>Filesystem     Size     Used  Available  Use%
>hdfs://ha   882.6 G  472.6 G    409.9 G   54%
>
>Yet when I try to figure out where the disk space is being used using
>dfs -du reports:
>
>$ hdfs dfs -du -h /
>0        /app-logs
>7.6 G    /apps
>382.2 M  /hdp
>0        /mapred
>0        /mr-history
>8.5 K    /tmp
>3.8 G    /user
>
>A dfsadmin -report during the same time frame is below.  I'm trying to
>figure out where all of this space is going to.  When my application is
>killed or quiescent, the df and dfsadmin reports fall in line with what
>I would expect.  I'm running HDP 2.3 with a default configuration as set
>up by Ambari.  I'm looking for hints or suggestions on how I can
>investigate this issue.  It seems crazy that ingesting 12g or so of data
>can temporarily consume (reserve?) ~300g of HDFS.
>
>Thanks,
>Martin
>
>Configured Capacity: 947644268544 (882.56 GB)
>Present Capacity: 947064596261 (882.02 GB)
>DFS Remaining: 490046627240 (456.39 GB)
>DFS Used: 457017969021 (425.63 GB)
>DFS Used%: 48.26%
>Under replicated blocks: 0
>Blocks with corrupt replicas: 0
>Missing blocks: 0
>Missing blocks (with replication factor 1): 0
>
>-------------------------------------------------
>Live datanodes (3):
>
>Name: *.*.*.*:50010 (**********.com)
>Hostname: **********.com
>Decommission Status : Normal
>Configured Capacity: 315881422848 (294.19 GB)
>DFS Used: 218955099179 (203.92 GB)
>Non DFS Used: 168255175 (160.46 MB)
>DFS Remaining: 96758068494 (90.11 GB)
>DFS Used%: 69.32%
>DFS Remaining%: 30.63%
>Configured Cache Capacity: 0 (0 B)
>Cache Used: 0 (0 B)
>Cache Remaining: 0 (0 B)
>Cache Used%: 100.00%
>Cache Remaining%: 0.00%
>Xceivers: 15
>Last contact: Mon Dec 21 17:17:38 EST 2015
>
>
>Name: *.*.*.*:50010 (**********.com)
>Hostname: **********.com
>Decommission Status : Normal
>Configured Capacity: 315881422848 (294.19 GB)
>DFS Used: 218873337575 (203.84 GB)
>Non DFS Used: 151608508 (144.59 MB)
>DFS Remaining: 96856476765 (90.20 GB)
>DFS Used%: 69.29%
>DFS Remaining%: 30.66%
>Configured Cache Capacity: 0 (0 B)
>Cache Used: 0 (0 B)
>Cache Remaining: 0 (0 B)
>Cache Used%: 100.00%
>Cache Remaining%: 0.00%
>Xceivers: 16
>Last contact: Mon Dec 21 17:17:38 EST 2015
>
>
>Name: *.*.*.*:50010 (*************.com)
>Hostname: ***********.com
>Decommission Status : Normal
>Configured Capacity: 315881422848 (294.19 GB)
>DFS Used: 19189532267 (17.87 GB)
>Non DFS Used: 259808600 (247.77 MB)
>DFS Remaining: 296432081981 (276.07 GB)
>DFS Used%: 6.07%
>DFS Remaining%: 93.84%
>Configured Cache Capacity: 0 (0 B)
>Cache Used: 0 (0 B)
>Cache Remaining: 0 (0 B)
>Cache Used%: 100.00%
>Cache Remaining%: 0.00%
>Xceivers: 16
>Last contact: Mon Dec 21 17:17:39 EST 2015
>
>
>
>---------------------------------------------------------------------
>To unsubscribe, e-mail: user-unsubscribe@hadoop.apache.org
>For additional commands, e-mail: user-help@hadoop.apache.org
>
>
