Posted to user@hadoop.apache.org by David M <mc...@outlook.com> on 2019/11/08 17:31:45 UTC

HDFS du Utility Inconsistencies?

All,

I'm working on a cluster that is running Hadoop 2.7.3. I have one folder in particular where the command hdfs dfs -du is giving me strange results. If I query the folder and ask for a summary (-s), it reports about 10 GB. If I don't ask for a summary, the folders underneath don't even add up to 1 GB, much less 10 GB.

I've verified that this holds over time, and whether I run the command as the hdfs user or as any other user. We are on an HDP cluster, so we use Ranger for HDFS security and Kerberos for authentication. We see a similar discrepancy with -count, where both the size and the counts differ. We have not seen this behavior in any other folders.
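One way to quantify the -count discrepancy is to total the counts of the direct children and compare them with the count for the parent. This is a minimal sketch, not a Hadoop tool: `sum_count_columns` and `compare_count` are hypothetical helper names, and it assumes the default `hdfs dfs -count` columns (DIR_COUNT, FILE_COUNT, CONTENT_SIZE, PATHNAME). Because a directory counts itself in DIR_COUNT, the children's directory total should be one less than the parent's when the two views agree.

```shell
#!/usr/bin/env bash
# Sketch: compare `hdfs dfs -count` for a directory with the totals of its
# direct children. Assumes the default output columns:
#   DIR_COUNT FILE_COUNT CONTENT_SIZE PATHNAME
# Helper names are hypothetical, not part of Hadoop.

# Sum columns 1-3 of `-count` lines read from stdin.
# %.0f avoids 32-bit overflow in awks whose %d is limited.
sum_count_columns() {
  awk '{ d += $1; f += $2; s += $3 }
       END { printf "%.0f %.0f %.0f\n", d, f, s }'
}

compare_count() {
  local dir="$1"
  # The glob is quoted so HDFS, not the local shell, expands it.
  echo "children: $(hdfs dfs -count "$dir/*" | sum_count_columns)"
  echo "parent:   $(hdfs dfs -count "$dir" | sum_count_columns)"
}
```

For example, `compare_count /randomFolder` prints one line of summed children and one line for the parent; a size or file-count gap with no matching child points at data the parent summary sees but the listing does not.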

Below is sample output of what we are seeing. I've replaced the full path with a fake one to protect the data on the cluster. Does anyone know what could cause this behavior? Thanks!

$ hdfs dfs -du -h /randomFolder
119.9 M  /randomFolder/bug
1.0 M    /randomFolder/commitment
86.8 K   /randomFolder/customfield
31.3 M   /randomFolder/epic
10.3 M   /randomFolder/feature
4.0 M    /randomFolder/insprintbug
372.9 K  /randomFolder/project
15.1 K   /randomFolder/projectstatus
330.9 M  /randomFolder/story
256.3 M  /randomFolder/subtask
74.7 K   /randomFolder/subtemplate
89.6 M   /randomFolder/task
7.4 M    /randomFolder/techdebt
117.7 K  /randomFolder/template
617.9 K  /randomFolder/tempomember
8.2 K    /randomFolder/tempoteam
1.4 M    /randomFolder/tempoworklog

$ hdfs dfs -du -h -s /randomFolder
10.6 G  /randomFolder
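The rounded -h values above total roughly 850 MB against 10.6 G for the summary. To confirm the gap in exact bytes, one can sum the first column of the unsummarized output and compare it with the -s figure. This is a minimal sketch: `sum_du_children` and `compare_du` are hypothetical helper names, and it assumes that `hdfs dfs -du` without -h prints the logical size in bytes as its first column, as in the output above.

```shell
#!/usr/bin/env bash
# Sketch: compare the sum of per-child `hdfs dfs -du` sizes with `-du -s`.
# Assumes byte output (no -h) with the logical size in the first column;
# any extra columns are ignored. Helper names are hypothetical.

# Sum column 1 of `hdfs dfs -du` lines read from stdin.
# %.0f avoids 32-bit overflow in awks whose %d is limited.
sum_du_children() {
  awk '{ total += $1 } END { printf "%.0f\n", total }'
}

compare_du() {
  local dir="$1"
  local children summary
  children=$(hdfs dfs -du "$dir" | sum_du_children)
  summary=$(hdfs dfs -du -s "$dir" | awk '{ print $1 }')
  echo "sum of children: $children"
  echo "du -s summary:   $summary"
  echo "difference:      $(awk -v a="$summary" -v b="$children" 'BEGIN { printf "%.0f\n", a - b }')"
}
```

Running `compare_du /randomFolder` would print the three figures; a large positive difference means the summary counts bytes that no listed child accounts for.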

David McGinnis


Re: HDFS du Utility Inconsistencies?

Posted by David M <mc...@outlook.com>.
We use snapshots in the cluster, but I haven't seen any snapshot folders underneath the folder in question. I'd need to check with the application team whether snapshots exist for this folder anywhere.
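Snapshots are worth ruling out because their contents live under the reserved `.snapshot` path, which a normal `-ls` does not list, so snapshots can exist without any visible folder; in Hadoop 2.x the `-du -s` and `-count` summaries may also include data retained only in snapshots. The sketch below checks whether the directory or any ancestor is snapshottable. It is illustrative only: `ancestors` and `check_snapshots` are hypothetical helper names, and it assumes `hdfs lsSnapshottableDir` prints the directory path as the last field of each line. Run it as the hdfs superuser to see every snapshottable directory.

```shell
#!/usr/bin/env bash
# Sketch: report whether a directory or any of its ancestors is snapshottable,
# and list existing snapshots for each match.
# Assumes `hdfs lsSnapshottableDir` prints the path as the last field;
# paths containing spaces are not handled. Helper names are hypothetical.

# Print a path and each of its ancestors, ending with "/", one per line.
ancestors() {
  local p="$1"
  while [ -n "$p" ] && [ "$p" != "/" ]; do
    echo "$p"
    p=$(dirname "$p")
  done
  echo "/"
}

check_snapshots() {
  local dir="$1"
  local snapdirs
  snapdirs=$(hdfs lsSnapshottableDir | awk '{ print $NF }')
  ancestors "$dir" | while read -r a; do
    if printf '%s\n' "$snapdirs" | grep -qxF -- "$a"; then
      echo "snapshottable: $a"
      # ${a%/} keeps the root case as /.snapshot rather than //.snapshot;
      # stdin is detached so the client cannot consume the loop's input.
      hdfs dfs -ls "${a%/}/.snapshot" < /dev/null
    fi
  done
}
```

Checking ancestors matters because a snapshot taken on a parent directory also retains data deleted from /randomFolder.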


________________________________
From: Arpit Agarwal <aa...@cloudera.com>
Sent: Friday, November 8, 2019 11:41:31 AM
To: David M <mc...@outlook.com>
Cc: user@hadoop.apache.org <us...@hadoop.apache.org>
Subject: Re: HDFS du Utility Inconsistencies?

Got any snapshots?




Re: HDFS du Utility Inconsistencies?

Posted by Arpit Agarwal <aa...@cloudera.com.INVALID>.
Got any snapshots?
