Posted to common-user@hadoop.apache.org by Martin Traverso <mt...@gmail.com> on 2008/02/15 22:05:40 UTC

dfsadmin reporting wrong disk usage numbers

Hi,

Are there any known issues on how dfsadmin reports disk usage? I'm getting
some weird values:

Name: 10.15.104.46:50010
State          : In Service
Total raw bytes: 1433244008448 (1.3 TB)
Remaining raw bytes: 383128089432(356.82 GB)
Used raw bytes: 1042296986024 (970.71 GB)
% used: 72.72%


However, usage on that box is:

size   used  avail capacity  Mounted on
650G   240G   409G    37%    /local/data/hadoop/d0
685G   243G   443G    36%    /local/data/hadoop/d1

d0 and d1 are mounted on two separate drives. The used raw bytes count is
off by 2x.

Thanks,
Martin
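The 2x discrepancy above can be checked with a little arithmetic: if du reported usage in 512-byte blocks but the caller multiplied by 1024, the result would be exactly double the true figure. A hedged sanity check (numbers taken from the dfsadmin output above):

```java
// Sanity check of the suspected 512-vs-1024 block-size mix-up.
// If the multiplier is 2x too large, halving the reported figure
// should land near df's 240G + 243G = 483G of actual usage.
public class DoubleCountCheck {
    public static void main(String[] args) {
        long reportedUsed = 1042296986024L;       // "Used raw bytes" from dfsadmin
        long actualUsed = reportedUsed / 2;       // undo the suspected 2x
        double gib = actualUsed / (1024.0 * 1024 * 1024);
        System.out.printf("actual used ~ %.1f GB%n", gib);  // ~485 GB, close to df's 483G
    }
}
```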

Re: dfsadmin reporting wrong disk usage numbers

Posted by Konstantin Shvachko <sh...@yahoo-inc.com>.
Yes, please file a bug.
There are file systems out there with different block sizes, on Linux as well as Solaris.

Thanks,
--Konstantin

Martin Traverso wrote:
> I think I found the issue. The class org.apache.hadoop.fs.DU assumes
> 1024-byte blocks when reporting usage information:
> 
>    this.used = Long.parseLong(tokens[0])*1024;
> 
> This works fine in Linux, but in Solaris and Mac OS the reported number of
> blocks is based on 512-byte blocks.
> 
> The solution is simple: DU should use "du -sk" instead of "du -s".
> 
> Should I file a bug for this?
> 
> Martin
> 

Re: dfsadmin reporting wrong disk usage numbers

Posted by Martin Traverso <mt...@gmail.com>.
I think I found the issue. The class org.apache.hadoop.fs.DU assumes
1024-byte blocks when reporting usage information:

   this.used = Long.parseLong(tokens[0])*1024;

This works fine in Linux, but in Solaris and Mac OS the reported number of
blocks is based on 512-byte blocks.

The solution is simple: DU should use "du -sk" instead of "du -s".
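A hedged sketch of that fix (names here are illustrative, not the actual org.apache.hadoop.fs.DU code): with "du -sk" the first token is always in KiB, so multiplying by 1024 is correct on Linux, Solaris, and Mac OS alike.

```java
// Sketch of parsing "du -sk <dir>" output. The -k flag forces 1024-byte
// units, so the existing *1024 conversion becomes portable.
public class DuFixSketch {
    // Parse one line of "du -sk" output, e.g. "253755392\t/local/data/hadoop/d0"
    static long usedBytesFromDuK(String duOutputLine) {
        String[] tokens = duOutputLine.split("\\s+");
        return Long.parseLong(tokens[0]) * 1024L;  // tokens[0] is KiB because of -k
    }

    public static void main(String[] args) {
        // 253755392 KiB = 259845521408 bytes (~242 GB)
        System.out.println(usedBytesFromDuK("253755392\t/local/data/hadoop/d0"));
    }
}
```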

Should I file a bug for this?

Martin

Re: dfsadmin reporting wrong disk usage numbers

Posted by Martin Traverso <mt...@gmail.com>.
>
> What are the data directories
> specified in your configuration? Have you specified two data directories
> per
> volume?
>

No, just one directory per volume. This is the value of dfs.data.dir in my
hadoop-site.xml:

        <property>
          <name>dfs.data.dir</name>

<value>/local/data/hadoop/d0/dfs/data,/local/data/hadoop/d1/dfs/data</value>
        </property>

Martin


>
> Hairong
>
> On 2/15/08 1:05 PM, "Martin Traverso" <mt...@gmail.com> wrote:
>
> > Hi,
> >
> > Are there any known issues on how dfsadmin reports disk usage? I'm getting
> > some weird values:
> >
> > Name: 10.15.104.46:50010
> > State          : In Service
> > Total raw bytes: 1433244008448 (1.3 TB)
> > Remaining raw bytes: 383128089432(356.82 GB)
> > Used raw bytes: 1042296986024 (970.71 GB)
> > % used: 72.72%
> >
> >
> > However, usage on that box is:
> >
> > size   used  avail capacity  Mounted on
> > 650G   240G   409G    37%    /local/data/hadoop/d0
> > 685G   243G   443G    36%    /local/data/hadoop/d1
> >
> > d0 and d1 are mounted on two separate drives. The used raw bytes count is
> > off by 2x.
> >
> > Thanks,
> > Martin
>
>

Re: dfsadmin reporting wrong disk usage numbers

Posted by Hairong Kuang <ha...@yahoo-inc.com>.
The datanode runs du on its data directories hourly. In between two du runs,
used space is updated when a block is added or deleted. What are the data directories
specified in your configuration? Have you specified two data directories per
volume?
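The refresh scheme described above can be sketched as follows (a minimal illustration of the caching pattern, not the actual datanode code; all names here are hypothetical):

```java
import java.util.concurrent.atomic.AtomicLong;

// Cached used-space counter: a periodic du run resets the baseline,
// and block add/delete events adjust it incrementally in between.
public class CachedUsage {
    private final AtomicLong used = new AtomicLong();

    void refreshFromDu(long duBytes) { used.set(duBytes); }       // periodic full scan
    void blockAdded(long bytes)      { used.addAndGet(bytes); }   // between scans
    void blockDeleted(long bytes)    { used.addAndGet(-bytes); }
    long getUsed()                   { return used.get(); }

    public static void main(String[] args) {
        CachedUsage c = new CachedUsage();
        c.refreshFromDu(1000);   // hourly du result
        c.blockAdded(64);
        c.blockDeleted(24);
        System.out.println(c.getUsed());  // 1040
    }
}
```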

Hairong

On 2/15/08 1:05 PM, "Martin Traverso" <mt...@gmail.com> wrote:

> Hi,
> 
> Are there any known issues on how dfsadmin reports disk usage? I'm getting
> some weird values:
> 
> Name: 10.15.104.46:50010
> State          : In Service
> Total raw bytes: 1433244008448 (1.3 TB)
> Remaining raw bytes: 383128089432(356.82 GB)
> Used raw bytes: 1042296986024 (970.71 GB)
> % used: 72.72%
> 
> 
> However, usage on that box is:
> 
> size   used  avail capacity  Mounted on
> 650G   240G   409G    37%    /local/data/hadoop/d0
> 685G   243G   443G    36%    /local/data/hadoop/d1
> 
> d0 and d1 are mounted on two separate drives. The used raw bytes count is
> off by 2x.
> 
> Thanks,
> Martin