Posted to common-user@hadoop.apache.org by Sagar Shukla <sa...@persistent.co.in> on 2011/07/07 11:59:02 UTC

Difference between DFS Used and Non-DFS Used

Hi,
       What is the difference between DFS Used and Non-DFS Used?

Thanks,
Sagar

DISCLAIMER
==========
This e-mail may contain privileged and confidential information which is the property of Persistent Systems Ltd. It is intended only for the use of the individual or entity to which it is addressed. If you are not the intended recipient, you are not authorized to read, retain, copy, print, distribute or use this message. If you have received this communication in error, please notify the sender and delete all copies of this message. Persistent Systems Ltd. does not accept any liability for virus infected mails.


Re: Difference between DFS Used and Non-DFS Used

Posted by Harsh J <ha...@cloudera.com>.
I did not follow that question. "Require"? Non-DFS Used is not a count
of something HDFS uses; it counts the space used outside of HDFS (logs,
other applications, the OS, or whatever else takes up space on the same
disks shows up in that metric). I am not sure I understand you. Isn't
the 250 GB already in use when you look at your disks?

On Fri, Jul 8, 2011 at 4:54 PM, Sagar Shukla
<sa...@persistent.co.in> wrote:
> Thanks Harsh. My first question still remains unanswered: "Why does it require non-DFS storage?" If it is cache data, then it should get flushed from the system after a certain interval of time. And if it is useful data, then it should have been part of the DFS Used data.
>
> I have a setup in which DFS Used is approx. 10 MB whereas Non-DFS Used is around 250 GB, which is quite ridiculous.
>
> Thanks,
> Sagar



-- 
Harsh J

RE: Difference between DFS Used and Non-DFS Used

Posted by Sagar Shukla <sa...@persistent.co.in>.
Hi Suresh / Harsh,
      Thanks for the details. Let me go over the setup again and get some understanding of what you are saying.

Thanks,
Sagar

-----Original Message-----
From: Suresh Srinivas [mailto:srini30005@gmail.com] 
Sent: Friday, July 08, 2011 5:43 PM
To: common-user@hadoop.apache.org
Subject: Re: Difference between DFS Used and Non-DFS Used

Non-DFS storage is not required; the figure is provided only as information, to show
how the storage is being used.

The available storage on the disks is used for both DFS and non-DFS data
(MapReduce shuffle output and any other files that could be on the disks).

See if you have unnecessary files or shuffle output lingering on
these disks and contributing to the 250 GB. Delete the unneeded files and
you should be able to reclaim some of the 250 GB.



-- 
Regards,
Suresh


Re: Difference between DFS Used and Non-DFS Used

Posted by Suresh Srinivas <sr...@gmail.com>.
Non-DFS storage is not required; the figure is provided only as information, to show
how the storage is being used.

The available storage on the disks is used for both DFS and non-DFS data
(MapReduce shuffle output and any other files that could be on the disks).

See if you have unnecessary files or shuffle output lingering on
these disks and contributing to the 250 GB. Delete the unneeded files and
you should be able to reclaim some of the 250 GB.
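
One way to look for such lingering files is sketched below. This is only an
illustration, not a Hadoop tool: the function names are made up here, and the
caller has to supply the mount point and the HDFS data directories. It sums
the apparent file sizes of every top-level entry on a disk, leaves out
anything under the HDFS data directories, and lists the largest entries first.

```python
import os


def dir_size(path, skip):
    """Sum the sizes of regular files under path, pruning any directory in skip."""
    total = 0
    for root, dirs, files in os.walk(path):
        # Prune HDFS data directories so their blocks are not counted.
        dirs[:] = [d for d in dirs
                   if os.path.realpath(os.path.join(root, d)) not in skip]
        for name in files:
            full = os.path.join(root, name)
            if os.path.islink(full):
                continue
            try:
                total += os.path.getsize(full)
            except OSError:
                # File vanished or is unreadable; ignore it.
                pass
    return total


def non_dfs_candidates(mount_point, hdfs_data_dirs):
    """Return (size_in_bytes, path) per top-level entry, largest first."""
    skip = {os.path.realpath(d) for d in hdfs_data_dirs}
    results = []
    for entry in os.listdir(mount_point):
        full = os.path.join(mount_point, entry)
        if os.path.realpath(full) in skip or os.path.islink(full):
            continue
        if os.path.isdir(full):
            size = dir_size(full, skip)
        elif os.path.isfile(full):
            size = os.path.getsize(full)
        else:
            continue
        results.append((size, full))
    results.sort(reverse=True)
    return results
```

A call such as non_dfs_candidates("/data/1", ["/data/1/dfs/dn"]) would list
what else lives on that disk; both paths are hypothetical placeholders. The
sizes are apparent file sizes, so they may differ somewhat from what du or
the DataNode reports.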

On Fri, Jul 8, 2011 at 4:24 AM, Sagar Shukla
<sa...@persistent.co.in> wrote:

> Thanks Harsh. My first question still remains unanswered: "Why does it
> require non-DFS storage?" If it is cache data, then it should get flushed
> from the system after a certain interval of time. And if it is useful data,
> then it should have been part of the DFS Used data.
>
> I have a setup in which DFS Used is approx. 10 MB whereas Non-DFS Used
> is around 250 GB, which is quite ridiculous.
>
> Thanks,
> Sagar


-- 
Regards,
Suresh

RE: Difference between DFS Used and Non-DFS Used

Posted by Sagar Shukla <sa...@persistent.co.in>.
Thanks Harsh. My first question still remains unanswered: "Why does it require non-DFS storage?" If it is cache data, then it should get flushed from the system after a certain interval of time. And if it is useful data, then it should have been part of the DFS Used data.

I have a setup in which DFS Used is approx. 10 MB whereas Non-DFS Used is around 250 GB, which is quite ridiculous.

Thanks,
Sagar

-----Original Message-----
From: Harsh J [mailto:harsh@cloudera.com] 
Sent: Friday, July 08, 2011 4:42 PM
To: common-user@hadoop.apache.org
Subject: Re: Difference between DFS Used and Non-DFS Used

It is just for information's sake (because it can be computed from the
data already collected). The space is accounted for just to let you know that
something is being stored on the DataNodes apart from the
HDFS data, in case you are running out of space.




-- 
Harsh J


Re: Difference between DFS Used and Non-DFS Used

Posted by Harsh J <ha...@cloudera.com>.
It is just for information's sake (because it can be computed from the
data already collected). The space is accounted for just to let you know that
something is being stored on the DataNodes apart from the
HDFS data, in case you are running out of space.
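
As a rough sketch of that computation: to my understanding, Hadoop releases
of this period derive the figure as capacity minus DFS Used minus remaining
space, clamped at zero. That formula is an assumption here and is not stated
in this thread. The function name and the sample values are illustrative only.

```python
MB = 1024 ** 2
GB = 1024 ** 3


def non_dfs_used(capacity, dfs_used, remaining):
    """Bytes in use on the DataNode volumes that do not belong to HDFS.

    Assumed formula: capacity - dfs_used - remaining, never below zero.
    """
    return max(capacity - dfs_used - remaining, 0)


# Illustrative values only, not measurements from any real cluster.
example_capacity = 500 * GB
example_dfs_used = 10 * MB
example_remaining = 250 * GB - 10 * MB
example_non_dfs = non_dfs_used(example_capacity, example_dfs_used,
                               example_remaining)
```

Nothing new is measured for this figure; it is whatever is left over once
the other three numbers are known, which is why it also picks up logs and
temporary files written by other processes.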

On Fri, Jul 8, 2011 at 10:18 AM, Sagar Shukla
<sa...@persistent.co.in> wrote:
> Hi Harsh,
>     Thanks for your reply.
>
> But why does it require non-DFS storage? And why is that space accounted for differently from regular DFS storage?
>
> Ideally, it should have been part of the same storage.
>
> Thanks,
> Sagar



-- 
Harsh J

RE: Difference between DFS Used and Non-DFS Used

Posted by Sagar Shukla <sa...@persistent.co.in>.
Hi Harsh,
     Thanks for your reply.

But why does it require non-DFS storage? And why is that space accounted for differently from regular DFS storage?

Ideally, it should have been part of the same storage.

Thanks,
Sagar

-----Original Message-----
From: Harsh J [mailto:harsh@cloudera.com] 
Sent: Thursday, July 07, 2011 6:04 PM
To: common-user@hadoop.apache.org
Subject: Re: Difference between DFS Used and Non-DFS Used

DFS Used is a count of all the space used by the dfs.data.dir
directories. Non-DFS Used is whatever space is occupied beyond that (which
the DN does not account for as HDFS data).




-- 
Harsh J


Re: Difference between DFS Used and Non-DFS Used

Posted by Harsh J <ha...@cloudera.com>.
DFS Used is a count of all the space used by the dfs.data.dir
directories. Non-DFS Used is whatever space is occupied beyond that (which
the DN does not account for as HDFS data).
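
To see the same split directly on one node, something like the following
could be used. It is a minimal sketch and not how the DataNode itself
measures anything: the function names are invented for illustration, the
volume and data directory paths must come from your own configuration, and
apparent file sizes are used, so the numbers will only approximate what the
web UI shows.

```python
import os
import shutil


def tree_size(path):
    """Sum the sizes of regular files under path (symlinks are skipped)."""
    total = 0
    for root, _dirs, files in os.walk(path):
        for name in files:
            full = os.path.join(root, name)
            if os.path.islink(full):
                continue
            try:
                total += os.path.getsize(full)
            except OSError:
                # File vanished or is unreadable; ignore it.
                pass
    return total


def split_usage(volume, data_dirs):
    """Split the used space of one volume into HDFS data and everything else."""
    usage = shutil.disk_usage(volume)  # named tuple: total, used, free
    dfs_used = sum(tree_size(d) for d in data_dirs)
    return {
        "capacity": usage.total,
        "dfs_used": dfs_used,
        "non_dfs_used": max(usage.used - dfs_used, 0),
        "remaining": usage.free,
    }
```

Here data_dirs should list only the data directories that sit on the given
volume; with several disks, call split_usage once per disk.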

On Thu, Jul 7, 2011 at 3:29 PM, Sagar Shukla
<sa...@persistent.co.in> wrote:
> Hi,
>       What is the difference between DFS Used and Non-DFS Used?
>
> Thanks,
> Sagar



-- 
Harsh J