Posted to mapreduce-user@hadoop.apache.org by Manoj Samel <ma...@gmail.com> on 2014/10/10 23:21:14 UTC

DFS Used V/S Non DFS Used

Hi,

It's not clear to me how this computation is done.

For the sake of discussion, say the machine running the DataNode has two disks, /disk1
and /disk2, and each disk has a directory for DataNode data and a directory for
non-DataNode usage.

/disk1/datanode
/disk1/non-datanode
/disk2/datanode
/disk2/non-datanode

The dfs.datanode.data.dir setting is "/disk1/datanode,/disk2/datanode".

With this setup, what do DFS Used and Non DFS Used indicate? Do they correspond to
SUM(/disk*/datanode) and SUM(/disk*/non-datanode), respectively?

Thanks,

RE: DFS Used V/S Non DFS Used

Posted by Brahma Reddy Battula <br...@huawei.com>.
Hi Manoj


Non DFS Used is any data in the local filesystem of the DataNode(s) that isn't under the directories listed in dfs.datanode.data.dir.

This would include log files, MapReduce shuffle output, and local copies of data files (if you put any on a DataNode).

Use du or a similar tool to see what's taking up the space in your filesystem.
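
For instance, a rough sketch using the /disk1 and /disk2 layout from your mail (adjust the paths to whatever dfs.datanode.data.dir points at on your nodes):

    # Roughly what this node reports as "DFS Used"
    du -sh /disk1/datanode /disk2/datanode

    # Total, used and available space per volume as the OS sees it
    df -h /disk1 /disk2

Whatever df reports as used beyond the datanode directories is the non-DFS usage on that volume.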


"Non DFS Used" is calculated by the following formula:

Non DFS Used = Configured Capacity - Remaining Space - DFS Used

It is still confusing, at least for me.

Because Configured Capacity = Total Disk Space - Reserved Space.

So Non DFS Used = (Total Disk Space - Reserved Space) - Remaining Space - DFS Used

Let's take an example. Assume I have a 100 GB disk and I set the reserved space (dfs.datanode.du.reserved) to 30 GB.

On that disk, the system and other files take up 40 GB and DFS Used is 10 GB. If you run df -h, you will see 50 GB of available space for that disk volume.

The HDFS web UI will show:

Non DFS Used = 100 GB (Total) - 30 GB (Reserved) - 10 GB (DFS Used) - 50 GB (Remaining) = 10 GB

So what it actually means is: you initially configured 30 GB to be reserved for non-DFS usage and 70 GB for HDFS. However, non-DFS usage has exceeded the 30 GB reservation and eaten up 10 GB of space that should belong to HDFS!
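
As a quick sanity check of the arithmetic, here is the same formula with the made-up numbers from this example plugged in:

    # All values in GB, taken from the example above
    total=100; reserved=30; dfs_used=10; remaining=50
    echo "Non DFS Used = $((total - reserved - dfs_used - remaining)) GB"
    # prints: Non DFS Used = 10 GB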

The term "Non DFS Used" should really be renamed to something like "how much of the configured DFS capacity is occupied by non-DFS use".

And one should stop trying to figure out from inside Hadoop why the non-DFS usage is so high; look at the local filesystem instead.

One useful command is lsof | grep delete, which helps you identify open files that have already been deleted. Sometimes Hadoop processes (Hive, YARN, MapReduce, and HDFS daemons) hold references to such deleted files, and those references still occupy disk space.
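
For example (lsof usually needs to run as root to see files held by other users' processes, and the exact output varies by OS; on Linux such files are tagged "(deleted)"):

    # Files that are open but already unlinked - they still hold disk space
    lsof | grep delete

    # Roughly equivalent, using lsof's own filter for files with link count < 1
    lsof +L1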

Also, du -hsx * | sort -rh | head -10 lists the ten largest folders under the current directory.
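
Run it from the top of whichever mount point looks full, e.g. with the layout from this thread:

    cd /disk1
    # -x stays on this filesystem, -s summarizes, sort -rh puts the biggest first
    du -hsx * | sort -rh | head -10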





Thanks & Regards



Brahma Reddy Battula



HUAWEI TECHNOLOGIES INDIA PVT.LTD.
Ground,1&2 floors,Solitaire,
139/26,Amarjyoti Layout,Intermediate Ring Road,Domlur
Bangalore - 560 071 , India
Tel : +91- 80- 3980 9600  Ext No: 4905
Mobile : +91   9620022006
Fax : +91-80-41118578

Re: DFS Used V/S Non DFS Used

Posted by Manoj Samel <ma...@gmail.com>.
Thanks Suresh - it's still not clear to me.

Say "dfs.datanode.du.reserved" is not set (the default seems to be 0), yet the reported
"Non DFS Used" number is non-zero. What does this mean? What is being referred to as
"temp files", and how can they encroach on the example layout of /disk1/datanode,
/disk2/datanode, etc.?

Thanks,

Re: DFS Used V/S Non DFS Used

Posted by Suresh Srinivas <su...@hortonworks.com>.
Here is the information from -
https://issues.apache.org/jira/browse/HADOOP-4430?focusedCommentId=12640259&page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel#comment-12640259
Here are the definitions of the data reported on the Web UI:
Configured Capacity: Disk space corresponding to all the data directories -
Reserved space as defined by dfs.datanode.du.reserved
DFS Used: Space used by DFS
Non DFS Used: 0 if the temporary files do not exceed reserved space.
Otherwise this is the size by which temporary files exceed the reserved
space and encroach into the DFS configured space.
DFS Remaining: (Configured Capacity - DFS Used - Non DFS Used)
DFS Used %: (DFS Used / Configured Capacity) * 100
DFS Remaining % = (DFS Remaining / Configured Capacity) * 100
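
(If you prefer the command line to the web UI, the dfsadmin report prints the same figures, cluster-wide and per DataNode; it typically needs to be run as the HDFS superuser:)

    # Configured Capacity, DFS Used, Non DFS Used, DFS Remaining, per DataNode
    hdfs dfsadmin -report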

-- 
http://hortonworks.com/download/
