Posted to common-dev@hadoop.apache.org by "Igor Bolotin (JIRA)" <ji...@apache.org> on 2009/03/18 06:08:50 UTC

[jira] Created: (HADOOP-5523) Datanode stops cleaning disk space

Datanode stops cleaning disk space
----------------------------------

                 Key: HADOOP-5523
                 URL: https://issues.apache.org/jira/browse/HADOOP-5523
             Project: Hadoop Core
          Issue Type: Bug
          Components: dfs
    Affects Versions: 0.19.0
         Environment: Linux
            Reporter: Igor Bolotin
            Priority: Critical


Here is the situation: a DFS cluster running Hadoop version 0.19.0. The cluster runs on multiple servers with practically identical hardware.
Everything works perfectly well, except for one thing: from time to time one of the data nodes (a different node every time) starts to consume more and more disk space. If we do nothing, it eventually runs out of space completely, ignoring the 20GB reserved-space setting.
Once restarted, it rapidly cleans up disk space and returns to approximately the same utilization as the rest of the data nodes in the cluster.
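
For reference, the reserved-space setting and the on-disk numbers can be checked like this (a sketch; the config file name and the /data/dfs/data path are illustrative for a 0.19-era deployment, not taken from this report):

  # dfs.datanode.du.reserved is the per-volume reserved-space setting
  # that the node appears to ignore here (20GB in this cluster)
  grep -A1 'dfs.datanode.du.reserved' conf/hadoop-site.xml
  df -h /data/dfs/data      # free space as the OS sees it
  du -sh /data/dfs/data     # usage under the datanode data directory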



[jira] Commented: (HADOOP-5523) Datanode stops cleaning disk space

Posted by "Igor Bolotin (JIRA)" <ji...@apache.org>.
    [ https://issues.apache.org/jira/browse/HADOOP-5523?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12682916#action_12682916 ] 

Igor Bolotin commented on HADOOP-5523:
--------------------------------------

The DF and DU sizes on the datanode match very closely with the information reported by the dfsadmin command.
lsof reports some 1000 open files in the DFS data directories on the problematic datanode, but the total size of those open files is only about 10GB.
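
The checks above can be reproduced with something like this (command shapes are standard for 0.19; the /data/dfs/data path is illustrative):

  hadoop dfsadmin -report                        # per-datanode used/remaining as the namenode sees it
  df -h /data/dfs/data                           # volume usage as the OS sees it
  du -sh /data/dfs/data                          # actual usage under the DFS data directory
  lsof +D /data/dfs/data | tail -n +2 | wc -l    # number of open files under the data directory
  # sum the SIZE/OFF column (field 7) to get total bytes held open:
  lsof +D /data/dfs/data | awk 'NR>1 {sum+=$7} END {print sum}'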

Here is something interesting: before the datanode restart, fsck reports a very significant number of over-replicated blocks (~10% of all blocks):

Status: HEALTHY
 Total size:    1472758591906 B (Total open files size: 29050588133 B)
 Total dirs:    58431
 Total files:   375703 (Files currently being written: 418)
 Total blocks (validated):      387205 (avg. block size 3803562 B) (Total open file blocks (not validated): 595)
 Minimally replicated blocks:   387205 (100.0 %)
 Over-replicated blocks:        38782 (10.015883 %)
 Under-replicated blocks:       0 (0.0 %)
 Mis-replicated blocks:         0 (0.0 %)
 Default replication factor:    3
 Average block replication:     3.1003888
 Corrupt blocks:                0
 Missing replicas:              0 (0.0 %)
 Number of data-nodes:          7
 Number of racks:               1
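
A back-of-the-envelope check, assuming the extra replicas are what is eating the disk (arithmetic only, using the figures reported above):

  # extra replicas ~= (avg replication - default replication) * validated blocks
  echo '(3.1003888 - 3) * 387205' | bc -l     # ~38871 extra block replicas
  # at the reported average block size of 3803562 B:
  echo '38871 * 3803562 / 1024^3' | bc -l     # ~138 GiB of excess replica data cluster-wide

That roughly matches the 38782 over-replicated blocks reported, i.e. nearly all of them carry exactly one extra replica.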

After the datanode restart, the over-replicated blocks are practically gone:

Status: HEALTHY
 Total size:    1310669475298 B (Total open files size: 29535016933 B)
 Total dirs:    59431
 Total files:   377177 (Files currently being written: 387)
 Total blocks (validated):      386661 (avg. block size 3389712 B) (Total open file blocks (not validated): 607)
 Minimally replicated blocks:   386661 (100.0 %)
 Over-replicated blocks:        272 (0.070345856 %)
 Under-replicated blocks:       0 (0.0 %)
 Mis-replicated blocks:         0 (0.0 %)
 Default replication factor:    3
 Average block replication:     3.0007036
 Corrupt blocks:                0
 Missing replicas:              0 (0.0 %)
 Number of data-nodes:          7
 Number of racks:               1
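
The workaround described above, sketched for completeness (hadoop-daemon.sh ships with 0.19; run on the affected datanode, then re-check from the namenode side):

  bin/hadoop-daemon.sh stop datanode
  bin/hadoop-daemon.sh start datanode
  hadoop fsck / | grep -i 'over-replicated'   # should drop back toward 0%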


-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.