Posted to common-dev@hadoop.apache.org by "Igor Bolotin (JIRA)" <ji...@apache.org> on 2009/03/18 06:08:50 UTC
[jira] Created: (HADOOP-5523) Datanode stops cleaning disk space
Datanode stops cleaning disk space
----------------------------------
Key: HADOOP-5523
URL: https://issues.apache.org/jira/browse/HADOOP-5523
Project: Hadoop Core
Issue Type: Bug
Components: dfs
Affects Versions: 0.19.0
Environment: Linux
Reporter: Igor Bolotin
Priority: Critical
Here is the situation: we run a DFS cluster on Hadoop version 0.19.0. The cluster runs on multiple servers with practically identical hardware.
Everything works well except for one thing: from time to time one of the data nodes (a different node each time) starts to consume more and more disk space. The node keeps going, and if we do nothing it runs out of space completely, ignoring the 20GB reserved space setting.
Once restarted, it cleans up the disk rapidly and goes back to approximately the same utilization as the rest of the data nodes in the cluster.
--
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.
[jira] Commented: (HADOOP-5523) Datanode stops cleaning disk space
Posted by "Igor Bolotin (JIRA)" <ji...@apache.org>.
[ https://issues.apache.org/jira/browse/HADOOP-5523?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12682916#action_12682916 ]
Igor Bolotin commented on HADOOP-5523:
--------------------------------------
The df and du sizes on the datanode closely match the information reported by the dfsadmin command.
lsof reports about 1000 open files in the DFS data directories on the problematic datanode, but the total size of the open files is only about 10GB.
Here is something interesting: fsck run before the datanode restart reports a very significant number of over-replicated blocks (~10% of blocks are over-replicated):
Status: HEALTHY
Total size: 1472758591906 B (Total open files size: 29050588133 B)
Total dirs: 58431
Total files: 375703 (Files currently being written: 418)
Total blocks (validated): 387205 (avg. block size 3803562 B) (Total open file blocks (not validated): 595)
Minimally replicated blocks: 387205 (100.0 %)
Over-replicated blocks: 38782 (10.015883 %)
Under-replicated blocks: 0 (0.0 %)
Mis-replicated blocks: 0 (0.0 %)
Default replication factor: 3
Average block replication: 3.1003888
Corrupt blocks: 0
Missing replicas: 0 (0.0 %)
Number of data-nodes: 7
Number of racks: 1
After the datanode restart, the over-replicated blocks are practically gone:
Status: HEALTHY
Total size: 1310669475298 B (Total open files size: 29535016933 B)
Total dirs: 59431
Total files: 377177 (Files currently being written: 387)
Total blocks (validated): 386661 (avg. block size 3389712 B) (Total open file blocks (not validated): 607)
Minimally replicated blocks: 386661 (100.0 %)
Over-replicated blocks: 272 (0.070345856 %)
Under-replicated blocks: 0 (0.0 %)
Mis-replicated blocks: 0 (0.0 %)
Default replication factor: 3
Average block replication: 3.0007036
Corrupt blocks: 0
Missing replicas: 0 (0.0 %)
Number of data-nodes: 7
Number of racks: 1
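To put the two fsck summaries side by side, here is a rough sketch in Python that estimates how many excess replicas (and how many bytes) the over-replication represents. The function names are hypothetical, and the estimate rests on two assumptions that the report does not confirm: that every file uses the default replication factor, and that excess replicas have the average block size. It is a cluster-wide figure, not a measurement of the problematic datanode.

```python
import re

# Excerpt of the fsck summary taken before the datanode restart (from the report above).
BEFORE = """\
Total blocks (validated): 387205 (avg. block size 3803562 B) (Total open file blocks (not validated): 595)
Over-replicated blocks: 38782 (10.015883 %)
Default replication factor: 3
Average block replication: 3.1003888
"""

# Regular expressions for the summary fields used in the estimate.
PATTERNS = {
    "total_blocks": r"Total blocks \(validated\):\s*(\d+)",
    "avg_block_size": r"avg\. block size (\d+) B",
    "default_replication": r"Default replication factor:\s*(\d+)",
    "avg_replication": r"Average block replication:\s*([\d.]+)",
}


def parse_fsck_summary(text):
    """Pull the numeric fields listed in PATTERNS out of an fsck summary."""
    fields = {}
    for name, pattern in PATTERNS.items():
        match = re.search(pattern, text)
        if match is None:
            raise ValueError(f"field not found in fsck output: {name}")
        fields[name] = float(match.group(1))
    return fields


def estimate_excess(fields):
    """Return (excess replica count, excess bytes) as a rough estimate.

    Assumes all files use the default replication factor and that
    excess replicas have the average block size.
    """
    extra_per_block = fields["avg_replication"] - fields["default_replication"]
    excess_replicas = extra_per_block * fields["total_blocks"]
    excess_bytes = excess_replicas * fields["avg_block_size"]
    return excess_replicas, excess_bytes


if __name__ == "__main__":
    replicas, size = estimate_excess(parse_fsck_summary(BEFORE))
    print(f"excess replicas: {replicas:.0f}, excess bytes: {size:.3e}")
```

Under those assumptions, the summary before the restart works out to roughly 38,900 excess replicas, on the order of 148 GB across the cluster, while the summary after the restart gives about 272 excess replicas, under 1 GB.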