Posted to common-commits@hadoop.apache.org by Apache Wiki <wi...@apache.org> on 2006/08/15 04:23:35 UTC

[Lucene-hadoop Wiki] Update of "DFS requirements" by KonstantinShvachko

Dear Wiki user,

You have subscribed to a wiki page or wiki category on "Lucene-hadoop Wiki" for change notification.

The following page has been changed by KonstantinShvachko:
http://wiki.apache.org/lucene-hadoop/DFS_requirements

------------------------------------------------------------------------------
  
   1. '''Re-factoring.''' Develop abstractions for DFS components with each component represented by an interface, specifying its functionality and interaction with other components. With good abstractions, it should be easy to add new features without compromising reliability. The abstractions should be evaluated with required future features in mind. [[BR]] ~-For example, data nodes might have a block transfer object, a block receive object, etc., with carefully defined behavior, coordinated by a top-level control structure, instead of the morass of methods in the data node at present.-~
   2. (Reliability) '''Robust name node checkpointing''' and namespace edits logging. [[BR]] ''Currently the system is not restorable in case of name node hardware failure.'' [[BR]] DFS should store “image” and “edits” files on a local name node disk and replicate them on backup nodes using a simple streaming protocol.
+  3. (Reliability) Define the '''startup process''': what is done by each component, and in which order. Introduce a concept of '''“safe mode”''', in which the name node makes no block replication/removal decisions and does not change the state of the namespace in any way. The name node stays in safe mode until a configurable number of data nodes have started and reported a configurable percentage of data blocks. [[BR]][http://issues.apache.org/jira/browse/HADOOP-306 HADOOP-306], [http://issues.apache.org/jira/browse/HADOOP-250 HADOOP-250] ''In progress''.
-  3. (Reliability) Define the '''startup process''', what is done by each component, in which order. Introduce a concept of '''“safe mode”''', which would not make any block replication/removal decisions or change the state of the namespace in any way.
-    a. Name node stays in safe mode until a configurable number of nodes have been started and reported to the name node a configurable percentage of data blocks.
-  4. (Reliability) The name node '''checkpoint should store a list of data nodes''' serving distinct data storages that ever reported to the name node. Namely, the following is stored for each data node in the cluster:  [[BR]] <host:port; storageID; time of last heartbeat; user id>. [[BR]] Missing nodes should be reported in the DFS UI, and during the startup. See also 3.a.
+  4. (Reliability) The name node '''checkpoint should store a list of data nodes''' serving distinct data storages that have ever reported to the name node. Namely, the following is stored for each data node in the cluster: [[BR]] <host:port; storageID; time of last heartbeat; user id>. [[BR]] Missing nodes should be reported in the DFS UI and during startup. See also 3.a. [[BR]][http://issues.apache.org/jira/browse/HADOOP-306 HADOOP-306] ''In progress''.
-  5. (Reliability) Nodes with '''read only disks''' should report the problem to the name node and shut themselves down if all their local disks are unavailable.
+  5. (Reliability) Nodes with '''read only disks''' should report the problem to the name node and shut themselves down if all their local disks are unavailable. [[BR]][http://issues.apache.org/jira/browse/HADOOP-163 HADOOP-163] __Done__.
+  6. (Specification) Define '''recovery/failover and software upgrade procedures'''.
+     a. Cluster recovery is manual; a document describing the steps for safe cluster recovery after a name node failure is needed.
+     a. Based on the recovery procedures, estimate the downtime of the cluster when the name node fails.
+     a. A document is needed describing the general procedures for transitioning DFS from one software version to another.
  6. (Reliability) The name node should boost the '''priority of re-replicating blocks''' that are far from their replication target. If necessary, it should delay requests for new blocks, file opens, etc., in favor of re-replicating blocks that are close to being lost forever.
  7. (Functionality) Currently DFS supports '''file appends''' only through exclusive create, i.e., a file can be written only once, when it is created. We need more general appends that allow re-opening existing files for appending. Our plan is to implement this in two steps:
      a. Exclusive appends.
-     a. Concurrent appends.
+     a. Concurrent appends. [[BR]][http://issues.apache.org/jira/browse/HADOOP-337 HADOOP-337]
-  8. (Functionality) Support for '''“truncate”''' operation. [[BR]] ''This is a new functionality that is not currently supported by DFS.''
+  8. (Functionality) Support for a '''“truncate”''' operation. [[BR]] ''This is new functionality not currently supported by DFS.'' [[BR]][http://issues.apache.org/jira/browse/HADOOP-337 HADOOP-337]
   9. (Functionality) '''Configuration''':
-     a. Accepting/rejecting rules for hosts and users based on regular expressions. The string that is matched against the regular expression should include the host, user, and cluster names.
+     a. Accepting/rejecting rules for hosts and users based on regular expressions. The string that is matched against the regular expression should include the host, user, and cluster names. [[BR]][http://issues.apache.org/jira/browse/HADOOP-442 HADOOP-442]
   10. (Functionality) '''DFS browsing UI.''' [[BR]] ''Currently DFS has a rather primitive UI.'' [[BR]] The UI should
-     a. Let browse the file system going down to each file, each file block, and further down to the block replicas.
-     a. Report status of each directory, file, block, and block replica.
-     a. Show list of data nodes, their status, and non-operational nodes (see 4).
+     a. Allow browsing the file system down to each file, each file block, and further down to the block replicas. [[BR]] [http://issues.apache.org/jira/browse/HADOOP-347 HADOOP-347], [http://issues.apache.org/jira/browse/HADOOP-392 HADOOP-392] __Done__.
+     a. Report status of each directory, file, block, and block replica. [[BR]][http://issues.apache.org/jira/browse/HADOOP-347 HADOOP-347] __Done__.
+     a. Show list of data nodes, their status, and non-operational nodes (see 4). [[BR]][http://issues.apache.org/jira/browse/HADOOP-250 HADOOP-250] __Done__.
      a. Show data node configuration and its extended status.
     a. List data node blocks and the file names they belong to.
      a. Report the name node configuration parameters.
     a. Show the history of data node failures, restarts, etc.
-  11. (Scalability) Nodes with '''multiple disks''' should maintain local disks data distribution internally.
+  11. (Scalability) Nodes with '''multiple disks''' should manage data distribution across their local disks internally. [[BR]][http://issues.apache.org/jira/browse/HADOOP-64 HADOOP-64] ''In progress''.
   12. (Scalability) '''Select-based communication''' for the DFS name node.
  13. (Functionality) Currently, if we want to remove x nodes from the DFS cluster, we must remove them at most two at a time and wait for re-replication to complete, with no feedback on its progress. It would be better to specify a list of nodes to remove, have their data re-replicated while the nodes are still online, and get a confirmation on completion.
  14. (Specification) Define '''invariants for read and append''' commands: a formalization of the DFS consistency model with its underlying assumptions and the resulting guarantees.
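The safe-mode exit condition described in item 3 amounts to a pair of threshold checks. A minimal sketch, assuming a configurable node count and percentage-of-blocks threshold; the class and method names are hypothetical, not actual Hadoop code:

```java
// Hypothetical sketch of the safe-mode exit test from item 3.
// Thresholds would come from configuration; names are illustrative.
public class SafeModeTracker {
    private final int minDataNodes;        // configurable number of nodes
    private final double minBlockFraction; // configurable fraction of blocks

    public SafeModeTracker(int minDataNodes, double minBlockFraction) {
        this.minDataNodes = minDataNodes;
        this.minBlockFraction = minBlockFraction;
    }

    /** While this returns true, the name node would make no replication
     *  or removal decisions and would reject namespace mutations. */
    public boolean inSafeMode(int reportedNodes, long reportedBlocks,
                              long totalBlocks) {
        if (reportedNodes < minDataNodes) return true;
        return reportedBlocks < (long) Math.ceil(minBlockFraction * totalBlocks);
    }
}
```

With, say, 3 required nodes and a 95% block threshold, the name node would leave safe mode only once at least 3 data nodes have reported at least 950 of 1000 known blocks.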
@@ -111, +114 @@

      a. The "original" file name or id, an invariant preserved during file renames, could be stored in an additional file associated with each block, like the crc-file (see 15).
      a. Block offset (the block sequence number) could be encoded as a part of the block id, e.g., [[BR]] {{{<block id> = <random unique per file #><block sequence number>}}}
      a. Adding concurrent file append and truncate features will require a block generation number to be stored as a part of the block file name.
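The block-id encoding suggested above can be sketched as simple bit-packing of the per-file random prefix and the block sequence number into a 64-bit id. The 40/24-bit split and all names below are assumptions for illustration, not DFS code:

```java
// Hypothetical 64-bit block id: high 40 bits hold a random per-file
// prefix, low 24 bits hold the block's sequence number within the file.
public class BlockId {
    static final int SEQ_BITS = 24;                 // assumed split
    static final long SEQ_MASK = (1L << SEQ_BITS) - 1;

    static long encode(long filePrefix, int seqNo) {
        return (filePrefix << SEQ_BITS) | (seqNo & SEQ_MASK);
    }

    static long filePrefix(long blockId) { return blockId >>> SEQ_BITS; }

    static int seqNo(long blockId) { return (int) (blockId & SEQ_MASK); }
}
```

The payoff is that a block's offset within its file is recoverable from the id alone, without consulting any per-block metadata file.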
-  * (Specification) Define '''recovery/failover and software upgrade procedures'''. 
-     a. The recovery of the cluster is manual; a document describing steps for the cluster safe recovery after a name node failure is desired.
-     a. Based on the recovery procedures estimate the downtime of the cluster when the name node fails.
-     a. A document is needed describing general procedures required to transition DFS from one software version to another.
  * Design a '''DFS backup scheme'''. [[BR]] ~-The backup is intended to prevent data loss caused by file system software bugs, particularly during system upgrades.-~ [[BR]] ~-The backup might not need to store the entire data set; for some applications only a fraction of the data is critical, and the rest can be effectively restored.-~
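The accept/reject configuration rules of item 9 could be realized as ordered regular-expression matches over a combined host/user/cluster string. The sketch below is illustrative only: the class names, the key layout, and the reject-over-accept precedence are assumptions, not DFS code.

```java
import java.util.List;
import java.util.regex.Pattern;

// Hypothetical sketch of item 9.a: accept/reject rules over a
// "host/user/cluster" string; all names and layout are assumed.
public class AccessRules {
    private final List<Pattern> accept;
    private final List<Pattern> reject;

    public AccessRules(List<String> acceptRes, List<String> rejectRes) {
        this.accept = acceptRes.stream().map(Pattern::compile).toList();
        this.reject = rejectRes.stream().map(Pattern::compile).toList();
    }

    public boolean permitted(String host, String user, String cluster) {
        String key = host + "/" + user + "/" + cluster;
        // Assumed precedence: reject rules override accept rules.
        for (Pattern p : reject) if (p.matcher(key).matches()) return false;
        for (Pattern p : accept) if (p.matcher(key).matches()) return true;
        return false;                  // default deny
    }
}
```

For example, accepting `.*\.example\.com/.*/prod` while rejecting `.*/guest/.*` would admit any user except `guest` from `example.com` hosts on the `prod` cluster.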