Posted to common-commits@hadoop.apache.org by ar...@apache.org on 2015/06/20 22:11:10 UTC

[1/3] hadoop git commit: HDFS-7164. Feature documentation for HDFS-6581. (Contributed by Arpit Agarwal)

Repository: hadoop
Updated Branches:
  refs/heads/branch-2 ee63c3e83 -> 5dcda1685
  refs/heads/branch-2.7 0adb6d829 -> 25b99e481
  refs/heads/trunk 055cd5a9a -> 658b5c84a


HDFS-7164. Feature documentation for HDFS-6581. (Contributed by Arpit Agarwal)


Project: http://git-wip-us.apache.org/repos/asf/hadoop/repo
Commit: http://git-wip-us.apache.org/repos/asf/hadoop/commit/25b99e48
Tree: http://git-wip-us.apache.org/repos/asf/hadoop/tree/25b99e48
Diff: http://git-wip-us.apache.org/repos/asf/hadoop/diff/25b99e48

Branch: refs/heads/branch-2.7
Commit: 25b99e4816098e6757e095638af13e7d0cc1a4c9
Parents: 0adb6d8
Author: Arpit Agarwal <ar...@apache.org>
Authored: Sat Jun 20 13:08:18 2015 -0700
Committer: Arpit Agarwal <ar...@apache.org>
Committed: Sat Jun 20 13:08:18 2015 -0700

----------------------------------------------------------------------
 hadoop-hdfs-project/hadoop-hdfs/CHANGES.txt     |   4 +-
 .../site/markdown/CentralizedCacheManagement.md |   2 +
 .../src/site/markdown/MemoryStorage.md          | 114 +++++++++++++++++++
 .../site/resources/images/LazyPersistWrites.png | Bin 0 -> 107161 bytes
 hadoop-project/src/site/site.xml                |   1 +
 5 files changed, 119 insertions(+), 2 deletions(-)
----------------------------------------------------------------------


http://git-wip-us.apache.org/repos/asf/hadoop/blob/25b99e48/hadoop-hdfs-project/hadoop-hdfs/CHANGES.txt
----------------------------------------------------------------------
diff --git a/hadoop-hdfs-project/hadoop-hdfs/CHANGES.txt b/hadoop-hdfs-project/hadoop-hdfs/CHANGES.txt
index 17e906d..951a04b 100644
--- a/hadoop-hdfs-project/hadoop-hdfs/CHANGES.txt
+++ b/hadoop-hdfs-project/hadoop-hdfs/CHANGES.txt
@@ -1,5 +1,3 @@
-Hadoop HDFS Change Log
-
 Release 2.7.1 - UNRELEASED
 
   INCOMPATIBLE CHANGES
@@ -30,6 +28,8 @@ Release 2.7.1 - UNRELEASED
     HDFS-7546. Document, and set an accepting default for
     dfs.namenode.kerberos.principal.pattern (Harsh J via aw)
 
+    HDFS-7164. Feature documentation for HDFS-6581. (Arpit Agarwal)
+
   OPTIMIZATIONS
 
   BUG FIXES

http://git-wip-us.apache.org/repos/asf/hadoop/blob/25b99e48/hadoop-hdfs-project/hadoop-hdfs/src/site/markdown/CentralizedCacheManagement.md
----------------------------------------------------------------------
diff --git a/hadoop-hdfs-project/hadoop-hdfs/src/site/markdown/CentralizedCacheManagement.md b/hadoop-hdfs-project/hadoop-hdfs/src/site/markdown/CentralizedCacheManagement.md
index b4f08c8..72c125d 100644
--- a/hadoop-hdfs-project/hadoop-hdfs/src/site/markdown/CentralizedCacheManagement.md
+++ b/hadoop-hdfs-project/hadoop-hdfs/src/site/markdown/CentralizedCacheManagement.md
@@ -233,6 +233,8 @@ Be sure to configure the following:
 
     This determines the maximum amount of memory a DataNode will use for caching. On Unix-like systems, the "locked-in-memory size" ulimit (`ulimit -l`) of the DataNode user also needs to be increased to match this parameter (see below section on [OS Limits](#OS_Limits)). When setting this value, please remember that you will need space in memory for other things as well, such as the DataNode and application JVM heaps and the operating system page cache.
 
+    This setting is shared with the [Lazy Persist Writes feature](./MemoryStorage.html). The Data Node will ensure that the combined memory used by Lazy Persist Writes and Centralized Cache Management does not exceed the amount configured in `dfs.datanode.max.locked.memory`.
+
 #### Optional
 
 The following properties are not required, but may be specified for tuning:

http://git-wip-us.apache.org/repos/asf/hadoop/blob/25b99e48/hadoop-hdfs-project/hadoop-hdfs/src/site/markdown/MemoryStorage.md
----------------------------------------------------------------------
diff --git a/hadoop-hdfs-project/hadoop-hdfs/src/site/markdown/MemoryStorage.md b/hadoop-hdfs-project/hadoop-hdfs/src/site/markdown/MemoryStorage.md
new file mode 100644
index 0000000..97e44e3
--- /dev/null
+++ b/hadoop-hdfs-project/hadoop-hdfs/src/site/markdown/MemoryStorage.md
@@ -0,0 +1,114 @@
+<!---
+  Licensed under the Apache License, Version 2.0 (the "License");
+  you may not use this file except in compliance with the License.
+  You may obtain a copy of the License at
+
+   http://www.apache.org/licenses/LICENSE-2.0
+
+  Unless required by applicable law or agreed to in writing, software
+  distributed under the License is distributed on an "AS IS" BASIS,
+  WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+  See the License for the specific language governing permissions and
+  limitations under the License. See accompanying LICENSE file.
+-->
+
+Memory Storage Support in HDFS
+==============================
+
+* [Introduction](#Introduction)
+* [Administrator Configuration](#Administrator_Configuration)
+    * [Setup RAM Disks on Data Nodes](#Setup_RAM_Disks_on_Data_Nodes)
+        * [Choosing `tmpfs` \(vs `ramfs`\)](#Choosing_`tmpfs`_\(vs_`ramfs`\))
+        * [Mount RAM Disks](#Mount_RAM_Disks)
+        * [Tag `tmpfs` volume with the RAM\_DISK Storage Type](#Tag_`tmpfs`_volume_with_the_RAM\_DISK_Storage_Type)
+        * [Ensure Storage Policies are enabled](#Ensure_Storage_Policies_are_enabled)
+* [Application Usage](#Application_Usage)
+    * [Use the LAZY\_PERSIST Storage Policy](#Use_the_LAZY\_PERSIST_Storage_Policy)
+        * [Invoke `hdfs storagepolicies` command for directories](#Invoke_hdfs_storagepolicies_command_for_directories)
+        * [Call `setStoragePolicy` method for directories](#Call_`setStoragePolicy`_method_for_directories)
+        * [Pass `LAZY_PERSIST` `CreateFlag` for new files](#Pass_`LAZY_PERSIST`_`CreateFlag`_for_new_files)
+
+Introduction
+------------
+
+HDFS supports writing to off-heap memory managed by the Data Nodes. The Data Nodes flush in-memory data to disk asynchronously, which removes expensive disk IO and checksum computations from the performance-sensitive IO path; hence we call such writes *Lazy Persist Writes*. HDFS provides best-effort persistence guarantees for Lazy Persist Writes. Rare data loss is possible if a node restarts before replicas are persisted to disk. Applications can choose to use Lazy Persist Writes to trade off some durability guarantees in favor of reduced latency.
+
+This feature is available starting with Apache Hadoop 2.6.0 and was developed under Jira [HDFS-6581](https://issues.apache.org/jira/browse/HDFS-6581).
+
+![Lazy Persist Writes](images/LazyPersistWrites.png)
+
+The target use cases are applications that would benefit from writing relatively small amounts of data (from a few GB up to tens of GBs depending on available memory) with low latency. Memory storage is for applications that run within the cluster and are collocated with HDFS Data Nodes. We have observed that the latency overhead from network replication negates the benefits of writing to memory.
+
+Applications that use Lazy Persist Writes will continue to work by falling back to DISK storage if memory is insufficient or unconfigured.
+
+Administrator Configuration
+---------------------------
+
+This section enumerates the administrative steps required before applications can start using the feature in a cluster.
+
+## Setup RAM Disks on Data Nodes
+
+Initialize a RAM disk on each Data Node. Using a RAM disk allows in-memory data to persist across Data Node process restarts. The following setup will work on most Linux distributions. Using RAM disks on other platforms is not currently supported.
+
+### Choosing `tmpfs` \(vs `ramfs`\)
+
+Linux supports two kinds of RAM disks: `tmpfs` and `ramfs`. The size of `tmpfs` is limited by the Linux kernel, while `ramfs` grows to fill all available system memory. A downside of `tmpfs` is that its contents can be swapped to disk under memory pressure. However, many performance-sensitive deployments run with swapping disabled, so we do not expect this to be an issue in practice.
+
+HDFS currently supports using `tmpfs` partitions. Support for `ramfs` is in progress (see [HDFS-8584](https://issues.apache.org/jira/browse/HDFS-8584)).
+
+### Mount RAM Disks
+
+Mount the RAM Disk partition with the Unix `mount` command. For example, to mount a 32 GB `tmpfs` partition under `/mnt/dn-tmpfs/`:
+
+        sudo mount -t tmpfs -o size=32g tmpfs /mnt/dn-tmpfs/
+
+It is recommended that you create an entry in `/etc/fstab` so the RAM Disk is recreated automatically on node restarts. Another option is to use a sub-directory under `/dev/shm`, which is a `tmpfs` mount available by default on most Linux distributions. Ensure that the size of the mount is greater than or equal to your `dfs.datanode.max.locked.memory` setting; otherwise, override the size in `/etc/fstab`. Using more than one `tmpfs` partition per Data Node for Lazy Persist Writes is not recommended.
+
+### Tag `tmpfs` volume with the RAM\_DISK Storage Type
+
+Tag the `tmpfs` directory with the RAM_DISK storage type via the `dfs.datanode.data.dir` configuration setting in `hdfs-site.xml`. For example, on a Data Node with three hard disk volumes `/grid/0`, `/grid/1` and `/grid/2` and a `tmpfs` mount `/mnt/dn-tmpfs`, `dfs.datanode.data.dir` must be set as follows:
+
+        <property>
+          <name>dfs.datanode.data.dir</name>
+          <value>/grid/0,/grid/1,/grid/2,[RAM_DISK]/mnt/dn-tmpfs</value>
+        </property>
+
+This step is crucial. Without the RAM_DISK tag, HDFS will treat the `tmpfs` volume as non-volatile storage and data will not be saved to persistent storage. You will lose data on node restart.
+
+### Ensure Storage Policies are enabled
+
+Ensure that the global setting to turn on Storage Policies is enabled [as documented here](ArchivalStorage.html#Configuration). This setting is on by default.
+
+
+Application Usage
+-----------------
+
+## Use the LAZY\_PERSIST Storage Policy
+
+Applications indicate that HDFS can use Lazy Persist Writes for a file with the `LAZY_PERSIST` storage policy. Administrative privileges are *not* required to set the policy and it can be set in one of three ways.
+
+### Invoke `hdfs storagepolicies` command for directories
+
+Setting the policy on a directory causes it to take effect for all new files in the directory. The `hdfs storagepolicies` command can be used to set the policy as described in the [Storage Policies documentation](ArchivalStorage.html#Storage_Policy_Commands).
+
+        hdfs storagepolicies -setStoragePolicy -path <path> -policy LAZY_PERSIST
+
+### Call `setStoragePolicy` method for directories
+
+Starting with Apache Hadoop 2.8.0, an application can programmatically set the Storage Policy with `FileSystem.setStoragePolicy`. E.g.
+
+        fs.setStoragePolicy(path, "LAZY_PERSIST");
+
+### Pass `LAZY_PERSIST` `CreateFlag` for new files
+
+An application can pass `CreateFlag#LAZY_PERSIST` when creating a new file with the `FileSystem#create` API. E.g.
+
+        FSDataOutputStream fos =
+            fs.create(
+                path,
+                FsPermission.getFileDefault(),
+                EnumSet.of(CreateFlag.CREATE, CreateFlag.LAZY_PERSIST),
+                bufferLength,
+                replicationFactor,
+                blockSize,
+                null);
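
As a side note to the `dfs.datanode.data.dir` example in the MemoryStorage.md patch above, the following is a minimal Java sketch, not part of this commit and not a Hadoop API, that lists the volumes tagged `[RAM_DISK]` in a configured value. The class and method names (`DataDirCheck`, `ramDiskDirs`) are hypothetical, and the parsing assumes only the comma-separated `[TYPE]/path` form shown in the documentation; the real Data Node parsing may accept more variations.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical helper (an assumption for illustration, not part of Hadoop):
// inspects a dfs.datanode.data.dir value of the form shown in the
// documentation and collects the paths tagged with [RAM_DISK].
public class DataDirCheck {
    private static final String RAM_DISK_TAG = "[RAM_DISK]";

    // Returns the paths of all comma-separated entries carrying the tag.
    public static List<String> ramDiskDirs(String dataDirValue) {
        List<String> result = new ArrayList<>();
        for (String entry : dataDirValue.split(",")) {
            String trimmed = entry.trim();
            if (trimmed.startsWith(RAM_DISK_TAG)) {
                // Strip the tag so only the directory path remains.
                result.add(trimmed.substring(RAM_DISK_TAG.length()));
            }
        }
        return result;
    }

    public static void main(String[] args) {
        // Value taken from the documentation example.
        String value = "/grid/0,/grid/1,/grid/2,[RAM_DISK]/mnt/dn-tmpfs";
        System.out.println(ramDiskDirs(value)); // prints [/mnt/dn-tmpfs]
    }
}
```

A deployment check along these lines could warn when the returned list is empty, since the documentation notes that an untagged `tmpfs` volume is treated as non-volatile storage.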

http://git-wip-us.apache.org/repos/asf/hadoop/blob/25b99e48/hadoop-hdfs-project/hadoop-hdfs/src/site/resources/images/LazyPersistWrites.png
----------------------------------------------------------------------
diff --git a/hadoop-hdfs-project/hadoop-hdfs/src/site/resources/images/LazyPersistWrites.png b/hadoop-hdfs-project/hadoop-hdfs/src/site/resources/images/LazyPersistWrites.png
new file mode 100644
index 0000000..b2bde93
Binary files /dev/null and b/hadoop-hdfs-project/hadoop-hdfs/src/site/resources/images/LazyPersistWrites.png differ

http://git-wip-us.apache.org/repos/asf/hadoop/blob/25b99e48/hadoop-project/src/site/site.xml
----------------------------------------------------------------------
diff --git a/hadoop-project/src/site/site.xml b/hadoop-project/src/site/site.xml
index 5bbce8a..9fa7e4d 100644
--- a/hadoop-project/src/site/site.xml
+++ b/hadoop-project/src/site/site.xml
@@ -96,6 +96,7 @@
       <item name="Transparent Encryption" href="hadoop-project-dist/hadoop-hdfs/TransparentEncryption.html"/>
       <item name="HDFS Support for Multihoming" href="hadoop-project-dist/hadoop-hdfs/HdfsMultihoming.html"/>
       <item name="Archival Storage, SSD &amp; Memory" href="hadoop-project-dist/hadoop-hdfs/ArchivalStorage.html"/>
+      <item name="Memory Storage Support" href="hadoop-project-dist/hadoop-hdfs/MemoryStorage.html"/>
     </menu>
 
     <menu name="MapReduce" inherit="top">


[3/3] hadoop git commit: HDFS-7164. Update CHANGES.txt

Posted by ar...@apache.org.
HDFS-7164. Update CHANGES.txt


Project: http://git-wip-us.apache.org/repos/asf/hadoop/repo
Commit: http://git-wip-us.apache.org/repos/asf/hadoop/commit/5dcda168
Tree: http://git-wip-us.apache.org/repos/asf/hadoop/tree/5dcda168
Diff: http://git-wip-us.apache.org/repos/asf/hadoop/diff/5dcda168

Branch: refs/heads/branch-2
Commit: 5dcda168596d17f601bdb16b4837e8ba5dcd1d56
Parents: ee63c3e
Author: Arpit Agarwal <ar...@apache.org>
Authored: Sat Jun 20 13:09:21 2015 -0700
Committer: Arpit Agarwal <ar...@apache.org>
Committed: Sat Jun 20 13:09:21 2015 -0700

----------------------------------------------------------------------
 hadoop-hdfs-project/hadoop-hdfs/CHANGES.txt | 4 ++--
 1 file changed, 2 insertions(+), 2 deletions(-)
----------------------------------------------------------------------


http://git-wip-us.apache.org/repos/asf/hadoop/blob/5dcda168/hadoop-hdfs-project/hadoop-hdfs/CHANGES.txt
----------------------------------------------------------------------
diff --git a/hadoop-hdfs-project/hadoop-hdfs/CHANGES.txt b/hadoop-hdfs-project/hadoop-hdfs/CHANGES.txt
index 82b8eb3..f4b68d6 100644
--- a/hadoop-hdfs-project/hadoop-hdfs/CHANGES.txt
+++ b/hadoop-hdfs-project/hadoop-hdfs/CHANGES.txt
@@ -295,8 +295,6 @@ Release 2.8.0 - UNRELEASED
     HDFS-8606. Cleanup DFSOutputStream by removing unwanted changes
     from HDFS-8386. (Rakesh R via szetszwo)
 
-    HDFS-7164. Feature documentation for HDFS-6581. (Arpit Agarwal)
-
     HDFS-8238. Move ClientProtocol to the hdfs-client.
     (Takanobu Asanuma via wheat9)
 
@@ -626,6 +624,8 @@ Release 2.7.1 - UNRELEASED
     HDFS-7546. Document, and set an accepting default for
     dfs.namenode.kerberos.principal.pattern (Harsh J via aw)
 
+    HDFS-7164. Feature documentation for HDFS-6581. (Arpit Agarwal)
+
   OPTIMIZATIONS
 
   BUG FIXES


[2/3] hadoop git commit: HDFS-7164. Update CHANGES.txt

Posted by ar...@apache.org.
HDFS-7164. Update CHANGES.txt


Project: http://git-wip-us.apache.org/repos/asf/hadoop/repo
Commit: http://git-wip-us.apache.org/repos/asf/hadoop/commit/658b5c84
Tree: http://git-wip-us.apache.org/repos/asf/hadoop/tree/658b5c84
Diff: http://git-wip-us.apache.org/repos/asf/hadoop/diff/658b5c84

Branch: refs/heads/trunk
Commit: 658b5c84ae03b38293bea069efd2e5fd2922d74a
Parents: 055cd5a
Author: Arpit Agarwal <ar...@apache.org>
Authored: Sat Jun 20 13:08:52 2015 -0700
Committer: Arpit Agarwal <ar...@apache.org>
Committed: Sat Jun 20 13:08:52 2015 -0700

----------------------------------------------------------------------
 hadoop-hdfs-project/hadoop-hdfs/CHANGES.txt | 4 ++--
 1 file changed, 2 insertions(+), 2 deletions(-)
----------------------------------------------------------------------


http://git-wip-us.apache.org/repos/asf/hadoop/blob/658b5c84/hadoop-hdfs-project/hadoop-hdfs/CHANGES.txt
----------------------------------------------------------------------
diff --git a/hadoop-hdfs-project/hadoop-hdfs/CHANGES.txt b/hadoop-hdfs-project/hadoop-hdfs/CHANGES.txt
index 882bf3f..aad3c25 100644
--- a/hadoop-hdfs-project/hadoop-hdfs/CHANGES.txt
+++ b/hadoop-hdfs-project/hadoop-hdfs/CHANGES.txt
@@ -635,8 +635,6 @@ Release 2.8.0 - UNRELEASED
     HDFS-8606. Cleanup DFSOutputStream by removing unwanted changes
     from HDFS-8386. (Rakesh R via szetszwo)
 
-    HDFS-7164. Feature documentation for HDFS-6581. (Arpit Agarwal)
-
     HDFS-9608. Merge HDFS-7912 to trunk and branch-2 (track BlockInfo instead
     of Block in UnderReplicatedBlocks and PendingReplicationBlocks).
     (Zhe Zhang via wang)
@@ -967,6 +965,8 @@ Release 2.7.1 - UNRELEASED
     HDFS-7546. Document, and set an accepting default for
     dfs.namenode.kerberos.principal.pattern (Harsh J via aw)
 
+    HDFS-7164. Feature documentation for HDFS-6581. (Arpit Agarwal)
+
   OPTIMIZATIONS
 
   BUG FIXES