Posted to common-commits@hadoop.apache.org by ar...@apache.org on 2016/02/25 01:50:42 UTC

[29/31] hadoop git commit: HDFS-9843. Document distcp options required for copying between encrypted locations. Contributed by Xiaoyu Yao.

HDFS-9843. Document distcp options required for copying between encrypted locations. Contributed by Xiaoyu Yao.


Project: http://git-wip-us.apache.org/repos/asf/hadoop/repo
Commit: http://git-wip-us.apache.org/repos/asf/hadoop/commit/dbbfc58c
Tree: http://git-wip-us.apache.org/repos/asf/hadoop/tree/dbbfc58c
Diff: http://git-wip-us.apache.org/repos/asf/hadoop/diff/dbbfc58c

Branch: refs/heads/HDFS-1312
Commit: dbbfc58c33fd1d2f7abae1784c2d78b7438825e2
Parents: 47b92f2
Author: Chris Nauroth <cn...@apache.org>
Authored: Wed Feb 24 15:16:05 2016 -0800
Committer: Chris Nauroth <cn...@apache.org>
Committed: Wed Feb 24 15:16:05 2016 -0800

----------------------------------------------------------------------
 hadoop-hdfs-project/hadoop-hdfs/CHANGES.txt                  | 3 +++
 .../hadoop-hdfs/src/site/markdown/TransparentEncryption.md   | 8 ++++----
 2 files changed, 7 insertions(+), 4 deletions(-)
----------------------------------------------------------------------


http://git-wip-us.apache.org/repos/asf/hadoop/blob/dbbfc58c/hadoop-hdfs-project/hadoop-hdfs/CHANGES.txt
----------------------------------------------------------------------
diff --git a/hadoop-hdfs-project/hadoop-hdfs/CHANGES.txt b/hadoop-hdfs-project/hadoop-hdfs/CHANGES.txt
index c31e768..08a270d 100644
--- a/hadoop-hdfs-project/hadoop-hdfs/CHANGES.txt
+++ b/hadoop-hdfs-project/hadoop-hdfs/CHANGES.txt
@@ -1965,6 +1965,9 @@ Release 2.8.0 - UNRELEASED
     HDFS-9854. Log cipher suite negotiation more verbosely
     (Wei-Chiu Chuang via cnauroth)
 
+    HDFS-9843. Document distcp options required for copying between encrypted
+    locations. (Xiaoyu Yao via cnauroth)
+
   OPTIMIZATIONS
 
     HDFS-8026. Trace FSOutputSummer#writeChecksumChunks rather than

http://git-wip-us.apache.org/repos/asf/hadoop/blob/dbbfc58c/hadoop-hdfs-project/hadoop-hdfs/src/site/markdown/TransparentEncryption.md
----------------------------------------------------------------------
diff --git a/hadoop-hdfs-project/hadoop-hdfs/src/site/markdown/TransparentEncryption.md b/hadoop-hdfs-project/hadoop-hdfs/src/site/markdown/TransparentEncryption.md
index 8314ed5..3b04255 100644
--- a/hadoop-hdfs-project/hadoop-hdfs/src/site/markdown/TransparentEncryption.md
+++ b/hadoop-hdfs-project/hadoop-hdfs/src/site/markdown/TransparentEncryption.md
@@ -32,7 +32,7 @@ Transparent Encryption in HDFS
 * [Example usage](#Example_usage)
 * [Distcp considerations](#Distcp_considerations)
     * [Running as the superuser](#Running_as_the_superuser)
-    * [Copying between encrypted and unencrypted locations](#Copying_between_encrypted_and_unencrypted_locations)
+    * [Copying into encrypted locations](#Copying_into_encrypted_locations)
 * [Rename and Trash considerations](#Rename_and_Trash_considerations)
 * [Attack vectors](#Attack_vectors)
     * [Hardware access exploits](#Hardware_access_exploits)
@@ -207,11 +207,11 @@ One common usecase for distcp is to replicate data between clusters for backup a
 
 To enable this same workflow when using HDFS encryption, we introduced a new virtual path prefix, `/.reserved/raw/`, that gives superusers direct access to the underlying block data in the filesystem. This allows superusers to distcp data without needing having access to encryption keys, and also avoids the overhead of decrypting and re-encrypting data. It also means the source and destination data will be byte-for-byte identical, which would not be true if the data was being re-encrypted with a new EDEK.
 
-When using `/.reserved/raw` to distcp encrypted data, it's important to preserve extended attributes with the [-px](#a-px) flag. This is because encrypted file attributes (such as the EDEK) are exposed through extended attributes within `/.reserved/raw`, and must be preserved to be able to decrypt the file. This means that if the distcp is initiated at or above the encryption zone root, it will automatically create an encryption zone at the destination if it does not already exist. However, it's still recommended that the admin first create identical encryption zones on the destination cluster to avoid any potential mishaps.
+When using `/.reserved/raw` to distcp encrypted data, it's important to preserve extended attributes with the [-px](../../hadoop-distcp/DistCp.html#Command_Line_Options) flag. This is because encrypted file attributes (such as the EDEK) are exposed through extended attributes within `/.reserved/raw`, and must be preserved to be able to decrypt the file. This means that if the distcp is initiated at or above the encryption zone root, it will automatically create an encryption zone at the destination if it does not already exist. However, it's still recommended that the admin first create identical encryption zones on the destination cluster to avoid any potential mishaps.
 
-### <a name="Copying_between_encrypted_and_unencrypted_locations"></a>Copying between encrypted and unencrypted locations
+### <a name="Copying_into_encrypted_locations"></a>Copying into encrypted locations
 
-By default, distcp compares checksums provided by the filesystem to verify that the data was successfully copied to the destination. When copying between an unencrypted and encrypted location, the filesystem checksums will not match since the underlying block data is different. In this case, specify the [-skipcrccheck](#a-skipcrccheck) and [-update](#a-update) distcp flags to avoid verifying checksums.
+By default, distcp compares checksums provided by the filesystem to verify that the data was successfully copied to the destination. When copying from unencrypted or encrypted location into an encrypted location, the filesystem checksums will not match since the underlying block data is different because a new EDEK will be used to encrypt at destination. In this case, specify the [-skipcrccheck](../../hadoop-distcp/DistCp.html#Command_Line_Options) and [-update](../../hadoop-distcp/DistCp.html#Command_Line_Options) distcp flags to avoid verifying checksums.
 
 <a name="Rename_and_Trash_considerations"></a>Rename and Trash considerations
 ---------------------