Posted to common-issues@hadoop.apache.org by "Owen O'Malley (JIRA)" <ji...@apache.org> on 2014/05/01 23:26:24 UTC

[jira] [Commented] (HADOOP-10150) Hadoop cryptographic file system

    [ https://issues.apache.org/jira/browse/HADOOP-10150?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13987015#comment-13987015 ] 

Owen O'Malley commented on HADOOP-10150:
----------------------------------------

I've been working through this. We have two metadata items that we need for each file:
* the key name and version
* the iv
Note that the current patches only store the iv, but we really need to store the key name and version as well (a sketch of the record follows). The version is absolutely critical: if you roll a new key version, you don't want to have to re-write all of the existing data.
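
For concreteness, a minimal sketch of the per-file metadata record; the class and field names are made up for illustration, not taken from any of the attached patches:

    import java.util.Arrays;

    /** Per-file encryption metadata: which key (name + version) plus the iv. */
    public final class FileCryptoMetadata {
      private final String keyName;   // logical key name in the key store
      private final int keyVersion;   // pinned per file, so rolling the key
                                      // does not force a re-write of old data
      private final byte[] iv;        // per-file initialization vector

      public FileCryptoMetadata(String keyName, int keyVersion, byte[] iv) {
        this.keyName = keyName;
        this.keyVersion = keyVersion;
        this.iv = Arrays.copyOf(iv, iv.length);
      }

      public String getKeyName() { return keyName; }
      public int getKeyVersion() { return keyVersion; }
      public byte[] getIv() { return Arrays.copyOf(iv, iv.length); }
    }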

It seems to me there are three reasonable places to store the small amount of metadata:
* at the beginning of the file
* in a side file
* encoded using a filename mangling scheme

Storing it at the beginning of the file creates trouble because it throws off the block calculations done by MapReduce. (In other words, if we slide all of the data down by 1k, then every input split will cross HDFS block boundaries.) On the other hand, it adds no load to the namenode and will always be consistent with the file.
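To make the alignment problem concrete, a toy calculation (the 1k header size is just an example):

    public class HeaderSkew {
      public static void main(String[] args) {
        long blockSize = 128L * 1024 * 1024; // 128 MB HDFS block
        long header = 1024;                  // 1k of metadata at the front
        // A split that logically starts at a block boundary physically
        // starts 1k past it, so every split straddles two HDFS blocks.
        long logicalSplitStart = blockSize;
        long physicalOffset = logicalSplitStart + header;
        System.out.println(physicalOffset % blockSize); // prints 1024, not 0
      }
    }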

A side file doesn't change the offsets into the file, but does double the amount of traffic and storage required on the namenode.

Doing name mangling means the underlying HDFS file names are more complicated, but it neither changes the file offsets nor increases the load on the namenode.
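
As a strawman, the mangling could pack the key name, key version, and a base64 iv into the underlying file name; the separator and layout here are invented for illustration:

    import java.util.Base64;

    public class NameMangler {
      // Hypothetical separator; must not appear in key names or file names.
      private static final String SEP = "!";

      /** visible.name -> visible.name!keyName!keyVersion!base64(iv) */
      public static String mangle(String name, String keyName,
                                  int keyVersion, byte[] iv) {
        return name + SEP + keyName + SEP + keyVersion + SEP
            + Base64.getUrlEncoder().withoutPadding().encodeToString(iv);
      }

      /** Recover [visible name, key name, key version, base64 iv]. */
      public static String[] demangle(String underlying) {
        return underlying.split(SEP, 4);
      }
    }

One wrinkle the sketch glosses over: the mangled name must stay under the HDFS file name length limit, and every directory listing has to demangle each entry.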

I think we should do the name mangling. What do others think?


> Hadoop cryptographic file system
> --------------------------------
>
>                 Key: HADOOP-10150
>                 URL: https://issues.apache.org/jira/browse/HADOOP-10150
>             Project: Hadoop Common
>          Issue Type: New Feature
>          Components: security
>    Affects Versions: 3.0.0
>            Reporter: Yi Liu
>            Assignee: Yi Liu
>              Labels: rhino
>             Fix For: 3.0.0
>
>         Attachments: CryptographicFileSystem.patch, HADOOP cryptographic file system-V2.docx, HADOOP cryptographic file system.pdf, HDFSDataAtRestEncryptionAlternatives.pdf, HDFSDataatRestEncryptionAttackVectors.pdf, HDFSDataatRestEncryptionProposal.pdf, cfs.patch, extended information based on INode feature.patch
>
>
> There is an increasing need to secure data when Hadoop customers use various upper layer applications, such as Map-Reduce, Hive, Pig, HBase and so on.
> HADOOP CFS (HADOOP Cryptographic File System) secures data by decorating DFS or other file systems with Hadoop's “FilterFileSystem”, and it is transparent to upper layer applications (see the sketch after the requirements list). It is configurable, scalable and fast.
> High level requirements:
> 1.	Transparent to upper layer applications; no modification required.
> 2.	“Seek” and “PositionedReadable” are supported on CFS input streams if the wrapped file system supports them.
> 3.	Very high performance for encryption and decryption, so they do not become a bottleneck.
> 4.	Can decorate HDFS and all other file systems in Hadoop without modifying the existing structure of the wrapped file system (e.g. the namenode and datanode structure when it is HDFS).
> 5.	Admins can configure encryption policies, such as which directories are encrypted.
> 6.	A robust key management framework.
> 7.	Support for Pread and append operations if the wrapped file system supports them.
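
To make the decoration idea concrete, here is roughly what a FilterFileSystem-based wrapper could look like on the read path; decrypt() is a placeholder for whatever the patches actually do, not a real API:

    import java.io.IOException;
    import org.apache.hadoop.fs.FSDataInputStream;
    import org.apache.hadoop.fs.FileSystem;
    import org.apache.hadoop.fs.FilterFileSystem;
    import org.apache.hadoop.fs.Path;

    /** Sketch of a crypto wrapper over any FileSystem. */
    public class CryptoFileSystem extends FilterFileSystem {
      public CryptoFileSystem(FileSystem wrapped) {
        super(wrapped);
      }

      @Override
      public FSDataInputStream open(Path f, int bufferSize) throws IOException {
        FSDataInputStream raw = fs.open(f, bufferSize); // 'fs' is the wrapped FileSystem
        return decrypt(raw, f);
      }

      // Placeholder: a real implementation would look up the key name/version
      // and iv for f, and return a stream that decrypts while still supporting
      // Seek and PositionedReadable (requirement 2 above).
      private FSDataInputStream decrypt(FSDataInputStream in, Path f) {
        return in;
      }
    }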



--
This message was sent by Atlassian JIRA
(v6.2#6252)