Posted to mapreduce-issues@hadoop.apache.org by "Alejandro Abdelnur (JIRA)" <ji...@apache.org> on 2014/07/01 06:51:25 UTC

[jira] [Comment Edited] (MAPREDUCE-5890) Support for encrypting Intermediate data and spills in local filesystem

    [ https://issues.apache.org/jira/browse/MAPREDUCE-5890?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14048355#comment-14048355 ] 

Alejandro Abdelnur edited comment on MAPREDUCE-5890 at 7/1/14 4:50 AM:
-----------------------------------------------------------------------

[~chris.douglas],
I had initially tried to modify the {{IFile}} format directly to handle the IV. I felt this would not be a clean solution for the following reasons:
* The {{IFile}} currently has no notion of an explicit header or metadata.
* It is possible to write the IV in the {{IFile.Writer}} constructor (and thus make it transparent to the rest of the code base), but the reading code path is not so straightforward. Two classes extend {{IFile.Reader}}: {{InMemoryReader}} and {{RawKVIteratorReader}}. The {{InMemoryReader}} completely ignores the input stream that is initialized in the base class constructor, and there are places in the code base where the input stream is initialized not in the Reader but in the {{Segment::init()}} method. In my opinion this makes the {{IFile}} abstraction a bit leaky: the underlying stream should be handled in its entirety by the IFile Writer/Reader, and the {{Segment}} class (which is part of the {{Merger}} framework) should avoid dealing with those internals.
* Also, I was not able to do away with a lot of if-then checks in the Shuffle phase (another instance of the leaky abstraction mentioned in the previous point): the implementations of the {{MapOutput::shuffle}} method create {{IFileInputStreams}} directly, without an associated {{IFile.Reader}}.
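
To make the "IV as a stream header" idea concrete, here is a minimal standalone sketch using only the JDK. It is not the MAPREDUCE-5890 patch and does not touch the real {{IFile}} classes: the class name {{IvPrefixedStreamSketch}}, its method names, and the choice of {{AES/CTR/NoPadding}} are all assumptions made for illustration. The writer side puts a random IV in the clear at the start of the stream and encrypts everything after it; the reader side has to consume that IV before it can decrypt, which is why every code path that opens the underlying stream itself would need to know about it.

```java
import javax.crypto.Cipher;
import javax.crypto.CipherInputStream;
import javax.crypto.CipherOutputStream;
import javax.crypto.spec.IvParameterSpec;
import javax.crypto.spec.SecretKeySpec;
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.DataInputStream;
import java.io.IOException;
import java.io.InputStream;
import java.io.OutputStream;
import java.nio.charset.StandardCharsets;
import java.security.GeneralSecurityException;
import java.security.SecureRandom;
import java.util.Arrays;

// Hypothetical illustration only; none of these names exist in Hadoop.
final class IvPrefixedStreamSketch {
    static final int IV_LENGTH = 16; // AES block size in bytes

    // Writer side: emit a fresh random IV unencrypted, then encrypt all later bytes.
    static OutputStream wrapForWrite(OutputStream raw, byte[] key)
            throws IOException, GeneralSecurityException {
        byte[] iv = new byte[IV_LENGTH];
        new SecureRandom().nextBytes(iv);
        raw.write(iv);
        Cipher cipher = Cipher.getInstance("AES/CTR/NoPadding");
        cipher.init(Cipher.ENCRYPT_MODE, new SecretKeySpec(key, "AES"), new IvParameterSpec(iv));
        return new CipherOutputStream(raw, cipher);
    }

    // Reader side: the IV must be read off the raw stream before decryption can start.
    static InputStream wrapForRead(InputStream raw, byte[] key)
            throws IOException, GeneralSecurityException {
        byte[] iv = new byte[IV_LENGTH];
        new DataInputStream(raw).readFully(iv);
        Cipher cipher = Cipher.getInstance("AES/CTR/NoPadding");
        cipher.init(Cipher.DECRYPT_MODE, new SecretKeySpec(key, "AES"), new IvParameterSpec(iv));
        return new CipherInputStream(raw, cipher);
    }

    // Convenience helper: returns IV followed by ciphertext.
    static byte[] encrypt(byte[] key, byte[] payload) {
        ByteArrayOutputStream sink = new ByteArrayOutputStream();
        try (OutputStream out = wrapForWrite(sink, key)) {
            out.write(payload);
        } catch (IOException | GeneralSecurityException e) {
            throw new IllegalStateException(e);
        }
        return sink.toByteArray();
    }

    // Convenience helper: expects IV followed by ciphertext, returns the plaintext.
    static byte[] decrypt(byte[] key, byte[] stored) {
        try (InputStream in = wrapForRead(new ByteArrayInputStream(stored), key)) {
            ByteArrayOutputStream plain = new ByteArrayOutputStream();
            byte[] buf = new byte[4096];
            int n;
            while ((n = in.read(buf)) != -1) {
                plain.write(buf, 0, n);
            }
            return plain.toByteArray();
        } catch (IOException | GeneralSecurityException e) {
            throw new IllegalStateException(e);
        }
    }

    public static void main(String[] args) {
        byte[] key = new byte[16]; // 128-bit AES key, random for the demo
        new SecureRandom().nextBytes(key);
        byte[] payload = "intermediate key/value bytes".getBytes(StandardCharsets.UTF_8);
        byte[] stored = encrypt(key, payload);
        // Reports whether the payload survived the write/read round trip.
        System.out.println(Arrays.equals(payload, decrypt(key, stored)));
    }
}
```

In this sketch a caller that opens the raw stream without going through {{wrapForRead}} would see the IV bytes as if they were data, which is the same hazard described above for code that builds its input stream outside the Reader.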


was (Author: asuresh):
[~chris.douglas],
I had initially tried to modify the {{IFile}} format directly to handle the IV. I felt this would not be a clean solution for the following reasons:
* The {{IFile}} currently has no notion of an explicit header or metadata.
* It is possible to write the IV in the {{IFile.Writer}} constructor (and thus make it transparent to the rest of the code base), but the reading code path is not so straightforward. Two classes extend {{IFile.Reader}}: {{InMemoryReader}} and {{RawKVIteratorReader}}. The {{InMemoryReader}} completely ignores the input stream that is initialized in the base class constructor, and there are places in the code base where the input stream is initialized not in the Reader but in the {{Segment::init()}} method. In my opinion this makes the {{IFile}} abstraction a bit leaky: the underlying stream should be handled in its entirety by the IFile Writer/Reader, and the {{Segment}} class (which is part of the {{Merger}} framework) should avoid dealing with those internals.
* Also, I was not able to do away with a lot of if-then checks in the Shuffle phase (another instance of the leaky abstraction mentioned in the previous point): the implementations of the {{MapOutput::shuffle}} method create {{IFileInputStream}}s directly, without an associated {{IFile.Reader}}.

> Support for encrypting Intermediate data and spills in local filesystem
> -----------------------------------------------------------------------
>
>                 Key: MAPREDUCE-5890
>                 URL: https://issues.apache.org/jira/browse/MAPREDUCE-5890
>             Project: Hadoop Map/Reduce
>          Issue Type: New Feature
>          Components: security
>    Affects Versions: 2.4.0
>            Reporter: Alejandro Abdelnur
>            Assignee: Arun Suresh
>              Labels: encryption
>         Attachments: MAPREDUCE-5890.10.patch, MAPREDUCE-5890.11.patch, MAPREDUCE-5890.12.patch, MAPREDUCE-5890.3.patch, MAPREDUCE-5890.4.patch, MAPREDUCE-5890.5.patch, MAPREDUCE-5890.6.patch, MAPREDUCE-5890.7.patch, MAPREDUCE-5890.8.patch, MAPREDUCE-5890.9.patch, org.apache.hadoop.mapred.TestMRIntermediateDataEncryption-output.txt, syslog.tar.gz
>
>
> For some sensitive data, encryption in flight (over the network) is not sufficient; the data must also be encrypted at rest. HADOOP-10150 and HDFS-6134 bring encryption at rest for data in the filesystem using the Hadoop FileSystem API. MapReduce intermediate data and spills should also be encrypted at rest.



--
This message was sent by Atlassian JIRA
(v6.2#6252)