Posted to mapreduce-issues@hadoop.apache.org by "Arun Suresh (JIRA)" <ji...@apache.org> on 2014/06/18 07:34:12 UTC

[jira] [Updated] (MAPREDUCE-5890) Support for encrypting Intermediate data and spills in local filesystem

     [ https://issues.apache.org/jira/browse/MAPREDUCE-5890?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Arun Suresh updated MAPREDUCE-5890:
-----------------------------------

    Attachment: MAPREDUCE-5890.1.patch

Attaching initial patch that encrypts intermediate MapReduce spill files.
NOTE: This is to be applied to the 'fs-encryption' branch.

The following locations in the code path are modified by the patch:

1) In the Map phase, whenever an on-disk file is created: both when the Merger writes segments to disk and when the MapTask writes spill files to disk, an IV (Initialization Vector) is generated for the file and written to the same directory (like the index file) with an appropriate suffix. In-memory data is not encrypted (for example, when segments are sorted in memory). At the end of the Map phase, each Map task will have written a single encrypted spill file to disk, with its associated IV file in the same directory.
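
A minimal sketch of the idea in step 1, using only the JDK rather than the classes in the patch. The class name, the exact ".crypto-iv" file name suffix, the 16-byte IV, and the AES/CTR/NoPadding transformation are all assumptions made for illustration; the patch itself may differ on each point.

```java
import java.io.IOException;
import java.io.OutputStream;
import java.nio.file.Files;
import java.nio.file.Path;
import java.security.GeneralSecurityException;
import java.security.SecureRandom;
import javax.crypto.Cipher;
import javax.crypto.CipherOutputStream;
import javax.crypto.SecretKey;
import javax.crypto.spec.IvParameterSpec;

// Hypothetical helper, not part of Hadoop or of the attached patch.
public class SpillEncryptionSketch {
    // Assumed suffix; the message only says the IV file gets "an appropriate suffix".
    static final String IV_SUFFIX = ".crypto-iv";

    /** Path of the IV sidecar file that sits next to the spill file. */
    static Path ivFileFor(Path spillFile) {
        return spillFile.resolveSibling(spillFile.getFileName() + IV_SUFFIX);
    }

    /**
     * Generates a fresh IV, stores it beside the spill file, and writes
     * the data to the spill file through an encrypting stream.
     * Returns the IV that was used.
     */
    static byte[] writeEncryptedSpill(Path spillFile, SecretKey key, byte[] data)
            throws IOException, GeneralSecurityException {
        byte[] iv = new byte[16];               // AES block size
        new SecureRandom().nextBytes(iv);
        Files.write(ivFileFor(spillFile), iv);  // sidecar, like the index file

        // Assumed cipher mode; CTR keeps ciphertext the same length as plaintext.
        Cipher cipher = Cipher.getInstance("AES/CTR/NoPadding");
        cipher.init(Cipher.ENCRYPT_MODE, key, new IvParameterSpec(iv));
        try (OutputStream out =
                 new CipherOutputStream(Files.newOutputStream(spillFile), cipher)) {
            out.write(data);
        }
        return iv;
    }
}
```

Because the IV is not secret, storing it in the clear next to the spill file is acceptable; only the key has to be protected.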

2) The ShuffleHandler: When a request for a partition comes in from the Fetcher, the ShuffleHandler checks whether the spill file for the map attempt is encrypted (which is the case if it finds an associated 'crypto-iv'-suffixed file). It then adds the IV to the ShuffleHeader for that spill file and sends the encrypted stream as is to the Fetcher.
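
The check in step 2 can be sketched roughly as follows. This is not the real ShuffleHeader wire format: the header layout (map id, partition length, IV length, IV bytes), the class name, and the ".crypto-iv" file name are hypothetical stand-ins chosen to keep the example self-contained.

```java
import java.io.DataInputStream;
import java.io.DataOutputStream;
import java.io.IOException;
import java.nio.file.Files;
import java.nio.file.Path;

// Hypothetical helper, not the ShuffleHandler or ShuffleHeader from Hadoop.
public class ShuffleIvHeaderSketch {
    // Assumed suffix for the IV sidecar file.
    static final String IV_SUFFIX = ".crypto-iv";

    /** Returns the IV if a sidecar file exists, or an empty array if the spill is unencrypted. */
    static byte[] readIvIfEncrypted(Path spillFile) throws IOException {
        Path ivFile = spillFile.resolveSibling(spillFile.getFileName() + IV_SUFFIX);
        return Files.exists(ivFile) ? Files.readAllBytes(ivFile) : new byte[0];
    }

    /** Writes an illustrative header; a zero IV length signals an unencrypted stream. */
    static void writeHeader(DataOutputStream out, String mapId, long partLength, byte[] iv)
            throws IOException {
        out.writeUTF(mapId);
        out.writeLong(partLength);
        out.writeInt(iv.length);
        out.write(iv);
    }

    /** Reads the header written by writeHeader and returns only the IV. */
    static byte[] readHeaderIv(DataInputStream in) throws IOException {
        in.readUTF();                          // map id, unused here
        in.readLong();                         // partition length, unused here
        byte[] iv = new byte[in.readInt()];
        in.readFully(iv);
        return iv;
    }
}
```

The server side never decrypts anything in this scheme: it only forwards the IV and the bytes already on disk.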

3) In the Reduce phase: The Fetcher receives the ShuffleHeader for the HTTP stream and, if it finds the IV, uses it to wrap the InputStream in a CryptoInputStream, which it passes on to the Reduce-stage Mergers. Before the Merger writes to disk (either the OnDiskMerger or the InMemoryMerger)
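
The stream wrapping on the Fetcher side might look roughly like the sketch below. It uses the JDK's CipherInputStream as a stand-in for the CryptoInputStream named above, and the AES/CTR/NoPadding transformation, the class name, and the convention that an empty IV means "not encrypted" are assumptions for illustration only.

```java
import java.io.InputStream;
import java.security.GeneralSecurityException;
import javax.crypto.Cipher;
import javax.crypto.CipherInputStream;
import javax.crypto.SecretKey;
import javax.crypto.spec.IvParameterSpec;

// Hypothetical helper, not the Fetcher from Hadoop.
public class FetcherDecryptSketch {
    /**
     * Wraps the raw shuffle stream in a decrypting stream when the header
     * carried an IV; otherwise returns the raw stream unchanged.
     */
    static InputStream wrapIfEncrypted(InputStream raw, SecretKey key, byte[] iv)
            throws GeneralSecurityException {
        if (iv == null || iv.length == 0) {
            return raw;                        // no IV in the header: plain stream
        }
        // Assumed cipher mode; must match whatever the map side used.
        Cipher cipher = Cipher.getInstance("AES/CTR/NoPadding");
        cipher.init(Cipher.DECRYPT_MODE, key, new IvParameterSpec(iv));
        return new CipherInputStream(raw, cipher);
    }
}
```

Downstream consumers read plaintext from the returned stream without needing to know whether the source was encrypted.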

Other Notes:
* There is no need for over-the-network encryption of shuffle data, as it is already encrypted.
* The encryption keys are set into the TokenCache in the JobSubmitter.





> Support for encrypting Intermediate data and spills in local filesystem
> -----------------------------------------------------------------------
>
>                 Key: MAPREDUCE-5890
>                 URL: https://issues.apache.org/jira/browse/MAPREDUCE-5890
>             Project: Hadoop Map/Reduce
>          Issue Type: New Feature
>          Components: security
>    Affects Versions: 2.4.0
>            Reporter: Alejandro Abdelnur
>            Assignee: Arun Suresh
>         Attachments: MAPREDUCE-5890.1.patch
>
>
> For some sensitive data, encryption while in flight (network) is not sufficient; the data must also be encrypted while at rest. HADOOP-10150 & HDFS-6134 bring encryption at rest for data in the filesystem using the Hadoop FileSystem API. MapReduce intermediate data and spills should also be encrypted while at rest.



--
This message was sent by Atlassian JIRA
(v6.2#6252)