Posted to issues@spark.apache.org by "liyunzhang_intel (JIRA)" <ji...@apache.org> on 2015/04/02 03:34:53 UTC

[jira] [Comment Edited] (SPARK-5682) Add encrypted shuffle in spark

    [ https://issues.apache.org/jira/browse/SPARK-5682?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14390235#comment-14390235 ] 

liyunzhang_intel edited comment on SPARK-5682 at 4/2/15 1:34 AM:
-----------------------------------------------------------------

Hi all,
  There are now two ways to implement SPARK-5682 (Add encrypted shuffle in spark).
  Method 1: Use [Chimera|https://github.com/intel-hadoop/chimera] to implement Spark encrypted shuffle. (Chimera is a project that extracts the CryptoInputStream/CryptoOutputStream code from Hadoop so that other projects can use AES-NI based data encryption.) Pull request: https://github.com/apache/spark/pull/5307.
  Method 2: Add a crypto package to the spark-core module, containing CryptoInputStream.scala, CryptoOutputStream.scala, and related classes. Pull request: https://github.com/apache/spark/pull/4491.
The latest design doc, "Design Document of Encrypted Spark Shuffle_20150402", has been submitted.
Which one is better? Any advice or guidance is welcome!



was (Author: kellyzly):
Hi all,
  There are now two ways to implement SPARK-5682 (Add encrypted shuffle in spark).
  Method 1: Use [Chimera|https://github.com/intel-hadoop/chimera] to implement Spark encrypted shuffle. (Chimera is a project that extracts the CryptoInputStream/CryptoOutputStream code from Hadoop so that other projects can use AES-NI based data encryption.) Pull request: https://github.com/apache/spark/pull/5307.
  Method 2: Add a crypto package to the spark-core module, containing CryptoInputStream.scala, CryptoOutputStream.scala, and related classes. Pull request: https://github.com/apache/spark/pull/4491.

Which one is better? Any advice or guidance is welcome!


> Add encrypted shuffle in spark
> ------------------------------
>
>                 Key: SPARK-5682
>                 URL: https://issues.apache.org/jira/browse/SPARK-5682
>             Project: Spark
>          Issue Type: New Feature
>          Components: Shuffle
>            Reporter: liyunzhang_intel
>         Attachments: Design Document of Encrypted Spark Shuffle_20150209.docx, Design Document of Encrypted Spark Shuffle_20150318.docx, Design Document of Encrypted Spark Shuffle_20150402.docx
>
>
> Encrypted shuffle is available in Hadoop 2.6 and makes the handling of shuffle data safer. Spark needs the same feature. AES is a specification for the encryption of electronic data, and CTR is one of its five common modes of operation. To enable Spark encrypted shuffle we use two codecs, JceAesCtrCryptoCodec and OpensslAesCtrCryptoCodec, which are also used by Hadoop encrypted shuffle. JceAesCtrCryptoCodec uses the encryption algorithms provided by the JDK, while OpensslAesCtrCryptoCodec uses those provided by OpenSSL.
> Because UGI credential information is used during encrypted shuffle, we first enable encrypted shuffle for Spark on YARN.
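
For readers unfamiliar with the JDK side of this, the following is a minimal sketch of AES in CTR mode applied to a byte stream, using only the standard JCE classes (javax.crypto). It is not the code from either pull request and not Hadoop's JceAesCtrCryptoCodec. The class name AesCtrShuffleSketch and its helper methods are hypothetical, and the key and IV are passed in directly rather than obtained from UGI credentials.

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.InputStream;
import java.io.OutputStream;
import java.io.UncheckedIOException;
import java.nio.charset.StandardCharsets;
import java.security.GeneralSecurityException;
import java.security.SecureRandom;
import java.util.Arrays;
import javax.crypto.Cipher;
import javax.crypto.CipherInputStream;
import javax.crypto.CipherOutputStream;
import javax.crypto.spec.IvParameterSpec;
import javax.crypto.spec.SecretKeySpec;

// Hypothetical illustration class; the name and helpers are assumptions, not Spark or Hadoop APIs.
final class AesCtrShuffleSketch {

    // Builds a JCE AES/CTR cipher. CTR turns AES into a stream cipher, so no padding is needed.
    static Cipher newCipher(int mode, byte[] key, byte[] iv) {
        try {
            Cipher cipher = Cipher.getInstance("AES/CTR/NoPadding");
            cipher.init(mode, new SecretKeySpec(key, "AES"), new IvParameterSpec(iv));
            return cipher;
        } catch (GeneralSecurityException e) {
            throw new IllegalStateException("AES/CTR unavailable or key/IV invalid", e);
        }
    }

    // Encrypts by wrapping the destination stream, as a shuffle writer might wrap its output.
    static byte[] encrypt(byte[] plain, byte[] key, byte[] iv) {
        ByteArrayOutputStream sink = new ByteArrayOutputStream();
        try (OutputStream out = new CipherOutputStream(sink, newCipher(Cipher.ENCRYPT_MODE, key, iv))) {
            out.write(plain);
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        return sink.toByteArray();
    }

    // Decrypts by wrapping the source stream, as a shuffle reader might wrap its input.
    static byte[] decrypt(byte[] encrypted, byte[] key, byte[] iv) {
        ByteArrayOutputStream result = new ByteArrayOutputStream();
        try (InputStream in = new CipherInputStream(
                new ByteArrayInputStream(encrypted), newCipher(Cipher.DECRYPT_MODE, key, iv))) {
            byte[] buffer = new byte[4096];
            int n;
            while ((n = in.read(buffer)) != -1) {
                result.write(buffer, 0, n);
            }
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        return result.toByteArray();
    }

    public static void main(String[] args) {
        SecureRandom random = new SecureRandom();
        byte[] key = new byte[16]; // 128-bit AES key
        byte[] iv = new byte[16];  // initial counter block; AES block size is 16 bytes
        random.nextBytes(key);
        random.nextBytes(iv);

        byte[] plain = "shuffle block contents".getBytes(StandardCharsets.UTF_8);
        byte[] encrypted = encrypt(plain, key, iv);
        byte[] decrypted = decrypt(encrypted, key, iv);

        System.out.println("same length: " + (encrypted.length == plain.length));
        System.out.println("round trip ok: " + Arrays.equals(plain, decrypted));
    }
}
```

Because CTR mode needs no padding, the encrypted output has the same length as the input. The same key and IV pair must never be reused for different data, so a real implementation would need a fresh IV per stream.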



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)
