Posted to mapreduce-issues@hadoop.apache.org by "Konstantin Shvachko (JIRA)" <ji...@apache.org> on 2012/08/28 08:26:09 UTC

[jira] [Comment Edited] (MAPREDUCE-4491) Encryption and Key Protection

    [ https://issues.apache.org/jira/browse/MAPREDUCE-4491?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13442982#comment-13442982 ] 

Konstantin Shvachko edited comment on MAPREDUCE-4491 at 8/28/12 5:25 PM:
-------------------------------------------------------------------------

Benoy, I went over your design document. It is a pretty comprehensive description.
I want to clarify a couple of things.
# Do I understand correctly that your approach can be used to securely store (encrypt) data even on non-secure (security=simple) clusters?
# So the JobClient uses the current user's credentials to obtain keys from the KeyStore, encrypts them with the cluster public key, and sends them to the cluster along with the user credentials. The JobTracker has nothing to do with the keys and passes the encrypted blob over to the TaskTrackers scheduled to execute the tasks. The TT decrypts the user keys using the cluster private key and hands them to the local tasks, which is secure since the keys never travel over the wire unencrypted. Is that right so far? (A sketch of this flow follows the list.)
# Should the TT be using the user's credentials to decrypt the blob of keys somehow? Or does it authenticate the user and then decrypt only if authentication passes? I did not find this in your document.
# How is the cluster private key delivered to the TTs?
# I think the configuration parameter naming needs some changes. The parameters should not start with {{mapreduce.job}}. Based on your examples you can encrypt an HDFS file without spawning any actual jobs, and in that case seeing {{mapreduce.job.*}} seems confusing.
My suggestion is to simply prefix all parameters with {{hadoop.crypto.*}}. Then you can also use the full word "keystore" instead of "ks".
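
To make sure I am reading point 2 correctly, here is a minimal JCE sketch of the round trip I have in mind; the keystore path, aliases, and passwords are made up for illustration, and none of this is taken from the attached patch.

{code:java}
import java.io.FileInputStream;
import java.security.KeyPair;
import java.security.KeyPairGenerator;
import java.security.KeyStore;
import javax.crypto.Cipher;
import javax.crypto.SecretKey;

public class KeyWrapSketch {
  public static void main(String[] args) throws Exception {
    // Stand-in for the cluster key pair; in the real feature the private key
    // would be available only inside the cluster (question 4 above).
    KeyPair clusterKeys = KeyPairGenerator.getInstance("RSA").generateKeyPair();

    // JobClient side: fetch the user's data key from an external keystore
    // (a local JCEKS file here, purely as an example) ...
    KeyStore ks = KeyStore.getInstance("JCEKS");
    ks.load(new FileInputStream("/path/to/user.keystore"), "storepass".toCharArray());
    SecretKey dataKey = (SecretKey) ks.getKey("data-key-alias", "keypass".toCharArray());

    // ... then wrap it with the cluster public key before shipping it with the job.
    Cipher wrap = Cipher.getInstance("RSA");
    wrap.init(Cipher.WRAP_MODE, clusterKeys.getPublic());
    byte[] wrappedKey = wrap.wrap(dataKey);

    // TaskTracker side: unwrap with the cluster private key and hand the
    // plaintext key only to the local task; just the wrapped bytes cross the wire.
    Cipher unwrap = Cipher.getInstance("RSA");
    unwrap.init(Cipher.UNWRAP_MODE, clusterKeys.getPrivate());
    SecretKey recovered = (SecretKey) unwrap.unwrap(wrappedKey, "AES", Cipher.SECRET_KEY);
    System.out.println("Recovered " + recovered.getAlgorithm() + " key on the TT side");
  }
}
{code}

If that matches the design, then only the wrapped blob ever leaves the client, which is what questions 3 and 4 are trying to pin down.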

I plan to get into reviewing the implementation soon.
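
To make point 5 concrete, I would expect the client-side configuration to read something like the following; these property names are only my illustration, not what the current patch defines.

{code:java}
import org.apache.hadoop.conf.Configuration;

public class CryptoNamingSketch {
  public static void main(String[] args) {
    Configuration conf = new Configuration();
    // Illustrative names under the suggested hadoop.crypto.* prefix;
    // the actual keys used by the patch may differ.
    conf.set("hadoop.crypto.keystore.file", "/etc/hadoop/user.keystore");
    conf.set("hadoop.crypto.keystore.type", "JCEKS");
    conf.set("hadoop.crypto.key.alias", "payroll-data-key");
  }
}
{code}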
                
> Encryption and Key Protection
> -----------------------------
>
>                 Key: MAPREDUCE-4491
>                 URL: https://issues.apache.org/jira/browse/MAPREDUCE-4491
>             Project: Hadoop Map/Reduce
>          Issue Type: New Feature
>          Components: documentation, security, task-controller, tasktracker
>            Reporter: Benoy Antony
>            Assignee: Benoy Antony
>         Attachments: Hadoop_Encryption.pdf, Hadoop_Encryption.pdf
>
>
> When dealing with sensitive data, it is required to keep the data encrypted wherever it is stored. A common use case is to pull encrypted data out of a data source and store it in HDFS for analysis. The keys are stored in an external keystore.
> The feature adds a customizable framework to integrate different types of keystores, support for the Java KeyStore, the ability to read keys from keystores, and a mechanism to transport keys from the JobClient to the Tasks.
> The feature adds PGP encryption as a codec and additional utilities to perform encryption-related steps.
> The design document is attached. It explains the requirements, design, and use cases.
> Kindly review and comment. Collaboration is very much welcome.
> I have a tested patch for this against 1.1 and will upload it soon as initial work for further refinement.
> Update: The patches are uploaded to subtasks. 
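
(For illustration of the pluggable keystore framework mentioned in the description above, one hypothetical shape for such a plug-in point is sketched below; the interface and class names are invented and are not the API in the attached patch.)

{code:java}
import java.io.FileInputStream;
import java.io.IOException;
import java.security.GeneralSecurityException;
import java.security.Key;
import java.security.KeyStore;

// Hypothetical plug-in point for external keystores; not the actual API in the
// attached patch, just an illustration of the idea described above.
interface KeyProvider {
  /** Returns the raw key material for the given alias, or null if unknown. */
  byte[] getKey(String alias) throws IOException;
}

/** Example provider backed by a local Java keystore (JCEKS) file. */
public class JavaKeyStoreProvider implements KeyProvider {
  private final String path;
  private final char[] password;

  public JavaKeyStoreProvider(String path, char[] password) {
    this.path = path;
    this.password = password;
  }

  @Override
  public byte[] getKey(String alias) throws IOException {
    try {
      FileInputStream in = new FileInputStream(path);
      try {
        KeyStore ks = KeyStore.getInstance("JCEKS");
        ks.load(in, password);
        Key key = ks.getKey(alias, password);
        return key == null ? null : key.getEncoded();
      } finally {
        in.close();
      }
    } catch (GeneralSecurityException e) {
      throw new IOException("Failed to read key " + alias, e);
    }
  }
}
{code}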

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators.
For more information on JIRA, see: http://www.atlassian.com/software/jira