Posted to mapreduce-issues@hadoop.apache.org by "Dian Fu (JIRA)" <ji...@apache.org> on 2014/11/24 11:16:12 UTC

[jira] [Created] (MAPREDUCE-6171) The visibilities of the distributed cache files and archives should be determined by both their permissions and if they are located in HDFS encryption zone

Dian Fu created MAPREDUCE-6171:
----------------------------------

             Summary: The visibilities of the distributed cache files and archives should be determined by both their permissions and if they are located in HDFS encryption zone
                 Key: MAPREDUCE-6171
                 URL: https://issues.apache.org/jira/browse/MAPREDUCE-6171
             Project: Hadoop Map/Reduce
          Issue Type: Bug
          Components: security
            Reporter: Dian Fu


The visibilities of the distributed cache files and archives are currently determined solely by the permissions of those files or archives.
The following is the logic of method isPublic() in class ClientDistributedCacheManager:
{code}
static boolean isPublic(Configuration conf, URI uri,
      Map<URI, FileStatus> statCache) throws IOException {
    FileSystem fs = FileSystem.get(uri, conf);
    Path current = new Path(uri.getPath());
    //the leaf level file should be readable by others
    if (!checkPermissionOfOther(fs, current, FsAction.READ, statCache)) {
      return false;
    }
    return ancestorsHaveExecutePermissions(fs, current.getParent(), statCache);
  }
{code}
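The check above walks up the path: the leaf file must be world-readable, and every ancestor directory must be world-executable. A minimal self-contained model of that walk (plain Java with a hypothetical permission map standing in for FileStatus; not Hadoop code) might look like:

```java
import java.util.HashMap;
import java.util.Map;

public class VisibilityModel {
    // Hypothetical stand-in for FileStatus: path -> the "other" rwx bits.
    static Map<String, String> otherBits = new HashMap<>();

    // Mirrors isPublic(): the file itself must be readable by others,
    // and every ancestor directory must be executable by others.
    static boolean isPublic(String path) {
        if (!otherBits.getOrDefault(path, "---").contains("r")) {
            return false;
        }
        return ancestorsHaveExecute(parent(path));
    }

    static boolean ancestorsHaveExecute(String dir) {
        while (dir != null) {
            if (!otherBits.getOrDefault(dir, "---").contains("x")) {
                return false;
            }
            dir = parent(dir);
        }
        return true;
    }

    static String parent(String path) {
        if (path.equals("/")) return null;
        int i = path.lastIndexOf('/');
        return i == 0 ? "/" : path.substring(0, i);
    }

    public static void main(String[] args) {
        otherBits.put("/", "r-x");
        otherBits.put("/data", "r-x");
        otherBits.put("/data/file.jar", "r--");
        System.out.println(isPublic("/data/file.jar")); // true

        otherBits.put("/data", "r--"); // drop execute on an ancestor
        System.out.println(isPublic("/data/file.jar")); // false
    }
}
```

Note that this check looks only at permission bits; nothing in it knows whether the path sits inside an encryption zone, which is the root of the problem described below.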
At the NodeManager side, the "yarn" user is used to download public files, while the user who submitted the job is used to download private files. In normal cases this causes no problem. However, if a file is located in an encryption zone (HDFS-6134) and KMS is configured to disallow the "yarn" user from fetching the Data Encryption Key (DEK) of that encryption zone, the download of the file will fail.

You can reproduce this issue with the following steps (assume you submit the job as user "testUser"): 
# create a clean cluster with the HDFS cryptographic FileSystem feature enabled
# create the directory "/data/" in HDFS and make it an encryption zone with key name "testKey"
# configure KMS so that only user "testUser" is allowed to decrypt the DEK of key "testKey":
{code}
  <property>
    <name>key.acl.testKey.DECRYPT_EEK</name>
    <value>testUser</value>
  </property>
{code}
# execute the "teragen" job as user "testUser":
{code}
su -s /bin/bash testUser -c "hadoop jar hadoop-mapreduce-examples*.jar teragen 10000 /data/terasort-input" 
{code}
# execute the "terasort" job as user "testUser":
{code}
su -s /bin/bash testUser -c "hadoop jar hadoop-mapreduce-examples*.jar terasort /data/terasort-input /data/terasort-output"
{code}

You will see logs like the following at the job submitter's console:
{code}
INFO mapreduce.Job: Job job_1416860917658_0002 failed with state FAILED due to: Application application_1416860917658_0002 failed 2 times due to AM Container for appattempt_1416860917658_0002_000002 exited with  exitCode: -1000 due to: org.apache.hadoop.security.authorize.AuthorizationException: User [yarn] is not authorized to perform [DECRYPT_EEK] on key with ACL name [testKey]!!
{code}

The initial idea to solve this issue is to modify the logic of ClientDistributedCacheManager.isPublic() to also consider whether the file is in an encryption zone. If it is, the file should be treated as private.
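One hedged sketch of that change follows. It assumes the file lives on HDFS and uses HdfsAdmin.getEncryptionZoneForPath(), which returns null when the path is not inside an encryption zone; this is an illustration of the idea, not the committed fix:

```java
// Sketch only: an encryption-zone-aware variant of isPublic().
static boolean isPublic(Configuration conf, URI uri,
    Map<URI, FileStatus> statCache) throws IOException {
  FileSystem fs = FileSystem.get(uri, conf);
  Path current = new Path(uri.getPath());
  // A file inside an HDFS encryption zone must be treated as private,
  // because the "yarn" user may not be authorized to decrypt its DEK.
  if (fs instanceof DistributedFileSystem) {
    HdfsAdmin admin = new HdfsAdmin(fs.getUri(), conf);
    if (admin.getEncryptionZoneForPath(current) != null) {
      return false;
    }
  }
  // the leaf level file should be readable by others
  if (!checkPermissionOfOther(fs, current, FsAction.READ, statCache)) {
    return false;
  }
  return ancestorsHaveExecutePermissions(fs, current.getParent(), statCache);
}
```

With this change, files under an encryption zone would always be localized by the submitting user, so the download succeeds as long as that user can decrypt the DEK.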



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)