Posted to yarn-issues@hadoop.apache.org by "shanyu zhao (Jira)" <ji...@apache.org> on 2019/09/18 03:24:00 UTC

[jira] [Comment Edited] (YARN-9834) Allow using a pool of local users to run Yarn Secure Container in secure mode

    [ https://issues.apache.org/jira/browse/YARN-9834?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16932009#comment-16932009 ] 

shanyu zhao edited comment on YARN-9834 at 9/18/19 3:23 AM:
------------------------------------------------------------

{quote}Given the reasoning of the node manager running in a Docker container, the node manager isn't really authenticating with Kerberos for the host credential. The proposal drops the basic security of trusted hosts, which means a replay attack is possible. Wouldn't it be easier to run the cluster with simple security instead of breaking a secure cluster to work like a simple-security cluster?{quote}
The node manager actually authenticates with the Resource Manager via Kerberos; we configured a keytab file for the node manager to use. None of the local pool users have permission to access this keytab file, so it is still a secure Hadoop cluster. The only reason for Winbind/SSSD to sync domain users to local users is so that LinuxContainerExecutor can start the Yarn container process under the synced "domain user" name, without any implicit permission associated with that domain user. What we do here is skip the domain user sync and dynamically allocate local users to Yarn containers to achieve file and process isolation.

{quote}What happens if the node manager is restarted? Will this cause Joe's delegation token to leak?{quote}
You raised a good point here. A node manager process restart re-initializes ResourceLocalizationService, which renames the local directories for the node manager (including all existing application folders as sub-folders) and then schedules deletion tasks. Joe's delegation token is among the files to be deleted. However, the deletion is asynchronous, so in theory there is a short window during which the scheduled FileDeletionTask has not been executed yet. A simple fix is, in addition to the rename, to change the permissions on these folders to 700:
{code}
    // Existing renames in ResourceLocalizationService when the node manager
    // (re)initializes; the renamed directories are then scheduled for deletion.
    renameLocalDir(lfs, localDir, ContainerLocalizer.USERCACHE,
      currentTimeStamp);
    renameLocalDir(lfs, localDir, ContainerLocalizer.FILECACHE,
      currentTimeStamp);
    renameLocalDir(lfs, localDir, ResourceLocalizationService.NM_PRIVATE_DIR,
      currentTimeStamp);
{code}
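A minimal sketch of that additional permission change (the helper name and its call site are hypothetical; it would be invoked right after each rename above with the renamed path, using org.apache.hadoop.fs.FileContext, Path and FsPermission):
{code:java}
  // Hypothetical helper, not existing node manager code: tighten the renamed
  // directory to 700 so pool users cannot read leftover files (for example a
  // previous user's delegation token) before the async FileDeletionTask runs.
  private void restrictRenamedDir(FileContext lfs, Path renamedDir)
      throws IOException {
    lfs.setPermission(renamedDir, new FsPermission((short) 0700));
  }
{code}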


> Allow using a pool of local users to run Yarn Secure Container in secure mode
> -----------------------------------------------------------------------------
>
>                 Key: YARN-9834
>                 URL: https://issues.apache.org/jira/browse/YARN-9834
>             Project: Hadoop YARN
>          Issue Type: Bug
>          Components: nodemanager
>    Affects Versions: 3.1.2
>            Reporter: shanyu zhao
>            Assignee: shanyu zhao
>            Priority: Major
>
> Yarn Secure Container in secure mode allows separation of different users' local files and container processes running on the same node manager. This depends on an out-of-band service such as SSSD/Winbind to sync all domain users to the local machine.
> Winbind user sync has a lot of overhead, especially for large corporations. Also, if Yarn runs inside a Kubernetes cluster (meaning node managers run inside Docker containers), it doesn't make sense for each container to domain-join with Active Directory and sync a full copy of the domain users.
> We should add a new configuration to Yarn so that we can pre-create a pool of local users on each machine/Docker container. At runtime, Yarn allocates a local user to the domain user that submits the application. When all containers of that user have finished and all files belonging to that user have been deleted, we release the allocation and allow another domain user to use the same local user to run their Yarn containers.
> h2. Design
> We propose to add these new configurations:
> {code:java}
> yarn.nodemanager.linux-container-executor.secure-mode.use-local-user, defaults to false
> yarn.nodemanager.linux-container-executor.secure-mode.local-user-prefix, defaults to "user"{code}
> By default this feature is turned off. If we enable it, with local-user-prefix set to "user", then we expect pre-created local users user0 - usern, where the total number of local users equals:
> {code:java}
> yarn.nodemanager.resource.cpu-vcores {code}
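> For illustration, enabling this in yarn-site.xml might look like the following (these property names are the ones proposed above and do not exist in Yarn today):
> {code:xml}
> <property>
>   <name>yarn.nodemanager.linux-container-executor.secure-mode.use-local-user</name>
>   <value>true</value>
> </property>
> <property>
>   <name>yarn.nodemanager.linux-container-executor.secure-mode.local-user-prefix</name>
>   <value>user</value>
> </property>
> {code}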
> We can use an in-memory allocator to keep the domain-user-to-local-user mapping (a sketch follows below). 
> When do we add the mapping and when do we remove it?
> In the node manager, ApplicationImpl implements the state machine for a Yarn application's life cycle, and it exists only if the application has at least one container running on that node manager. We can hook in the code that adds the mapping during application initialization.
> For removing the mapping, we need to wait for 3 things:
> 1) All applications of the same user are completed;
>  2) All log handling of those applications (log aggregation or non-aggregated handling) is done;
>  3) All pending FileDeletionTasks that use the user's identity are finished.
> Note that all operations on these reference counts must be synchronized.
> If all of the local users in the pool are allocated, we return "nonexistuser" as the run-as user; this causes the container to fail to launch, and Yarn will relaunch it on other nodes.
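> A minimal sketch of such an in-memory allocator (the class, method, and field names are hypothetical; only the behavior described above is implied):
> {code:java}
> import java.util.HashMap;
> import java.util.Map;
> 
> // Hypothetical allocator, not existing Yarn code. The pool size would come
> // from yarn.nodemanager.resource.cpu-vcores and the prefix from the
> // proposed local-user-prefix setting.
> public class LocalUserPoolAllocator {
>   private static final String NO_USER = "nonexistuser";
>   private final String prefix;   // e.g. "user" -> user0, user1, ...
>   private final String[] pool;   // pool[i] = domain user holding "user<i>", or null
>   private final Map<String, Integer> refCounts = new HashMap<>();
> 
>   public LocalUserPoolAllocator(String prefix, int poolSize) {
>     this.prefix = prefix;
>     this.pool = new String[poolSize];
>   }
> 
>   /** Map a domain user to a local pool user and add one reference for the application. */
>   public synchronized String allocate(String domainUser) {
>     int free = -1;
>     for (int i = 0; i < pool.length; i++) {
>       if (domainUser.equals(pool[i])) {              // already mapped
>         refCounts.merge(domainUser, 1, Integer::sum);
>         return prefix + i;
>       }
>       if (pool[i] == null && free < 0) {
>         free = i;
>       }
>     }
>     if (free < 0) {
>       return NO_USER;  // pool exhausted: container fails, Yarn retries elsewhere
>     }
>     pool[free] = domainUser;
>     refCounts.put(domainUser, 1);
>     return prefix + free;
>   }
> 
>   /**
>    * Release one reference; when the count reaches zero the local user returns
>    * to the pool. In the full design, references would also cover log handling
>    * and pending FileDeletionTasks.
>    */
>   public synchronized void release(String domainUser) {
>     Integer count = refCounts.get(domainUser);
>     if (count == null) {
>       return;
>     }
>     if (count <= 1) {
>       refCounts.remove(domainUser);
>       for (int i = 0; i < pool.length; i++) {
>         if (domainUser.equals(pool[i])) {
>           pool[i] = null;  // local user can be reused by another domain user
>         }
>       }
>     } else {
>       refCounts.put(domainUser, count - 1);
>     }
>   }
> }
> {code}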
> h2. Limitations
> 1) This feature does not support the PRIVATE resource visibility type. Because PRIVATE resources are potentially cached in the node manager for a very long time, supporting them would be a security problem: a user might be able to peek into a previous user's PRIVATE resources. We can modify the code to treat all PRIVATE resources as APPLICATION resources (a small sketch follows this list).
> 2) It is recommended to enable DominantResourceCalculator so that no more than "cpu-vcores" containers run concurrently on a node manager:
> {code:java}
> yarn.scheduler.capacity.resource-calculator
> = org.apache.hadoop.yarn.util.resource.DominantResourceCalculator {code}
> 3) Currently this feature does not work with Yarn Node Manager recovery, because the mappings are kept in memory and cannot be recovered after a node manager restart.
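> For limitation 1), a rough sketch of the PRIVATE-to-APPLICATION downgrade (where exactly this check would live in the localization path is left open; "request" is a placeholder):
> {code:java}
> // Sketch only: treat PRIVATE resources as APPLICATION-scoped so they are
> // cleaned up with the application instead of being cached per local user.
> LocalResourceVisibility vis = request.getVisibility();
> if (vis == LocalResourceVisibility.PRIVATE) {
>   vis = LocalResourceVisibility.APPLICATION;
> }
> {code}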
>  


