Posted to common-issues@hadoop.apache.org by "Yuqi Wang (JIRA)" <ji...@apache.org> on 2018/07/06 03:46:00 UTC

[jira] [Comment Edited] (HADOOP-15528) Deprecate ContainerLaunch#link by using FileUtil#SymLink

    [ https://issues.apache.org/jira/browse/HADOOP-15528?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16534409#comment-16534409 ] 

Yuqi Wang edited comment on HADOOP-15528 at 7/6/18 3:45 AM:
------------------------------------------------------------

[~giovanni.fumarola], [~elgoiri], 

I have some concerns (please correct me if I am wrong); please take a look:
 # *Possible security and resource isolation leak:*
 With the *old behavior*, the symlink operation is *executed in the batch script*, which runs as a child process in a restricted-privilege, resource-isolated environment, such as a Windows job object (with Windows secure containers) or Linux cgroups.
 With the *new behavior*, however, the symlink operation is *executed by the NM itself*: it runs as a child process of the NM and shares the NM's execution environment.
 So I am worried that this may open a leak in security, resource isolation, etc.
 # *The exit procedure is not straightforward, and the exit info is too sparse to debug.*
 For the PATCH implementation:
 It executes the symlink operation before the container starts. If the operation fails, it just records an "exit XXX" in the batch script instead of throwing the failure to its caller. So even though the symlink is executed before the container starts, the failure is not propagated until the container starts.
 So, if I try to debug a container failure, I will see a sudden "exit XXX" in the batch script with no other information about why the NM added that line.
 I hope we can execute the operation and propagate its exit status in the same execution environment, instead of splitting them across different ones. The old behavior does everything in the batch script, but the new behavior splits the work between the NM and the batch script.
 # *Better to have retry:*
 For the PATCH implementation:
 A symlink error from the container launch caller should be a transient error, so you will also need to add the corresponding symlink failure exit code to shouldCountTowardsMaxAttemptRetry, so that the RM always retries the AM container in the face of a symlink error.
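
As a rough illustration of the retry point above, a minimal sketch follows. It is not the RM's actual code: the class name, the SYMLINK_FAILED value, and the other exit-status constants are hypothetical stand-ins, and only the method name shouldCountTowardsMaxAttemptRetry comes from this discussion. The idea is simply that a symlink failure exit code is treated like other failures that are not the application's fault, so it does not use up an AM attempt.

```java
import java.util.Arrays;
import java.util.HashSet;
import java.util.Set;

/** Sketch only: shows the shape of the check, not the real RM implementation. */
final class SymlinkRetrySketch {
    // Hypothetical exit code for a symlink failure (assumption; the patch's value is not stated here).
    static final int SYMLINK_FAILED = -2001;
    // Illustrative stand-ins for exit statuses that are already not counted as attempts.
    static final int ABORTED = -100;
    static final int DISKS_FAILED = -101;
    static final int PREEMPTED = -102;

    private static final Set<Integer> NOT_COUNTED =
        new HashSet<>(Arrays.asList(ABORTED, DISKS_FAILED, PREEMPTED, SYMLINK_FAILED));

    /** Returns false for transient failures, so the attempt does not count against the maximum. */
    static boolean shouldCountTowardsMaxAttemptRetry(int exitStatus) {
        return !NOT_COUNTED.contains(exitStatus);
    }

    public static void main(String[] args) {
        // A symlink failure is treated as transient and is not counted.
        System.out.println(shouldCountTowardsMaxAttemptRetry(SYMLINK_FAILED));
        // An ordinary non-zero exit code still counts.
        System.out.println(shouldCountTowardsMaxAttemptRetry(1));
    }
}
```

With a check of this shape, the attempt that hit the symlink error would not count against the maximum number of AM attempts.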

Overall, at least for the PATCH implementation, I do not see any benefit.
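
To make the second concern more concrete, here is a minimal sketch of propagating a symlink failure to the caller with context, rather than recording a bare "exit XXX" in the launch script. It is an assumption-laden illustration: it uses java.nio.file.Files.createSymbolicLink as a stand-in for the symlink call in the patch, and the class and method names (SymlinkFailureSketch, linkOrThrow) are hypothetical.

```java
import java.io.IOException;
import java.nio.file.Files;
import java.nio.file.Path;

/** Sketch only: fail fast at the point where the symlink is attempted. */
final class SymlinkFailureSketch {

    /** Creates link -> target, or throws an IOException that says which link failed and why. */
    static void linkOrThrow(Path target, Path link) throws IOException {
        try {
            Files.createSymbolicLink(link, target);
        } catch (IOException | UnsupportedOperationException | SecurityException e) {
            // Keep the cause so the caller's log shows the real reason, not just an exit code.
            throw new IOException("Failed to create symlink " + link + " -> " + target, e);
        }
    }

    public static void main(String[] args) throws IOException {
        Path dir = Files.createTempDirectory("symlink-sketch");
        Path existing = Files.createFile(dir.resolve("existing"));
        try {
            // The link path already exists, so this call fails and the failure surfaces immediately.
            linkOrThrow(dir.resolve("target"), existing);
        } catch (IOException e) {
            System.out.println(e.getMessage() + " (cause: " + e.getCause() + ")");
        }
    }
}
```

The caller then sees the failure, with the link path and the underlying cause, at the moment the symlink is attempted instead of only after the container starts.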



> Deprecate ContainerLaunch#link by using FileUtil#SymLink
> --------------------------------------------------------
>
>                 Key: HADOOP-15528
>                 URL: https://issues.apache.org/jira/browse/HADOOP-15528
>             Project: Hadoop Common
>          Issue Type: Sub-task
>            Reporter: Giovanni Matteo Fumarola
>            Assignee: Giovanni Matteo Fumarola
>            Priority: Major
>         Attachments: HADOOP-15528-HADOOP-15461.v1.patch, HADOOP-15528-HADOOP-15461.v2.patch, HADOOP-15528-HADOOP-15461.v3.patch
>
>
> {{ContainerLaunch}} currently uses its own utility to create links (including winutils).
> This should be deprecated and rely on {{FileUtil#SymLink}} which is already multi-platform and pure Java.


