Posted to commits@spark.apache.org by va...@apache.org on 2019/02/20 19:45:51 UTC

[spark] branch master updated: [SPARK-26877][YARN] Support user-level app staging directory in yarn mode when spark.yarn…

This is an automated email from the ASF dual-hosted git repository.

vanzin pushed a commit to branch master
in repository https://gitbox.apache.org/repos/asf/spark.git


The following commit(s) were added to refs/heads/master by this push:
     new eb6fd7e  [SPARK-26877][YARN] Support user-level app staging directory in yarn mode when spark.yarn…
eb6fd7e is described below

commit eb6fd7eab77d3d5b2e7e827a21b127b146e5c089
Author: Liupengcheng <li...@xiaomi.com>
AuthorDate: Wed Feb 20 11:45:12 2019 -0800

    [SPARK-26877][YARN] Support user-level app staging directory in yarn mode when spark.yarn…
    
    Currently, when running applications in yarn mode, the app staging directory is controlled by the `spark.yarn.stagingDir` config if specified. This directory is not separated by user, which makes file and quota management inconvenient.
    
    Sometimes the number of staging files grows unexpectedly. Two possible reasons are:
    1. The `spark.yarn.preserve.staging.files` config can be misused by users.
    2. A cron task keeps starting new applications on a non-existent yarn queue (wrong configuration).
    
    At present, it is not easy to find out which user holds the most HDFS files or space.
    Moreover, even if we wanted to set an HDFS name quota or space quota for each user to limit the growth, it would be impossible.
    
    So I propose adding per-user subdirectories under this app staging directory, which makes ownership clearer.
    
    Tested with existing unit tests.
    
    Closes #23786 from liupc/Support-user-level-app-staging-dir.
    
    Authored-by: Liupengcheng <li...@xiaomi.com>
    Signed-off-by: Marcelo Vanzin <va...@cloudera.com>
---
 .../yarn/src/main/scala/org/apache/spark/deploy/yarn/Client.scala      | 3 ++-
 1 file changed, 2 insertions(+), 1 deletion(-)

diff --git a/resource-managers/yarn/src/main/scala/org/apache/spark/deploy/yarn/Client.scala b/resource-managers/yarn/src/main/scala/org/apache/spark/deploy/yarn/Client.scala
index 6ca81fb..e0dba8c 100644
--- a/resource-managers/yarn/src/main/scala/org/apache/spark/deploy/yarn/Client.scala
+++ b/resource-managers/yarn/src/main/scala/org/apache/spark/deploy/yarn/Client.scala
@@ -177,7 +177,8 @@ private[spark] class Client(
 
       // The app staging dir based on the STAGING_DIR configuration if configured
       // otherwise based on the users home directory.
-      val appStagingBaseDir = sparkConf.get(STAGING_DIR).map { new Path(_) }
+      val appStagingBaseDir = sparkConf.get(STAGING_DIR)
+        .map { new Path(_, UserGroupInformation.getCurrentUser.getShortUserName) }
         .getOrElse(FileSystem.get(hadoopConf).getHomeDirectory())
       stagingDirPath = new Path(appStagingBaseDir, getAppStagingDir(appId))
 

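The effect of the change on the resolved staging path can be sketched as follows. This is a minimal illustration, not the code in Client.scala: it models the path composition with plain strings instead of org.apache.hadoop.fs.Path, and the object name StagingDirSketch, the helper resolveStagingDir, the user name, and the example directory values are all hypothetical.

```scala
object StagingDirSketch {
  // Hypothetical helper: models only how the path components are combined.
  // Plain strings stand in for org.apache.hadoop.fs.Path.
  def resolveStagingDir(
      configuredStagingDir: Option[String], // stands in for sparkConf.get(STAGING_DIR)
      userName: String,                     // stands in for getCurrentUser.getShortUserName
      homeDir: String,                      // stands in for FileSystem.get(hadoopConf).getHomeDirectory()
      appStagingDir: String                 // stands in for getAppStagingDir(appId)
  ): String = {
    // After the change, a configured staging dir gets a per-user subdirectory.
    // Without the config, the user's home directory is used as before.
    val base = configuredStagingDir
      .map(dir => s"${dir.stripSuffix("/")}/$userName")
      .getOrElse(homeDir)
    s"${base.stripSuffix("/")}/$appStagingDir"
  }

  def main(args: Array[String]): Unit = {
    val appDir = ".sparkStaging/application_0001" // illustrative value only

    // Config set: prints /tmp/staging/alice/.sparkStaging/application_0001
    println(resolveStagingDir(Some("/tmp/staging"), "alice", "/user/alice", appDir))

    // Config unset: prints /user/alice/.sparkStaging/application_0001
    println(resolveStagingDir(None, "alice", "/user/alice", appDir))
  }
}
```

Only the configured case gains the user name component. The home-directory fallback is unchanged, presumably because a home directory is already specific to one user.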
