Posted to reviews@spark.apache.org by jongyoul <gi...@git.apache.org> on 2014/08/26 07:39:20 UTC

[GitHub] spark pull request: SPARK-3223 runAsSparkUser cannot change HDFS w...

GitHub user jongyoul opened a pull request:

    https://github.com/apache/spark/pull/2126

    SPARK-3223 runAsSparkUser cannot change HDFS write permission properly in mesos cluster mode

    

You can merge this pull request into a Git repository by running:

    $ git pull https://github.com/jongyoul/spark branch-1.0

Alternatively you can review and apply these changes as the patch at:

    https://github.com/apache/spark/pull/2126.patch

To close this pull request, make a commit to your master/trunk branch
with (at least) the following in the commit message:

    This closes #2126
    
----
commit 5b7755948451f7312e68905fd3105dc847363d1c
Author: Jongyoul Lee <jo...@gmail.com>
Date:   2014-08-26T05:36:52Z

    SPARK-3223 runAsSparkUser cannot change HDFS write permission properly in mesos cluster mode

----


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org

[GitHub] spark pull request: SPARK-3223 runAsSparkUser cannot change HDFS w...

Posted by timothysc <gi...@git.apache.org>.

Github user timothysc commented on the pull request:

    https://github.com/apache/spark/pull/2126#issuecomment-58067088
  
    @jongyoul Stack integration isn't covered by the unit tests. As long as folks have test details outlined and some form of reproduction information, I think that's as good as it's going to get for now.



[GitHub] spark pull request: SPARK-3223 runAsSparkUser cannot change HDFS w...

Posted by jongyoul <gi...@git.apache.org>.

Github user jongyoul commented on the pull request:

    https://github.com/apache/spark/pull/2126#issuecomment-58049735
  
    @timothysc It's a somewhat complicated situation, so I basically used a logger to print the Hadoop user in detail. I inserted logging statements at every point I wanted to test, looked at the results, analyzed the code, and ran it again. I'm not experienced at writing test code for distributed setups, and I think this was an efficient way to test my patch. If you have an idea for how to test my code, please let me know. I'm willing to write the code and try it.



[GitHub] spark pull request: SPARK-3223 runAsSparkUser cannot change HDFS w...

Posted by tnachen <gi...@git.apache.org>.

Github user tnachen commented on the pull request:

    https://github.com/apache/spark/pull/2126#issuecomment-60352253
  
    @mateiz the tests that failed seem to have nothing to do with this change?



[GitHub] spark pull request: SPARK-3223 runAsSparkUser cannot change HDFS w...

Posted by jongyoul <gi...@git.apache.org>.

Github user jongyoul commented on the pull request:

    https://github.com/apache/spark/pull/2126#issuecomment-55889694
  
    @tgravescs Nothing serious. I'll deal with this issue if it isn't fixed by September.
    
    On Wednesday, September 17, 2014, Tom Graves <no...@github.com>
    wrote:
    
    > @jongyoul <https://github.com/jongyoul> sorry I don't know what you mean
    > by on Aug?
    >
    > —
    > Reply to this email directly or view it on GitHub
    > <https://github.com/apache/spark/pull/2126#issuecomment-55887296>.
    >
    
    
    -- 
    이종열, Jongyoul Lee, 李宗烈
    http://madeng.net



[GitHub] spark pull request: SPARK-3223 runAsSparkUser cannot change HDFS w...

Posted by jongyoul <gi...@git.apache.org>.

Github user jongyoul commented on the pull request:

    https://github.com/apache/spark/pull/2126#issuecomment-53614721
  
    oh my misunderstanding... You're right.



[GitHub] spark pull request: SPARK-3223 runAsSparkUser cannot change HDFS w...

Posted by tgravescs <gi...@git.apache.org>.

Github user tgravescs commented on the pull request:

    https://github.com/apache/spark/pull/2126#issuecomment-53456220
  
    Ah, so I think there is some confusion about the logging you are looking at. The "hadoop login commit" messages come from our call to getCurrentUser() in the transferCredentials routine (transferCredentials(UserGroupInformation.getCurrentUser(), ugi)), which isn't inside the doAs yet, so those can basically be ignored.
    




[GitHub] spark pull request: SPARK-3223 runAsSparkUser cannot change HDFS w...

Posted by jongyoul <gi...@git.apache.org>.

Github user jongyoul commented on the pull request:

    https://github.com/apache/spark/pull/2126#issuecomment-53845393
  
    I've found something strange in the log below:
    
    14/08/29 16:21:04 DEBUG SparkHadoopUtil: running as user: 1001079
    14/08/29 16:21:04 DEBUG UserGroupInformation: hadoop login
    14/08/29 16:21:04 DEBUG UserGroupInformation: hadoop login commit
    14/08/29 16:21:04 DEBUG UserGroupInformation: using local user:UnixPrincipal: hdfs
    14/08/29 16:21:04 DEBUG UserGroupInformation: UGI loginUser:hdfs (auth:SIMPLE)
    14/08/29 16:21:04 DEBUG UserGroupInformation: PrivilegedAction as:1001079 (auth:SIMPLE) from:org.apache.spark.deploy.SparkHadoopUtil.runAsSparkUser(SparkHadoopUtil.scala:56)
    14/08/29 16:21:04 INFO SparkHadoopUtil: User: 1001079 (auth:SIMPLE)
    14/08/29 16:21:04 INFO MesosExecutorBackend: MesosExecutorBackend 1 => User: 1001079 (auth:SIMPLE)
    14/08/29 16:21:04 INFO MesosExecutorBackend: MesosExecutorBackend 2 => User: 1001079 (auth:SIMPLE)
    14/08/29 16:21:04 INFO MesosExecutorBackend: MesosExecutorBackend 3 => User: 1001079 (auth:SIMPLE)
    WARNING: Logging before InitGoogleLogging() is written to STDERR
    I0829 16:21:04.362865 18515 exec.cpp:131] Version: 0.19.1
    I0829 16:21:04.366479 18546 exec.cpp:205] Executor registered on slave 20140829-160303-3374320138-60030-3252-41
    14/08/29 16:21:04 INFO MesosExecutorBackend: Registered with Mesos as executor ID 20140829-160303-3374320138-60030-3252-41
    14/08/29 16:21:04 INFO Executor: Executor => User: hdfs (auth:SIMPLE)
    
    Mesos' id is "hdfs", and my id is "1001079". As you said, I think it's a credential problem.



[GitHub] spark pull request: SPARK-3223 runAsSparkUser cannot change HDFS w...

Posted by jongyoul <gi...@git.apache.org>.

Github user jongyoul commented on the pull request:

    https://github.com/apache/spark/pull/2126#issuecomment-53447464
  
    Here are the debug logs for the two different versions.
    
    HADOOP_USER_NAME is not set:
    14/08/27 01:11:01 DEBUG UserGroupInformation: hadoop login
    14/08/27 01:11:01 DEBUG UserGroupInformation: hadoop login commit
    14/08/27 01:11:01 DEBUG UserGroupInformation: using local user:UnixPrincipal: hdfs
    14/08/27 01:11:01 DEBUG UserGroupInformation: UGI loginUser:hdfs (auth:SIMPLE)
    
    HADOOP_USER_NAME is set:
    14/08/26 20:18:18 DEBUG SparkHadoopUtil: running as user: 1001079
    14/08/26 20:18:18 DEBUG SparkHadoopUtil: running hadoop client as user: 1001079
    14/08/26 20:18:18 DEBUG UserGroupInformation: hadoop login
    14/08/26 20:18:18 DEBUG UserGroupInformation: hadoop login commit
    14/08/26 20:18:18 DEBUG UserGroupInformation: UGI loginUser:1001079 (auth:SIMPLE)
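
The two logs are consistent with a simple precedence under SIMPLE auth: the login user is taken from HADOOP_USER_NAME when it is set, and otherwise from the OS account that owns the process. A minimal sketch of that rule follows; `LoginUserSketch` and `resolveLoginUser` are hypothetical names for illustration, and this is a simplified model, not Hadoop's own code.

```scala
// Simplified model of how the login user is picked under SIMPLE auth.
// Illustration only; this is not Hadoop's actual implementation.
object LoginUserSketch {
  // `env` stands in for the process environment, `osUser` for the Unix
  // account that owns the JVM process (for example the account Mesos runs as).
  def resolveLoginUser(env: Map[String, String], osUser: String): String =
    env.get("HADOOP_USER_NAME")
      .map(_.trim)
      .filter(_.nonEmpty)   // an empty value is treated as "not set"
      .getOrElse(osUser)    // fall back to the OS account

  def main(args: Array[String]): Unit = {
    // Mirrors the two cases in the logs: variable unset, then variable set.
    println(resolveLoginUser(Map.empty, "hdfs"))
    println(resolveLoginUser(Map("HADOOP_USER_NAME" -> "1001079"), "hdfs"))
  }
}
```

With the variable unset the sketch falls back to "hdfs", matching the first log; with it set, the result is "1001079", matching the second.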



[GitHub] spark pull request: SPARK-3223 runAsSparkUser cannot change HDFS w...

Posted by mateiz <gi...@git.apache.org>.

Github user mateiz commented on a diff in the pull request:

    https://github.com/apache/spark/pull/2126#discussion_r19316607
  
    --- Diff: core/src/main/scala/org/apache/spark/scheduler/cluster/mesos/MesosSchedulerBackend.scala ---
    @@ -19,19 +19,16 @@ package org.apache.spark.scheduler.cluster.mesos
     
     import java.io.File
     import java.util.{ArrayList => JArrayList, List => JList}
    -import java.util.Collections
     
    -import scala.collection.JavaConversions._
    -import scala.collection.mutable.{ArrayBuffer, HashMap, HashSet}
    -
    -import org.apache.mesos.protobuf.ByteString
    -import org.apache.mesos.{Scheduler => MScheduler}
    -import org.apache.mesos._
     import org.apache.mesos.Protos.{TaskInfo => MesosTaskInfo, TaskState => MesosTaskState, _}
    -
    -import org.apache.spark.{Logging, SparkContext, SparkException, TaskState}
    +import org.apache.mesos.protobuf.ByteString
    +import org.apache.mesos.{Scheduler => MScheduler, _}
     import org.apache.spark.scheduler.{ExecutorExited, ExecutorLossReason, SchedulerBackend, SlaveLost, TaskDescription, TaskSchedulerImpl, WorkerOffer}
     import org.apache.spark.util.Utils
    +import org.apache.spark.{Logging, SparkContext, SparkException, TaskState}
    +
    +import scala.collection.JavaConversions._
    +import scala.collection.mutable.{ArrayBuffer, HashMap, HashSet}
    --- End diff --
    
    Don't reorganize the imports like this; it actually goes against our preferred order (which should be java, scala, third-party, and then Spark).



[GitHub] spark pull request: SPARK-3223 runAsSparkUser cannot change HDFS w...

Posted by timothysc <gi...@git.apache.org>.

Github user timothysc commented on the pull request:

    https://github.com/apache/spark/pull/2126#issuecomment-55804855
  
    You'll definitely want to take the user of the submitter, with a possible config override.
    
    I'm not aware of any side effects, provided a user's permissions have been properly configured.
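
One way to read the suggestion above, as a rough sketch: use the submitting user unless a configuration entry overrides it. The key `example.framework.user` and the names `FrameworkUserSketch` and `frameworkUser` are made up for this illustration; they are not existing Spark settings or APIs.

```scala
// Sketch of "submitter's user, with a possible config override".
// The config key below is hypothetical and exists only in this example.
object FrameworkUserSketch {
  val OverrideKey = "example.framework.user"

  // `conf` stands in for the application's configuration,
  // `submitter` for the user who submitted the job.
  def frameworkUser(conf: Map[String, String], submitter: String): String =
    conf.get(OverrideKey)
      .filter(_.nonEmpty)      // ignore an empty override
      .getOrElse(submitter)    // default: the submitting user

  def main(args: Array[String]): Unit = {
    println(frameworkUser(Map.empty, "joe"))
    println(frameworkUser(Map(OverrideKey -> "etl"), "joe"))
  }
}
```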



[GitHub] spark pull request: SPARK-3223 runAsSparkUser cannot change HDFS w...

Posted by jongyoul <gi...@git.apache.org>.

Github user jongyoul commented on the pull request:

    https://github.com/apache/spark/pull/2126#issuecomment-53552876
  
    @tgravescs ,
    
    I've found the reason. :-)
    
    The YARN node manager (exactly?) changes LOGNAME and USER to the id of the user who runs the application while the application is running, but Mesos doesn't change them yet. I'm printing logs at the start of the security manager. Below is the output of System.getProperties and System.getenv. You can find USER and LOGNAME there; my id is again "1001079" and YARN's id is "hdfs".
    
    SLF4J: Class path contains multiple SLF4J bindings.
    SLF4J: Found binding in [jar:file:/tmp/mapred/hdfs/nm-local-dir/usercache/1001079/filecache/13/spark-assembly-1.0.3-SNAPSHOT-hadoop2.3.0-cdh5.0.1.jar!/org/slf4j/impl/StaticLoggerBinder.class]
    SLF4J: Found binding in [jar:file:/tmp/mapred/hdfs/nm-local-dir/usercache/1001079/filecache/14/spark-examples-1.0.3-SNAPSHOT-hadoop2.3.0-cdh5.0.1.jar!/org/slf4j/impl/StaticLoggerBinder.class]
    SLF4J: See http://www.slf4j.org/codes.html#multiple_bindings for an explanation.
    SLF4J: Actual binding is of type [org.slf4j.impl.Log4jLoggerFactory]
    14/08/27 19:26:33 INFO YarnSparkHadoopUtil: java.runtime.name -> Java(TM) SE Runtime Environment
    ...
    user.home -> /app/home/hdfs
    ...
    user.name -> hdfs
    ...
    SPARK_USER -> 1001079
    ...
    HADOOP_CONF_DIR -> /app/hadoop-2.3.0-cdh5.0.1/conf
    HADOOP_DATANODE_OPTS -> ...
    SPARK_YARN_STAGING_DIR -> ...
    HADOOP_NAMENODE_OPTS -> ...
    SPARK_YARN_CACHE_FILES_TIME_STAMPS -> 1409135190565,1409135191296
    NM_PORT -> 38930
    LOGNAME -> 1001079
    YARN_CONF_DIR -> /app/hadoop-2.3.0-cdh5.0.1/conf
    ...
    SHELL -> /bin/bash
    SPARK_YARN_CACHE_FILES -> ...
    CLASSPATH -> ...
    USER -> 1001079
    HADOOP_HDFS_HOME -> /app/hadoop-2.3.0-cdh5.0.1
    CONTAINER_ID -> container_1409105908837_0012_01_000002
    HOME -> /home/
    ...
    YARN_NICENESS -> 0
    YARN_IDENT_STRING -> hdfs
    HADOOP_MAPRED_HOME -> /app/hadoop-2.3.0-cdh5.0.1
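
A dump like the one above can be narrowed to the identity-related entries. The sketch below prints only those system properties and environment variables; the key selection and the names `IdentityDump` and `render` are assumptions for illustration, not part of the patch.

```scala
// Prints only the identity-related system properties and environment
// variables discussed in this thread. Names and key lists are illustrative.
object IdentityDump {
  val propKeys: Seq[String] = Seq("user.name", "user.home")
  val envKeys: Seq[String] = Seq("USER", "LOGNAME", "SPARK_USER", "HADOOP_USER_NAME")

  // Formats each key as "key -> value", using "<unset>" for missing entries.
  def render(props: Map[String, String], env: Map[String, String]): Seq[String] = {
    val propLines = propKeys.map(k => s"$k -> ${props.getOrElse(k, "<unset>")}")
    val envLines = envKeys.map(k => s"$k -> ${env.getOrElse(k, "<unset>")}")
    propLines ++ envLines
  }

  def main(args: Array[String]): Unit =
    render(sys.props.toMap, sys.env).foreach(println)
}
```

Running this inside both a YARN container and a Mesos executor would make the USER and LOGNAME difference described above directly comparable.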



[GitHub] spark pull request: SPARK-3223 runAsSparkUser cannot change HDFS w...

Posted by jongyoul <gi...@git.apache.org>.

Github user jongyoul closed the pull request at:

    https://github.com/apache/spark/pull/2126



[GitHub] spark pull request: SPARK-3223 runAsSparkUser cannot change HDFS w...

Posted by SparkQA <gi...@git.apache.org>.

Github user SparkQA commented on the pull request:

    https://github.com/apache/spark/pull/2126#issuecomment-60345204
  
      [QA tests have finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/22113/consoleFull) for   PR 2126 at commit [`ea7e4cd`](https://github.com/apache/spark/commit/ea7e4cdca4666f958acb68aae0c88cf1e32f9481).
     * This patch **fails some tests**.
     * This patch merges cleanly.
     * This patch adds no public classes.



[GitHub] spark pull request: SPARK-3223 runAsSparkUser cannot change HDFS w...

Posted by jongyoul <gi...@git.apache.org>.

Github user jongyoul commented on the pull request:

    https://github.com/apache/spark/pull/2126#issuecomment-53858688
  
    Do you have any advice about the relationship (or known issues) between UserGroupInformation and JNI? Mesos is a framework written in C++.



[GitHub] spark pull request: SPARK-3223 runAsSparkUser cannot change HDFS w...

Posted by AmplabJenkins <gi...@git.apache.org>.

Github user AmplabJenkins commented on the pull request:

    https://github.com/apache/spark/pull/2126#issuecomment-60345207
  
    Test FAILed.
    Refer to this link for build results (access rights to CI server needed): 
    https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/22113/
    Test FAILed.



[GitHub] spark pull request: SPARK-3223 runAsSparkUser cannot change HDFS w...

Posted by tgravescs <gi...@git.apache.org>.

Github user tgravescs commented on the pull request:

    https://github.com/apache/spark/pull/2126#issuecomment-53882879
  
    After which JNI call is it changing? Where are you printing the log statement from the Executor?



[GitHub] spark pull request: SPARK-3223 runAsSparkUser cannot change HDFS w...

Posted by andrewor14 <gi...@git.apache.org>.

Github user andrewor14 commented on the pull request:

    https://github.com/apache/spark/pull/2126#issuecomment-60427964
  
    This issue seems to appear only in PRs against older branches. If you look at the console the `run-tests-jenkins` script is not getting all expected environment variables somehow. I'll look into this shortly...



[GitHub] spark pull request: SPARK-3223 runAsSparkUser cannot change HDFS w...

Posted by tgravescs <gi...@git.apache.org>.

Github user tgravescs commented on the pull request:

    https://github.com/apache/spark/pull/2126#issuecomment-53434178
  
    I assume you are not using secure HDFS? Is this problem caused by something inside Mesos, for example nested doAs calls? I'm not familiar with Mesos deployments.
    
    You shouldn't need to set the HADOOP_USER_NAME variable, because the code already creates a remote user and then does a doAs with that user, which should set the user without the need for the env variable.
    
    What does the debug statement return for the user when you run into the problem?
     logDebug("running as user: " + user)
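
The pattern described above (create a remote user, then run the work inside doAs) can be sketched as follows. It assumes hadoop-common is on the classpath and SIMPLE auth; `RunAsSketch` and `runAs` are hypothetical names, and this is an illustration, not the code in SparkHadoopUtil.

```scala
import java.security.PrivilegedExceptionAction

import org.apache.hadoop.security.UserGroupInformation

// Sketch of "create a remote user, then doAs". Requires hadoop-common.
object RunAsSketch {
  // Runs `body` as `user` and returns its result.
  def runAs[T](user: String)(body: => T): T = {
    val ugi = UserGroupInformation.createRemoteUser(user)
    ugi.doAs(new PrivilegedExceptionAction[T] {
      override def run(): T = body
    })
  }

  def main(args: Array[String]): Unit = {
    // Outside doAs: the login user (HADOOP_USER_NAME or the OS account).
    println("outside: " + UserGroupInformation.getCurrentUser().getUserName())
    // Inside doAs: the remote user created above.
    println("inside:  " +
      runAs("joe")(UserGroupInformation.getCurrentUser().getUserName()))
  }
}
```

Anything that reads the current user outside the doAs block still sees the login user, so the position of a log statement decides which user it reports.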



[GitHub] spark pull request: SPARK-3223 runAsSparkUser cannot change HDFS w...

Posted by jongyoul <gi...@git.apache.org>.

Github user jongyoul commented on the pull request:

    https://github.com/apache/spark/pull/2126#issuecomment-53584096
  
    First of all, to clarify my cluster setup: the Hadoop cluster and Mesos run as "hdfs", and my account name is "1001079o". I also don't think LOGNAME affects which user is selected, but USER may be involved in choosing the user when UnixPrincipal is used. By "printing logs" I mean that I added logDebug() calls in SecurityManager.scala. That's it.
    
    I agree with your point that my patch is fragile, but I don't think it causes a problem, because runAsSparkUser is always the first function run in MesosExecutorBackend.main().
    
    Following your advice, I'm putting logDebug() calls in UserGroupInformation, Subject, and so on. I'll try to write more concrete code.



[GitHub] spark pull request: SPARK-3223 runAsSparkUser cannot change HDFS w...

Posted by jongyoul <gi...@git.apache.org>.

Github user jongyoul commented on the pull request:

    https://github.com/apache/spark/pull/2126#issuecomment-61232689
  
    @tnachen I've fixed this issue.



[GitHub] spark pull request: SPARK-3223 runAsSparkUser cannot change HDFS w...

Posted by mateiz <gi...@git.apache.org>.

Github user mateiz commented on the pull request:

    https://github.com/apache/spark/pull/2126#issuecomment-60327161
  
    This looks good to me based on my understanding of Mesos. @tnachen will this still work okay if Mesos is not running as root (and can't switch user)?



[GitHub] spark pull request: SPARK-3223 runAsSparkUser cannot change HDFS w...

Posted by jongyoul <gi...@git.apache.org>.

Github user jongyoul commented on the pull request:

    https://github.com/apache/spark/pull/2126#issuecomment-61927830
  
    This has also been closed because this patch was merged via PR #3034.



[GitHub] spark pull request: SPARK-3223 runAsSparkUser cannot change HDFS w...

Posted by jongyoul <gi...@git.apache.org>.

Github user jongyoul commented on the pull request:

    https://github.com/apache/spark/pull/2126#issuecomment-60563727
  
    @andrewor14 So you mean the clean way to get this patch into mainline is for me to redo it on top of the current master branch and open a new pull request, right?



[GitHub] spark pull request: SPARK-3223 runAsSparkUser cannot change HDFS w...

Posted by jongyoul <gi...@git.apache.org>.

Github user jongyoul commented on the pull request:

    https://github.com/apache/spark/pull/2126#issuecomment-57602773
  
    +1 @tnachen 



[GitHub] spark pull request: SPARK-3223 runAsSparkUser cannot change HDFS w...

Posted by jongyoul <gi...@git.apache.org>.

Github user jongyoul commented on the pull request:

    https://github.com/apache/spark/pull/2126#issuecomment-53458049
  
    Yes. I want to run the job as "1001079", as in the logs above, and also write the output to HDFS as "1001079".
    
    My settings are almost the same as what you describe. I haven't used YARN yet, so I can't tell why it works properly there.
    
    I think I understand what you're saying about this issue.



[GitHub] spark pull request: SPARK-3223 runAsSparkUser cannot change HDFS w...

Posted by tgravescs <gi...@git.apache.org>.

Github user tgravescs commented on the pull request:

    https://github.com/apache/spark/pull/2126#issuecomment-53452530
  
    What does the Spark log say the user is when it does the runAsUser: logDebug("running as user: " + user)? I assume the proper user, like 1001079?
    
    Right, what commit does after checking HADOOP_USER_NAME is look at the OS name, and this is where I thought the doAs would properly set it. Perhaps I'm mistaken.
    
    To clarify, is your setup like this?
    - on Mesos the executors run as a super user like 'mesos'
    - the HDFS cluster is running as user 'hdfs'
    - when it does the runAsUser it switches to try to use the actual user (SPARK_USER), for example 'joe'
    
    One other reason I ask is that it works fine on YARN. Running insecure HDFS and YARN as the user 'yarn' and then accessing HDFS as the actual user (joe) works fine; permissions are set properly. So I'm trying to figure out what the difference is with Mesos.


[GitHub] spark pull request: SPARK-3223 runAsSparkUser cannot change HDFS w...

Posted by timothysc <gi...@git.apache.org>.

Github user timothysc commented on the pull request:

    https://github.com/apache/spark/pull/2126#issuecomment-55794806
  
    +1 @tnachen, 
    
    val fwInfo = FrameworkInfo.newBuilder().setUser("").setName(sc.appName).build()
    
    is not set in either coarse-grained or fine-grained mode.




[GitHub] spark pull request: SPARK-3223 runAsSparkUser cannot change HDFS w...

Posted by AmplabJenkins <gi...@git.apache.org>.

Github user AmplabJenkins commented on the pull request:

    https://github.com/apache/spark/pull/2126#issuecomment-53377786
  
    Can one of the admins verify this patch?



[GitHub] spark pull request: SPARK-3223 runAsSparkUser cannot change HDFS w...

Posted by mateiz <gi...@git.apache.org>.

Github user mateiz commented on the pull request:

    https://github.com/apache/spark/pull/2126#issuecomment-60341385
  
    Jenkins, test this please



[GitHub] spark pull request: SPARK-3223 runAsSparkUser cannot change HDFS w...

Posted by SparkQA <gi...@git.apache.org>.

Github user SparkQA commented on the pull request:

    https://github.com/apache/spark/pull/2126#issuecomment-60362057
  
      [Test build #22131 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/22131/consoleFull) for   PR 2126 at commit [`ea7e4cd`](https://github.com/apache/spark/commit/ea7e4cdca4666f958acb68aae0c88cf1e32f9481).
     * This patch **fails some tests**.
     * This patch merges cleanly.
     * This patch adds no public classes.



[GitHub] spark pull request: SPARK-3223 runAsSparkUser cannot change HDFS w...

Posted by AmplabJenkins <gi...@git.apache.org>.

Github user AmplabJenkins commented on the pull request:

    https://github.com/apache/spark/pull/2126#issuecomment-60362063
  
    Test FAILed.
    Refer to this link for build results (access rights to CI server needed): 
    https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/22131/
    Test FAILed.



[GitHub] spark pull request: SPARK-3223 runAsSparkUser cannot change HDFS w...

Posted by jongyoul <gi...@git.apache.org>.

Github user jongyoul commented on the pull request:

    https://github.com/apache/spark/pull/2126#issuecomment-53609195
  
    @tgravescs 
    
    I found that DFSClient sets its ugi from UserGroupInformation.getCurrentUser(), and getCurrentUser's description is "Return the current user, including any doAs in the current stack.". Thus ugi.doAs is ignored. I think my patch is fragile but, for now, it's the only way to set the DFSClient user from SPARK_USER.
    
    Please tell me if you have a better idea or solution.



[GitHub] spark pull request: SPARK-3223 runAsSparkUser cannot change HDFS w...

Posted by tgravescs <gi...@git.apache.org>.

Github user tgravescs commented on the pull request:

    https://github.com/apache/spark/pull/2126#issuecomment-58031929
  
    thanks @jongyoul, the changes look fine to me, but I'll leave the final review to someone who knows the mesos scheduler.



[GitHub] spark pull request: SPARK-3223 runAsSparkUser cannot change HDFS w...

Posted by timothysc <gi...@git.apache.org>.

Github user timothysc commented on the pull request:

    https://github.com/apache/spark/pull/2126#issuecomment-59201681
  
    I believe @tgravescs is the only committer on this PR. 



[GitHub] spark pull request: SPARK-3223 runAsSparkUser cannot change HDFS w...

Posted by timothysc <gi...@git.apache.org>.

Github user timothysc commented on the pull request:

    https://github.com/apache/spark/pull/2126#issuecomment-58042192
  
    @tgravescs Seems ok, may I ask how you tested/verified @jongyoul? 
    




[GitHub] spark pull request: SPARK-3223 runAsSparkUser cannot change HDFS w...

Posted by jongyoul <gi...@git.apache.org>.

Github user jongyoul commented on the pull request:

https://github.com/apache/spark/pull/2126#issuecomment-53443546

Yes, I'm not using secure HDFS, for various reasons. Mesos is just a resource manager, so it doesn't care about the running program's user id. With the switch_user option, mesos changes the running program's user id to the account that ran spark-submit, but that can cause another issue: every slave machine has to know the account that ran spark-submit. So spark changes its user id itself, whatever the switch_user setting on mesos is.

HADOOP_USER_NAME is only valid in non-secure mode. In secure mode, that property is meaningless and we must use the switch_user option.

logDebug("running as user: " + user) shows that the remote user is changed to SPARK_USER, and the spark application runs as that user. But HDFS does not work like that in non-secure mode. The user of the FileSystem is decided by the following steps: check whether hdfs runs in secure mode (KERBEROS) or not; if it's not in secure mode, check whether HADOOP_USER_NAME is set in System.getenv or System.getProperty; and finally, hdfs uses the system user (UserGroupInformation.commit()).

Spark on mesos runs against hdfs in non-secure mode, the hdfs client uses the system user if HADOOP_USER_NAME is not set, and the system user is mesos' id, not SPARK_USER. Thus the hdfs user name of the driver running spark-submit is not the same as the user name of the executor's hdfs client. This causes a permission problem.
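
To make the lookup order above concrete, here is a minimal Scala sketch. It is a simplified model of the steps as described in this comment, not Hadoop's actual implementation; `HdfsUserModel`, `resolveUser`, and all of its parameters are hypothetical names, and the secure-mode branch is reduced to returning a supplied principal.

```scala
// Simplified model of the lookup order described above; this is NOT Hadoop's code.
object HdfsUserModel {
  /** Picks the user name an HDFS client would act as, following the described steps. */
  def resolveUser(
      secure: Boolean,
      kerberosPrincipal: Option[String],
      env: Map[String, String],
      props: Map[String, String],
      osUser: String): String = {
    if (secure) {
      // Secure mode: HADOOP_USER_NAME is meaningless, only the Kerberos identity counts.
      kerberosPrincipal.getOrElse(sys.error("secure mode requires a Kerberos principal"))
    } else {
      // Non-secure mode: environment first, then system properties, then the OS user.
      env.get("HADOOP_USER_NAME")
        .orElse(props.get("HADOOP_USER_NAME"))
        .getOrElse(osUser)
    }
  }
}
```

In this model, an executor whose OS user is 'hdfs' and which has no HADOOP_USER_NAME set resolves to 'hdfs', which matches the mismatch with the driver's user described above.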


[GitHub] spark pull request: SPARK-3223 runAsSparkUser cannot change HDFS w...

Posted by SparkQA <gi...@git.apache.org>.

Github user SparkQA commented on the pull request:

    https://github.com/apache/spark/pull/2126#issuecomment-60355205
  
      [Test build #22131 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/22131/consoleFull) for   PR 2126 at commit [`ea7e4cd`](https://github.com/apache/spark/commit/ea7e4cdca4666f958acb68aae0c88cf1e32f9481).
     * This patch merges cleanly.



[GitHub] spark pull request: SPARK-3223 runAsSparkUser cannot change HDFS w...

Posted by SparkQA <gi...@git.apache.org>.

Github user SparkQA commented on the pull request:

    https://github.com/apache/spark/pull/2126#issuecomment-54694393
  
    Can one of the admins verify this patch?



[GitHub] spark pull request: SPARK-3223 runAsSparkUser cannot change HDFS w...

Posted by tgravescs <gi...@git.apache.org>.

Github user tgravescs commented on the pull request:

    https://github.com/apache/spark/pull/2126#issuecomment-53456838
  
    Perhaps there is an issue with what we have wrapped with runAsUser in MesosExecutorBackend



[GitHub] spark pull request: SPARK-3223 runAsSparkUser cannot change HDFS w...

Posted by mateiz <gi...@git.apache.org>.

Github user mateiz commented on the pull request:

    https://github.com/apache/spark/pull/2126#issuecomment-60410309
  
    This might be some issue that snuck into master. @marmbrus have you seen this test failure?



[GitHub] spark pull request: SPARK-3223 runAsSparkUser cannot change HDFS w...

Posted by tgravescs <gi...@git.apache.org>.

Github user tgravescs commented on the pull request:

    https://github.com/apache/spark/pull/2126#issuecomment-59214378
  
    @pwendell @mateiz   Any committers that are more familiar with the mesos stuff that could look at this?  



[GitHub] spark pull request: SPARK-3223 runAsSparkUser cannot change HDFS w...

Posted by tgravescs <gi...@git.apache.org>.

Github user tgravescs commented on the pull request:

    https://github.com/apache/spark/pull/2126#issuecomment-53611043
  
    Right, so as it says, "including any doAs in the current stack", so the DFSClient is using the user set via the doAs.  The user set from Spark's runAsUser -> doAs is what you specified as SPARK_USER, so it should be picking it up. 



[GitHub] spark pull request: SPARK-3223 runAsSparkUser cannot change HDFS w...

Posted by tgravescs <gi...@git.apache.org>.

Github user tgravescs commented on the pull request:

https://github.com/apache/spark/pull/2126#issuecomment-53575210

Thanks for investigating. I don't think LOGNAME should really affect it. I thought it would use the Subject that is created when you do the createRemoteUser and the doAs. When you say "I'm printing logs at the first of security manager." what do you mean?

One other note is that this patch is fragile, because if someone creates a UGI before HADOOP_USER_NAME is set, then it will be ignored. I believe it only runs the commit() routine the first time the UGI is created.
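
The fragility noted above can be pictured with a small model. This sketch assumes, as suggested, that the login user is resolved once and cached afterwards; `LoginUserCache` and its parameters are hypothetical names used only for illustration and are not part of Hadoop.

```scala
// Hypothetical model of a login user that is resolved once and then cached.
// It only illustrates the ordering problem; it does not reproduce Hadoop's logic.
class LoginUserCache(lookupOverride: () => Option[String], osUser: String) {
  // Evaluated on first access only; later changes to the override are never seen.
  lazy val loginUser: String = lookupOverride().getOrElse(osUser)
}
```

If the override (standing in for HADOOP_USER_NAME) only appears after the first access to `loginUser`, that instance keeps answering with the OS user, while an instance created afterwards picks up the override.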

Can you please clarify exactly how things are running:
What user are the spark processes running on mesos running as?
What user is the Hadoop cluster running as?

The reason I want this clarified is above you said "14/08/27 01:11:01 DEBUG UserGroupInformation: using local user:UnixPrincipal: hdfs" which seems to indicate that the spark processes are running as user 'hdfs' not the user 'mesos' like I thought.

Can you put some log statements in various places to see what the user is? For instance, inside the doAs in runAsUser, can you log what UserGroupInformation.getCurrentUser() is?
Could you also perhaps put some log statements in Executor.createClassLoader() to make sure the classloader isn't messing things up. Also perhaps in a few places like in MesosExecutorBackend.launchTask, Executor.launchTask, and TaskRunner.run.


[GitHub] spark pull request: SPARK-3223 runAsSparkUser cannot change HDFS w...

Posted by jongyoul <gi...@git.apache.org>.

Github user jongyoul commented on the pull request:

    https://github.com/apache/spark/pull/2126#issuecomment-53892835
  
    org.apache.spark.executor.MesosExecutorBackend contains the main method for running spark on mesos and calls org.apache.spark.executor.Executor internally. MesosExecutorBackend's overridden methods come from org.apache.mesos.Executor, which is registered by org.apache.mesos.MesosExecutorDriver, which includes JNI methods.
    
    For example, see the code below:
    
        SparkHadoopUtil.get.runAsSparkUser { () =>
            MesosNativeLibrary.load()
            // Create a new Executor and start it running
            val runner = new MesosExecutorBackend()
            new MesosExecutorDriver(runner).run()
        }
    
    MesosExecutorDriver registers runner as an executor for the mesos framework. All of its methods (registered, launchTask and so on) are called from C++ JNI code (src/exec/exec.cpp in the mesos source code); JNI calls the java methods.
    
    My debugging messages are coded like this:
    private[spark] class MesosExecutorBackend
      extends MesosExecutor
      with ExecutorBackend
      with Logging {
    
      var executor: Executor = null
      var driver: ExecutorDriver = null
      logDebug(UserGroupInformation.getCurrentUser) // value is my id "1001079"
    ...
    
      override def registered(
          driver: ExecutorDriver,
          executorInfo: ExecutorInfo,
          frameworkInfo: FrameworkInfo,
          slaveInfo: SlaveInfo) {
        logDebug(UserGroupInformation.getCurrentUser) // value is mesos' id "hdfs"
        this.driver = driver
        val properties = Utils.deserialize[Array[(String, String)]](executorInfo.getData.toByteArray)
        executor = new Executor(
          executorInfo.getExecutorId.getValue,
          slaveInfo.getHostname,
          properties)
      }
    
      override def launchTask(d: ExecutorDriver, taskInfo: TaskInfo) {
        logDebug(UserGroupInformation.getCurrentUser) // value is mesos' id "hdfs"
        val taskId = taskInfo.getTaskId.getValue.toLong
        if (executor == null) {
          logError("Received launchTask but executor was null")
        } else {
          executor.launchTask(this, taskId, taskInfo.getData.asReadOnlyByteBuffer)
        }
      }
    
    So my conclusion is that the UserGroupInformation context is not carried across the JNI calls. Since executor.launchTask is what actually executes spark tasks, doAs should be placed only inside executor.launchTask, or else mesos must support the full jvm environment, including UserGroupInformation.
    
    What do you think?



[GitHub] spark pull request: SPARK-3223 runAsSparkUser cannot change HDFS w...

Posted by tgravescs <gi...@git.apache.org>.

Github user tgravescs commented on the pull request:

    https://github.com/apache/spark/pull/2126#issuecomment-54155475
  
    How is the user field in mesos usually set?  
    
    Is mesos launching a separate process or using threads?



[GitHub] spark pull request: SPARK-3223 runAsSparkUser cannot change HDFS w...

Posted by jongyoul <gi...@git.apache.org>.

Github user jongyoul commented on the pull request:

    https://github.com/apache/spark/pull/2126#issuecomment-60332333
  
    @mateiz My IDE - IntelliJ - changes the order of imports like that. I fixed them.



[GitHub] spark pull request: SPARK-3223 runAsSparkUser cannot change HDFS w...

Posted by tgravescs <gi...@git.apache.org>.

Github user tgravescs commented on the pull request:

    https://github.com/apache/spark/pull/2126#issuecomment-53909021
  
    I'm not a JNI expert.  But it would have to be that whatever is calling registered/launch is running from a thread not launched within the doAs. Or it didn't propagate the environment properly through the JNI or pushed a subject on top of ours.  I think you need a mesos expert to look at this.
    
    You could wrap where it is launching tasks but this can cause other issues. See SPARK-1676 and https://github.com/apache/spark/pull/621
    
    @mateiz @pwendell  Are there any Mesos experts around who know more about how exactly spark interacts with mesos?
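
The hypothesis that the callbacks run on a thread not launched within the doAs can be illustrated by analogy. The sketch below does not use Hadoop or JAAS at all: it stands in `scala.util.DynamicVariable` (whose binding new threads inherit from their parent at creation time) for the user context, and a pre-created single-thread pool for the thread that delivers the mesos callbacks. `DoAsThreadModel` and `observe` are hypothetical names, and whether the mesos JNI threads actually behave this way is an assumption, not something confirmed in this thread.

```scala
import java.util.concurrent.{Callable, Executors}
import scala.util.DynamicVariable

// Analogy only: DynamicVariable stands in for the doAs user context.
object DoAsThreadModel {
  // Default "user" seen by any thread that never entered the scope.
  val currentUser = new DynamicVariable[String]("hdfs")

  /** Returns (user seen by a thread started inside the scope,
    *          user seen by a callback thread created before the scope). */
  def observe(sparkUser: String): (String, String) = {
    // Stand-in for a callback thread that exists independently of the scope.
    val callbackPool = Executors.newSingleThreadExecutor()
    try {
      // Force the worker thread to be created now, outside the scope.
      callbackPool.submit(new Callable[String] { def call(): String = currentUser.value }).get()

      currentUser.withValue(sparkUser) {
        var seenByChild = ""
        // A thread created inside the scope inherits the scoped value.
        val child = new Thread(new Runnable { def run(): Unit = seenByChild = currentUser.value })
        child.start()
        child.join()
        // The pre-existing thread still sees the default value.
        val seenByCallback =
          callbackPool.submit(new Callable[String] { def call(): String = currentUser.value }).get()
        (seenByChild, seenByCallback)
      }
    } finally callbackPool.shutdown()
  }
}
```

Under these assumptions, `DoAsThreadModel.observe("1001079")` yields `("1001079", "hdfs")`: only work started from inside the scope sees the switched user, which is the same shape as the debug output reported for registered and launchTask.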



[GitHub] spark pull request: SPARK-3223 runAsSparkUser cannot change HDFS w...

Posted by tgravescs <gi...@git.apache.org>.

Github user tgravescs commented on the pull request:

    https://github.com/apache/spark/pull/2126#issuecomment-53462756
  
    So can you dig into it a bit further and make sure the executors are called from within the doAs and that your code that accesses HDFS is inside the doAs?
    I would like to really understand what is going on before putting things in.



[GitHub] spark pull request: SPARK-3223 runAsSparkUser cannot change HDFS w...

Posted by tgravescs <gi...@git.apache.org>.

Github user tgravescs commented on the pull request:

    https://github.com/apache/spark/pull/2126#issuecomment-55892735
  
    Ok, sounds good.



[GitHub] spark pull request: SPARK-3223 runAsSparkUser cannot change HDFS w...

Posted by jongyoul <gi...@git.apache.org>.

Github user jongyoul commented on the pull request:

    https://github.com/apache/spark/pull/2126#issuecomment-53853095
  
    Hm... Do you know why getCurrentUser changes after JNI is called? Before the jni method calls, getCurrentUser returns my id, but after that the id changes.... 



[GitHub] spark pull request: SPARK-3223 runAsSparkUser cannot change HDFS w...

Posted by tnachen <gi...@git.apache.org>.

Github user tnachen commented on the pull request:

    https://github.com/apache/spark/pull/2126#issuecomment-54013110
  
    I'm not really familiar with the user permissions with spark yet, but it seems like since we're setting the framework user as an empty string, Mesos automatically switches to the current os::user for you (src/sched/sched.cpp:1123)



[GitHub] spark pull request: SPARK-3223 runAsSparkUser cannot change HDFS w...

Posted by tnachen <gi...@git.apache.org>.

Github user tnachen commented on the pull request:

    https://github.com/apache/spark/pull/2126#issuecomment-61224117
  
    @jongyoul are you planning to rebase this or?



[GitHub] spark pull request: SPARK-3223 runAsSparkUser cannot change HDFS w...

Posted by tnachen <gi...@git.apache.org>.

Github user tnachen commented on the pull request:

    https://github.com/apache/spark/pull/2126#issuecomment-54213043
  
    @tgravescs Mesos is always running in a separate process for each task. If there is a user field set for the framework, we launch the framework and set the user to the specified user, and optionally you can also override the framework user by setting a user on the Task's command.
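
The precedence described in this thread (a user on the task's command overrides the framework user, and an empty framework user falls back to the OS user of the scheduler process) can be sketched as follows. This is a simplified model written for illustration, not Mesos source code; `MesosUserModel`, `effectiveUser`, and the parameter names are hypothetical, and the sketch ignores whether the slave is actually configured to switch users.

```scala
// Simplified model of the user precedence described in this thread; not Mesos code.
object MesosUserModel {
  /** Picks the user a task would run as, given the three possible sources. */
  def effectiveUser(
      commandUser: Option[String],
      frameworkUser: String,
      schedulerOsUser: String): String =
    commandUser.filter(_.nonEmpty) // a per-command user wins when present
      .getOrElse(
        // Otherwise use the framework user; an empty string falls back to the OS user.
        if (frameworkUser.nonEmpty) frameworkUser else schedulerOsUser)
}
```

In this model, a framework registered with setUser("") resolves to the scheduler's OS user, while one registered with the spark user resolves to that user unless a command-level user is supplied.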



[GitHub] spark pull request: SPARK-3223 runAsSparkUser cannot change HDFS w...

Posted by jongyoul <gi...@git.apache.org>.

Github user jongyoul commented on the pull request:

    https://github.com/apache/spark/pull/2126#issuecomment-57589522
  
    @tgravescs This code only applies in mesos mode, so the other modes - yarn and standalone - are not affected.
    
    +1 @timothysc,
    
    val fwInfo = FrameworkInfo.newBuilder().setUser(sc.sparkUser).setName(sc.appName).build()



[GitHub] spark pull request: SPARK-3223 runAsSparkUser cannot change HDFS w...

Posted by jongyoul <gi...@git.apache.org>.

Github user jongyoul commented on the pull request:

    https://github.com/apache/spark/pull/2126#issuecomment-55837925
  
    @tgravescs
    
    Can I deal with and test it on Aug? I've tested them, but I cannot commit them because I'm on vacation these days.



[GitHub] spark pull request: SPARK-3223 runAsSparkUser cannot change HDFS w...

Posted by jongyoul <gi...@git.apache.org>.

Github user jongyoul commented on the pull request:

    https://github.com/apache/spark/pull/2126#issuecomment-53463834
  
    Okay~ I also think there may be a better solution.



[GitHub] spark pull request: SPARK-3223 runAsSparkUser cannot change HDFS w...

Posted by tgravescs <gi...@git.apache.org>.

Github user tgravescs commented on the pull request:

    https://github.com/apache/spark/pull/2126#issuecomment-55887296
  
    @jongyoul Sorry, I don't know what you mean by "on Aug".



[GitHub] spark pull request: SPARK-3223 runAsSparkUser cannot change HDFS w...

Posted by andrewor14 <gi...@git.apache.org>.

Github user andrewor14 commented on the pull request:

    https://github.com/apache/spark/pull/2126#issuecomment-60634200
  
    @jongyoul yeah in general it's a good idea to make the changes in the master branch as well unless it's highly specific to a certain branch (an unlikely scenario), though we can keep this one since people might want this fix for 1.0 as well.



[GitHub] spark pull request: SPARK-3223 runAsSparkUser cannot change HDFS w...

Posted by SparkQA <gi...@git.apache.org>.

Github user SparkQA commented on the pull request:

    https://github.com/apache/spark/pull/2126#issuecomment-60341657
  
    [QA tests have started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/22113/consoleFull) for PR 2126 at commit [`ea7e4cd`](https://github.com/apache/spark/commit/ea7e4cdca4666f958acb68aae0c88cf1e32f9481).
     * This patch merges cleanly.



[GitHub] spark pull request: SPARK-3223 runAsSparkUser cannot change HDFS w...

Posted by mateiz <gi...@git.apache.org>.

Github user mateiz commented on the pull request:

    https://github.com/apache/spark/pull/2126#issuecomment-60354910
  
    Jenkins, test this please



[GitHub] spark pull request: SPARK-3223 runAsSparkUser cannot change HDFS w...

Posted by tnachen <gi...@git.apache.org>.

Github user tnachen commented on the pull request:

    https://github.com/apache/spark/pull/2126#issuecomment-60330890
  
    @mateiz Mesos will report TASK_FAILED whenever it can't chown the work directory, or, when the task was launched with the default Mesos containerizer, it will just fail with a "cannot switch user" message.
    Everything is logged in the slave log eventually, so we can tell from reading the logs. This looks good to me, +1.



[GitHub] spark pull request: SPARK-3223 runAsSparkUser cannot change HDFS w...

Posted by tgravescs <gi...@git.apache.org>.

Github user tgravescs commented on the pull request:

    https://github.com/apache/spark/pull/2126#issuecomment-55797660
  
    So it sounds like we should set the user to whoever we want to access HDFS when we launch the task, then. That won't cause any other side effects, will it?



[GitHub] spark pull request: SPARK-3223 runAsSparkUser cannot change HDFS w...

Posted by jongyoul <gi...@git.apache.org>.

Github user jongyoul commented on the pull request:

    https://github.com/apache/spark/pull/2126#issuecomment-59161108
  
    Could anyone help me get this patch accepted? Should I write unit tests? I think that would require the Mesos native library. Any help?



[GitHub] spark pull request: SPARK-3223 runAsSparkUser cannot change HDFS w...

Posted by jongyoul <gi...@git.apache.org>.

Github user jongyoul commented on the pull request:

    https://github.com/apache/spark/pull/2126#issuecomment-58133773
  
    @timothysc I don't understand what "stack integration" means; I guess you are saying that my patch is not covered by unit tests. As you mentioned, unit tests are a good way to develop code and to provide reproduction information. But in this case, Spark doesn't have any test code for Mesos, only the implementation. That's because testing against Mesos and Hadoop requires Mesos and Hadoop themselves, so I think it's too hard to set up an environment to test it. I totally agree with your opinion, but more detailed advice would help me. I have already installed Mesos and Hadoop on my company cluster and tested the patch there.



[GitHub] spark pull request: SPARK-3223 runAsSparkUser cannot change HDFS w...

Posted by jongyoul <gi...@git.apache.org>.

Github user jongyoul commented on the pull request:

    https://github.com/apache/spark/pull/2126#issuecomment-60362537
  
    @tnachen @mateiz The test fails for the same reason. Please check it again and help me figure out how to fix it.

