Posted to user@hadoop.apache.org by Bill Sparks <js...@cray.com> on 2013/10/29 21:19:37 UTC

Why is my output directory owned by yarn?

I have a strange use case and I'm looking for some debugging help.


Use Case:

If I run the Hadoop MapReduce example wordcount program and write the output
to HDFS, the output directory has the correct ownership.

E.g.

hadoop jar 
/usr/lib/hadoop-mapreduce/hadoop-mapreduce-examples-2.0.5-alpha.jar
wordcount /user/jdoe/simple/HF.txt /users/jdoe/simple/outtest1

hdfs dfs -ls simple
Found 3 items
drwxr-xr-x - jdoe supergroup 0 2013-10-25 21:26 simple/HF.out
-rw-r--r-- 1 jdoe supergroup 610157 2013-10-25 21:21 simple/HF.txt
drwxr-xr-x - jdoe supergroup 0 2013-10-29 14:50 simple/outtest1

Whereas if I write to a global filesystem, my output directory is owned by
yarn.


E.g.

hadoop jar 
/usr/lib/hadoop-mapreduce/hadoop-mapreduce-examples-2.0.5-alpha.jar
wordcount /user/jdoe/simple/HF.txt file:///scratch/jdoe/outtest1
ls -l /scratch/jdoe
total 8
drwxr-xr-x 2 root root 4096 Oct 28 23:26 logs
drwxr-xr-x 2 yarn yarn 4096 Oct 28 23:23 outtest1



I've looked at the container log files and saw no errors. The only thing
I can think of is that the user authentication mode is "files:ldap" and the
NodeManager nodes do not have access to the corporate LDAP server, so they
are working off the local /etc/shadow, which does not have my credentials -
so it might just default to "yarn".
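
A quick check would be to run the following on one of the NodeManager nodes
(assuming the "files:ldap" setting is the passwd lookup order, e.g. in
nsswitch.conf; the username is just the one from this example):

getent passwd jdoe
id jdoe

If these succeed on the login node but fail on the compute nodes, the LDAP
lookup is indeed unavailable there.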

I did find the following warning:

2013-10-29 14:58:52,184 INFO org.apache.hadoop.yarn.server.nodemanager.NMAuditLogger: USER=jdoe  OPERATION=Container Finished - Succeeded  TARGET=ContainerImpl  RESULT=SUCCESS  APPID=application_1383020136544_0005  CONTAINERID=container_1383020136544_0005_01_000001
...
2013-10-29 14:58:53,062 WARN org.apache.hadoop.yarn.server.nodemanager.containermanager.ContainerManagerImpl: Trying to stop unknown container container_1383020136544_0005_01_000001
2013-10-29 14:58:53,062 WARN org.apache.hadoop.yarn.server.nodemanager.NMAuditLogger: USER=UnknownUser  IP=10.128.0.17  OPERATION=Stop Container Request  TARGET=ContainerManagerImpl  RESULT=FAILURE  DESCRIPTION=Trying to stop unknown container!  APPID=application_1383020136544_0005  CONTAINERID=container_1383020136544_0005_01_000001



 
Thanks,
   John


Re: Why is my output directory owned by yarn?

Posted by Harsh J <ha...@cloudera.com>.
The DefaultContainerExecutor isn't the one that can do setuid. The
LinuxContainerExecutor can do that.
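
As a rough sketch, switching executors is a yarn-site.xml change along these
lines (the property names are the standard YARN ones; this also assumes the
setuid container-executor binary and its container-executor.cfg are installed
and configured, which is a separate step):

<property>
  <name>yarn.nodemanager.container-executor.class</name>
  <value>org.apache.hadoop.yarn.server.nodemanager.LinuxContainerExecutor</value>
</property>
<property>
  <name>yarn.nodemanager.linux-container-executor.group</name>
  <value>hadoop</value>
</property>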

On Fri, Nov 1, 2013 at 8:00 PM, Bill Sparks <js...@cray.com> wrote:
> Well, I thought I had set all this up correctly, and on the NodeManager
> nodes I can change to my user id, so general user authentication is working.
> But the output is still written as yarn. I guess my question is how to
> enable secure mode - I thought that was the default mode.
>
> When the container scripts are written they contain the correct user name
> (included below).
>
> cat
> /tmp/hadoop-yarn/cache/yarn/nm-local-dir/usercache/jdoe/appcache/applicatio
> n_1383247324024_0005/container_1383247324024_0005_01_000001/launch_containe
> r.sh
> #!/bin/bash
>
> export
> YARN_LOCAL_DIRS="/tmp/hadoop-yarn/cache/yarn/nm-local-dir/usercache/jdoe/ap
> pcache/application_1383247324024_0005"
> export NM_HTTP_PORT="8042"
> export HADOOP_COMMON_HOME="/usr/lib/hadoop"
> export JAVA_HOME="/opt/java/jdk1.6.0_20"
> export HADOOP_YARN_HOME="/usr/lib/hadoop-yarn"
> export NM_HOST="nid00031"
> export
> CLASSPATH="$PWD:$HADOOP_CONF_DIR:$HADOOP_COMMON_HOME/*:$HADOOP_COMMON_HOME/
> lib/*:$HADOOP_HDFS_HOME/*:$HADOOP_HDFS_HOME/lib/*:$HADOOP_MAPRED_HOME/*:$HA
> DOOP_MAPRED_HOME/lib/*:$HADOOP_YARN_HOME/*:$HADOOP_YARN_HOME/lib/*:$HADOOP_
> MAPRED_HOME/share/hadoop/mapreduce/*:$HADOOP_MAPRED_HOME/share/hadoop/mapre
> duce/lib/*:job.jar/job.jar:job.jar/classes/:job.jar/lib/*:$PWD/*"
> export
> HADOOP_TOKEN_FILE_LOCATION="/tmp/hadoop-yarn/cache/yarn/nm-local-dir/userca
> che/jdoe/appcache/application_1383247324024_0005/container_1383247324024_00
> 05_01_000001/container_tokens"
> export APPLICATION_WEB_PROXY_BASE="/proxy/application_1383247324024_0005"
> export JVM_PID="$$"
> export USER="jdoe"
> export HADOOP_HDFS_HOME="/usr/lib/hadoop-hdfs"
> export
> PWD="/tmp/hadoop-yarn/cache/yarn/nm-local-dir/usercache/jdoe/appcache/appli
> cation_1383247324024_0005/container_1383247324024_0005_01_000001"
> export NM_PORT="36276"
> export HOME="/home/"
> export LOGNAME="jdoe"
> export APP_SUBMIT_TIME_ENV="1383312862021"
> export HADOOP_CONF_DIR="/etc/hadoop/conf"
> export MALLOC_ARENA_MAX="4"
> export AM_CONTAINER_ID="container_1383247324024_0005_01_000001"
> ln -sf
> "/tmp/hadoop-yarn/cache/yarn/nm-local-dir/usercache/jdoe/appcache/applicati
> on_1383247324024_0005/filecache/-300930022458385182/job.jar" "job.jar"
> mkdir -p jobSubmitDir
> ln -sf
> "/tmp/hadoop-yarn/cache/yarn/nm-local-dir/usercache/jdoe/appcache/applicati
> on_1383247324024_0005/filecache/-4297161085730400838/job.splitmetainfo"
> "jobSubmitDir/job.splitmetainfo"
> mkdir -p jobSubmitDir
> ln -sf
> "/tmp/hadoop-yarn/cache/yarn/nm-local-dir/usercache/jdoe/appcache/applicati
> on_1383247324024_0005/filecache/-3754219748389402012/job.split"
> "jobSubmitDir/job.split"
> ln -sf
> "/tmp/hadoop-yarn/cache/yarn/nm-local-dir/usercache/jdoe/appcache/applicati
> on_1383247324024_0005/filecache/233482461420248540/job.xml" "job.xml"
> mkdir -p jobSubmitDir
> ln -sf
> "/tmp/hadoop-yarn/cache/yarn/nm-local-dir/usercache/jdoe/appcache/applicati
> on_1383247324024_0005/filecache/-8903348211231085224/appTokens"
> "jobSubmitDir/appTokens"
> exec /bin/bash -c "$JAVA_HOME/bin/java
> -Dlog4j.configuration=container-log4j.properties
> -Dyarn.app.mapreduce.container.log.dir=/tmp/hadoop-yarn/containers/applicat
> ion_1383247324024_0005/container_1383247324024_0005_01_000001
> -Dyarn.app.mapreduce.container.log.filesize=0
> -Dhadoop.root.logger=INFO,CLA  -Xmx1024m
> org.apache.hadoop.mapreduce.v2.app.MRAppMaster
> 1>/tmp/hadoop-yarn/containers/application_1383247324024_0005/container_1383
> 247324024_0005_01_000001/stdout
> 2>/tmp/hadoop-yarn/containers/application_1383247324024_0005/container_1383
> 247324024_0005_01_000001/stderr  "
>
> # cat
> /tmp/hadoop-yarn/cache/yarn/nm-local-dir/usercache/jdoe/appcache/applicatio
> n_1383247324024_0005/container_1383247324024_0005_01_000001/default_contain
> er_executor.sh
> #!/bin/bash
>
> echo $$ >
> /tmp/hadoop-yarn/cache/yarn/nm-local-dir/nmPrivate/container_1383247324024_
> 0005_01_000001.pid.tmp
> /bin/mv -f
> /tmp/hadoop-yarn/cache/yarn/nm-local-dir/nmPrivate/container_1383247324024_
> 0005_01_000001.pid.tmp
> /tmp/hadoop-yarn/cache/yarn/nm-local-dir/nmPrivate/container_1383247324024_
> 0005_01_000001.pid
> exec setsid /bin/bash
> "/tmp/hadoop-yarn/cache/yarn/nm-local-dir/usercache/jdoe/appcache/applicati
> on_1383247324024_0005/container_1383247324024_0005_01_000001/launch_contain
> er.sh"
>
>
>
> yarn-site.xml
> ...
> <property>
>    <name>yarn.nodemanager.container-executor.class</name>
>
> <value>org.apache.hadoop.yarn.server.nodemanager.DefaultContainerExecutor</
> value>
>   </property>
>   <property>
>    <name>yarn.nodemanager.linux-container-executor.group</name>
>    <value>hadoop</value>
>   </property>
>
> hdfs-site.conf
> ...
> <property>
>   <name>dfs.permissions</name>
>   <value>true</value>
> </property>
>
>
> --
> Jonathan (Bill) Sparks
> Software Architecture
> Cray Inc.
>
>
>
>
>
> On 10/31/13 6:12 AM, "Harsh J" <ha...@cloudera.com> wrote:
>
>>In insecure mode the containers run as the daemon's owner, i.e.
>>"yarn". Since the LocalFileSystem implementation has no way to
>>impersonate any users (we don't run as root/etc.) it can create files
>>only as the "yarn" user. On HDFS, we can send the right username in as
>>a form of authentication, and it's reflected on the created files.
>>
>>If you enable the LinuxContainerExecutor (or generally enable
>>security) then the containers run after being setuid'd to the
>>submitting user, and your files would appear with the right owner.
>>
>>On Wed, Oct 30, 2013 at 1:49 AM, Bill Sparks <js...@cray.com> wrote:
>>>
>>> I have a strange use case and I'm looking for some debugging help.
>>>
>>>
>>> Use Case:
>>>
>>> If I run the hadoop mapped example wordcount program and write the
>>>output
>>> to HDFS, the output directory has the correct ownership.
>>>
>>> E.g.
>>>
>>> hadoop jar
>>> /usr/lib/hadoop-mapreduce/hadoop-mapreduce-examples-2.0.5-alpha.jar
>>> wordcount /user/jdoe/simple/HF.txt /users/jdoe/simple/outtest1
>>>
>>> hdfs dfs -ls simple
>>> Found 3 items
>>> drwxr-xr-x - jdoe supergroup 0 2013-10-25 21:26 simple/HF.out
>>> -rw-r--r-- 1 jdoe supergroup 610157 2013-10-25 21:21 simple/HF.txt
>>> drwxr-xr-x - jdoe supergroup 0 2013-10-29 14:50 simple/outtest1
>>>
>>> Where as if I write to a global filesystem my output directory is owned
>>>by
>>> yarn
>>>
>>>
>>> E.g.
>>>
>>> hadoop jar
>>> /usr/lib/hadoop-mapreduce/hadoop-mapreduce-examples-2.0.5-alpha.jar
>>> wordcount /user/jdoe/simple/HF.txt file:///scratch/jdoe/outtest1
>>> ls -l /scratch/jdoe
>>> total 8
>>> drwxr-xr-x 2 root root 4096 Oct 28 23:26 logs
>>> drwxr-xr-x 2 yarn yarn 4096 Oct 28 23:23 outtest1
>>>
>>>
>>>
>>> I've looked at the container log files, and saw no errors. The only
>>>thing
>>> I can think of, is the user authentication mode is "files:ldap" and the
>>> nodemanager nodes do not have access to the corporate LDAP server so
>>>it's
>>> working of local /etc/shadow which does not have my credentials - so it
>>> might just default to "yarn".
>>>
>>> I did find the following warning:
>>>
>>> 2013-10-29 14:58:52,184 INFO
>>> org.apache.hadoop.yarn.server.nodemanager.NMAuditLogger: USER=jdoe
>>> OPERATION=Container Finished -
>>> Succeeded       TARGET=ContainerImpl    RESULT=SUCCESS
>>>APPID=application_13830201365
>>> 44_0005 CONTAINERID=container_1383020136544_0005_01_000001
>>> ...
>>> 2013-10-29 14:58:53,062 WARN
>>>
>>>org.apache.hadoop.yarn.server.nodemanager.containermanager.ContainerManag
>>>er
>>> Impl: Trying to stop unknown container
>>> container_1383020136544_0005_01_000001
>>> 2013-10-29 14:58:53,062 WARN
>>> org.apache.hadoop.yarn.server.nodemanager.NMAuditLogger:
>>> USER=UnknownUser        IP=10.128.0.17  OPERATION=Stop Container
>>> Request TARGET=ContainerManagerImpl     RESULT=FAILURE
>>>DESCRIPTION=Trying to
>>> stop unknown
>>> container!      APPID=application_1383020136544_0005
>>>CONTAINERID=container_13830
>>> 20136544_0005_01_000001
>>>
>>>
>>>
>>>
>>> Thanks,
>>>    John
>>>
>>
>>
>>
>>--
>>Harsh J
>



-- 
Harsh J

Re: Why is my output directory owned by yarn?

Posted by Bill Sparks <js...@cray.com>.
Well, I thought I had set all this up correctly, and on the NodeManager
nodes I can change to my user id, so general user authentication is working.
But the output is still written as yarn. I guess my question is how to
enable secure mode - I thought that was the default mode.

When the container scripts are written they contain the correct user name
(included below).

cat /tmp/hadoop-yarn/cache/yarn/nm-local-dir/usercache/jdoe/appcache/application_1383247324024_0005/container_1383247324024_0005_01_000001/launch_container.sh
#!/bin/bash

export YARN_LOCAL_DIRS="/tmp/hadoop-yarn/cache/yarn/nm-local-dir/usercache/jdoe/appcache/application_1383247324024_0005"
export NM_HTTP_PORT="8042"
export HADOOP_COMMON_HOME="/usr/lib/hadoop"
export JAVA_HOME="/opt/java/jdk1.6.0_20"
export HADOOP_YARN_HOME="/usr/lib/hadoop-yarn"
export NM_HOST="nid00031"
export CLASSPATH="$PWD:$HADOOP_CONF_DIR:$HADOOP_COMMON_HOME/*:$HADOOP_COMMON_HOME/lib/*:$HADOOP_HDFS_HOME/*:$HADOOP_HDFS_HOME/lib/*:$HADOOP_MAPRED_HOME/*:$HADOOP_MAPRED_HOME/lib/*:$HADOOP_YARN_HOME/*:$HADOOP_YARN_HOME/lib/*:$HADOOP_MAPRED_HOME/share/hadoop/mapreduce/*:$HADOOP_MAPRED_HOME/share/hadoop/mapreduce/lib/*:job.jar/job.jar:job.jar/classes/:job.jar/lib/*:$PWD/*"
export HADOOP_TOKEN_FILE_LOCATION="/tmp/hadoop-yarn/cache/yarn/nm-local-dir/usercache/jdoe/appcache/application_1383247324024_0005/container_1383247324024_0005_01_000001/container_tokens"
export APPLICATION_WEB_PROXY_BASE="/proxy/application_1383247324024_0005"
export JVM_PID="$$"
export USER="jdoe"
export HADOOP_HDFS_HOME="/usr/lib/hadoop-hdfs"
export PWD="/tmp/hadoop-yarn/cache/yarn/nm-local-dir/usercache/jdoe/appcache/application_1383247324024_0005/container_1383247324024_0005_01_000001"
export NM_PORT="36276"
export HOME="/home/"
export LOGNAME="jdoe"
export APP_SUBMIT_TIME_ENV="1383312862021"
export HADOOP_CONF_DIR="/etc/hadoop/conf"
export MALLOC_ARENA_MAX="4"
export AM_CONTAINER_ID="container_1383247324024_0005_01_000001"
ln -sf "/tmp/hadoop-yarn/cache/yarn/nm-local-dir/usercache/jdoe/appcache/application_1383247324024_0005/filecache/-300930022458385182/job.jar" "job.jar"
mkdir -p jobSubmitDir
ln -sf "/tmp/hadoop-yarn/cache/yarn/nm-local-dir/usercache/jdoe/appcache/application_1383247324024_0005/filecache/-4297161085730400838/job.splitmetainfo" "jobSubmitDir/job.splitmetainfo"
mkdir -p jobSubmitDir
ln -sf "/tmp/hadoop-yarn/cache/yarn/nm-local-dir/usercache/jdoe/appcache/application_1383247324024_0005/filecache/-3754219748389402012/job.split" "jobSubmitDir/job.split"
ln -sf "/tmp/hadoop-yarn/cache/yarn/nm-local-dir/usercache/jdoe/appcache/application_1383247324024_0005/filecache/233482461420248540/job.xml" "job.xml"
mkdir -p jobSubmitDir
ln -sf "/tmp/hadoop-yarn/cache/yarn/nm-local-dir/usercache/jdoe/appcache/application_1383247324024_0005/filecache/-8903348211231085224/appTokens" "jobSubmitDir/appTokens"
exec /bin/bash -c "$JAVA_HOME/bin/java -Dlog4j.configuration=container-log4j.properties -Dyarn.app.mapreduce.container.log.dir=/tmp/hadoop-yarn/containers/application_1383247324024_0005/container_1383247324024_0005_01_000001 -Dyarn.app.mapreduce.container.log.filesize=0 -Dhadoop.root.logger=INFO,CLA -Xmx1024m org.apache.hadoop.mapreduce.v2.app.MRAppMaster 1>/tmp/hadoop-yarn/containers/application_1383247324024_0005/container_1383247324024_0005_01_000001/stdout 2>/tmp/hadoop-yarn/containers/application_1383247324024_0005/container_1383247324024_0005_01_000001/stderr "

# cat /tmp/hadoop-yarn/cache/yarn/nm-local-dir/usercache/jdoe/appcache/application_1383247324024_0005/container_1383247324024_0005_01_000001/default_container_executor.sh
#!/bin/bash

echo $$ > /tmp/hadoop-yarn/cache/yarn/nm-local-dir/nmPrivate/container_1383247324024_0005_01_000001.pid.tmp
/bin/mv -f /tmp/hadoop-yarn/cache/yarn/nm-local-dir/nmPrivate/container_1383247324024_0005_01_000001.pid.tmp /tmp/hadoop-yarn/cache/yarn/nm-local-dir/nmPrivate/container_1383247324024_0005_01_000001.pid
exec setsid /bin/bash "/tmp/hadoop-yarn/cache/yarn/nm-local-dir/usercache/jdoe/appcache/application_1383247324024_0005/container_1383247324024_0005_01_000001/launch_container.sh"



yarn-site.xml
...
<property>
  <name>yarn.nodemanager.container-executor.class</name>
  <value>org.apache.hadoop.yarn.server.nodemanager.DefaultContainerExecutor</value>
</property>
<property>
  <name>yarn.nodemanager.linux-container-executor.group</name>
  <value>hadoop</value>
</property>

hdfs-site.xml
...
<property>
  <name>dfs.permissions</name>
  <value>true</value>
</property>


-- 
Jonathan (Bill) Sparks
Software Architecture
Cray Inc.





On 10/31/13 6:12 AM, "Harsh J" <ha...@cloudera.com> wrote:

>In insecure mode the containers run as the daemon's owner, i.e.
>"yarn". Since the LocalFileSystem implementation has no way to
>impersonate any users (we don't run as root/etc.) it can create files
>only as the "yarn" user. On HDFS, we can send the right username in as
>a form of authentication, and it's reflected on the created files.
>
>If you enable the LinuxContainerExecutor (or generally enable
>security) then the containers run after being setuid'd to the
>submitting user, and your files would appear with the right owner.
>
>On Wed, Oct 30, 2013 at 1:49 AM, Bill Sparks <js...@cray.com> wrote:
>>
>> I have a strange use case and I'm looking for some debugging help.
>>
>>
>> Use Case:
>>
>> If I run the hadoop mapped example wordcount program and write the
>>output
>> to HDFS, the output directory has the correct ownership.
>>
>> E.g.
>>
>> hadoop jar
>> /usr/lib/hadoop-mapreduce/hadoop-mapreduce-examples-2.0.5-alpha.jar
>> wordcount /user/jdoe/simple/HF.txt /users/jdoe/simple/outtest1
>>
>> hdfs dfs -ls simple
>> Found 3 items
>> drwxr-xr-x - jdoe supergroup 0 2013-10-25 21:26 simple/HF.out
>> -rw-r--r-- 1 jdoe supergroup 610157 2013-10-25 21:21 simple/HF.txt
>> drwxr-xr-x - jdoe supergroup 0 2013-10-29 14:50 simple/outtest1
>>
>> Where as if I write to a global filesystem my output directory is owned
>>by
>> yarn
>>
>>
>> E.g.
>>
>> hadoop jar
>> /usr/lib/hadoop-mapreduce/hadoop-mapreduce-examples-2.0.5-alpha.jar
>> wordcount /user/jdoe/simple/HF.txt file:///scratch/jdoe/outtest1
>> ls -l /scratch/jdoe
>> total 8
>> drwxr-xr-x 2 root root 4096 Oct 28 23:26 logs
>> drwxr-xr-x 2 yarn yarn 4096 Oct 28 23:23 outtest1
>>
>>
>>
>> I've looked at the container log files, and saw no errors. The only
>>thing
>> I can think of, is the user authentication mode is "files:ldap" and the
>> nodemanager nodes do not have access to the corporate LDAP server so
>>it's
>> working of local /etc/shadow which does not have my credentials - so it
>> might just default to "yarn".
>>
>> I did find the following warning:
>>
>> 2013-10-29 14:58:52,184 INFO
>> org.apache.hadoop.yarn.server.nodemanager.NMAuditLogger: USER=jdoe
>> OPERATION=Container Finished -
>> Succeeded       TARGET=ContainerImpl    RESULT=SUCCESS
>>APPID=application_13830201365
>> 44_0005 CONTAINERID=container_1383020136544_0005_01_000001
>> ...
>> 2013-10-29 14:58:53,062 WARN
>> 
>>org.apache.hadoop.yarn.server.nodemanager.containermanager.ContainerManag
>>er
>> Impl: Trying to stop unknown container
>> container_1383020136544_0005_01_000001
>> 2013-10-29 14:58:53,062 WARN
>> org.apache.hadoop.yarn.server.nodemanager.NMAuditLogger:
>> USER=UnknownUser        IP=10.128.0.17  OPERATION=Stop Container
>> Request TARGET=ContainerManagerImpl     RESULT=FAILURE
>>DESCRIPTION=Trying to
>> stop unknown
>> container!      APPID=application_1383020136544_0005
>>CONTAINERID=container_13830
>> 20136544_0005_01_000001
>>
>>
>>
>>
>> Thanks,
>>    John
>>
>
>
>
>-- 
>Harsh J


Re: Why is my output directory owned by yarn?

Posted by Bill Sparks <js...@cray.com>.
We'll I thought I've set all this up correctly and on the NodeManager
nodes can change to my user id, so general user authentication is working.
But still the output is written as yarn. I guess my question is how to
enable secure mode - I thought that was the default mode.

When the containers are written they contain the correct user name
(included).

cat 
/tmp/hadoop-yarn/cache/yarn/nm-local-dir/usercache/jdoe/appcache/applicatio
n_1383247324024_0005/container_1383247324024_0005_01_000001/launch_containe
r.sh 
#!/bin/bash

export 
YARN_LOCAL_DIRS="/tmp/hadoop-yarn/cache/yarn/nm-local-dir/usercache/jdoe/ap
pcache/application_1383247324024_0005"
export NM_HTTP_PORT="8042"
export HADOOP_COMMON_HOME="/usr/lib/hadoop"
export JAVA_HOME="/opt/java/jdk1.6.0_20"
export HADOOP_YARN_HOME="/usr/lib/hadoop-yarn"
export NM_HOST="nid00031"
export 
CLASSPATH="$PWD:$HADOOP_CONF_DIR:$HADOOP_COMMON_HOME/*:$HADOOP_COMMON_HOME/
lib/*:$HADOOP_HDFS_HOME/*:$HADOOP_HDFS_HOME/lib/*:$HADOOP_MAPRED_HOME/*:$HA
DOOP_MAPRED_HOME/lib/*:$HADOOP_YARN_HOME/*:$HADOOP_YARN_HOME/lib/*:$HADOOP_
MAPRED_HOME/share/hadoop/mapreduce/*:$HADOOP_MAPRED_HOME/share/hadoop/mapre
duce/lib/*:job.jar/job.jar:job.jar/classes/:job.jar/lib/*:$PWD/*"
export 
HADOOP_TOKEN_FILE_LOCATION="/tmp/hadoop-yarn/cache/yarn/nm-local-dir/userca
che/jdoe/appcache/application_1383247324024_0005/container_1383247324024_00
05_01_000001/container_tokens"
export APPLICATION_WEB_PROXY_BASE="/proxy/application_1383247324024_0005"
export JVM_PID="$$"
export USER="jdoe"
export HADOOP_HDFS_HOME="/usr/lib/hadoop-hdfs"
export 
PWD="/tmp/hadoop-yarn/cache/yarn/nm-local-dir/usercache/jdoe/appcache/appli
cation_1383247324024_0005/container_1383247324024_0005_01_000001"
export NM_PORT="36276"
export HOME="/home/"
export LOGNAME="jdoe"
export APP_SUBMIT_TIME_ENV="1383312862021"
export HADOOP_CONF_DIR="/etc/hadoop/conf"
export MALLOC_ARENA_MAX="4"
export AM_CONTAINER_ID="container_1383247324024_0005_01_000001"
ln -sf 
"/tmp/hadoop-yarn/cache/yarn/nm-local-dir/usercache/jdoe/appcache/applicati
on_1383247324024_0005/filecache/-300930022458385182/job.jar" "job.jar"
mkdir -p jobSubmitDir
ln -sf 
"/tmp/hadoop-yarn/cache/yarn/nm-local-dir/usercache/jdoe/appcache/applicati
on_1383247324024_0005/filecache/-4297161085730400838/job.splitmetainfo"
"jobSubmitDir/job.splitmetainfo"
mkdir -p jobSubmitDir
ln -sf 
"/tmp/hadoop-yarn/cache/yarn/nm-local-dir/usercache/jdoe/appcache/applicati
on_1383247324024_0005/filecache/-3754219748389402012/job.split"
"jobSubmitDir/job.split"
ln -sf 
"/tmp/hadoop-yarn/cache/yarn/nm-local-dir/usercache/jdoe/appcache/applicati
on_1383247324024_0005/filecache/233482461420248540/job.xml" "job.xml"
mkdir -p jobSubmitDir
ln -sf 
"/tmp/hadoop-yarn/cache/yarn/nm-local-dir/usercache/jdoe/appcache/applicati
on_1383247324024_0005/filecache/-8903348211231085224/appTokens"
"jobSubmitDir/appTokens"
exec /bin/bash -c "$JAVA_HOME/bin/java
-Dlog4j.configuration=container-log4j.properties
-Dyarn.app.mapreduce.container.log.dir=/tmp/hadoop-yarn/containers/applicat
ion_1383247324024_0005/container_1383247324024_0005_01_000001
-Dyarn.app.mapreduce.container.log.filesize=0
-Dhadoop.root.logger=INFO,CLA  -Xmx1024m
org.apache.hadoop.mapreduce.v2.app.MRAppMaster
1>/tmp/hadoop-yarn/containers/application_1383247324024_0005/container_1383
247324024_0005_01_000001/stdout
2>/tmp/hadoop-yarn/containers/application_1383247324024_0005/container_1383
247324024_0005_01_000001/stderr  "

# cat 
/tmp/hadoop-yarn/cache/yarn/nm-local-dir/usercache/jdoe/appcache/applicatio
n_1383247324024_0005/container_1383247324024_0005_01_000001/default_contain
er_executor.sh 
#!/bin/bash

echo $$ > 
/tmp/hadoop-yarn/cache/yarn/nm-local-dir/nmPrivate/container_1383247324024_
0005_01_000001.pid.tmp
/bin/mv -f 
/tmp/hadoop-yarn/cache/yarn/nm-local-dir/nmPrivate/container_1383247324024_
0005_01_000001.pid.tmp
/tmp/hadoop-yarn/cache/yarn/nm-local-dir/nmPrivate/container_1383247324024_
0005_01_000001.pid
exec setsid /bin/bash
"/tmp/hadoop-yarn/cache/yarn/nm-local-dir/usercache/jdoe/appcache/applicati
on_1383247324024_0005/container_1383247324024_0005_01_000001/launch_contain
er.sh"



yarn-site.xml
...
<property>
   <name>yarn.nodemanager.container-executor.class</name>
   
<value>org.apache.hadoop.yarn.server.nodemanager.DefaultContainerExecutor</
value>
  </property>
  <property>
   <name>yarn.nodemanager.linux-container-executor.group</name>
   <value>hadoop</value>
  </property>

hdfs-site.conf
...
<property>
  <name>dfs.permissions</name>
  <value>true</value>
</property>


-- 
Jonathan (Bill) Sparks
Software Architecture
Cray Inc.





On 10/31/13 6:12 AM, "Harsh J" <ha...@cloudera.com> wrote:

>In insecure mode the containers run as the daemon's owner, i.e.
>"yarn". Since the LocalFileSystem implementation has no way to
>impersonate any users (we don't run as root/etc.) it can create files
>only as the "yarn" user. On HDFS, we can send the right username in as
>a form of authentication, and its reflected on the created files.
>
>If you enable the LinuxContainerExecutor (or generally enable
>security) then the containers run after being setuid'd to the
>submitting user, and your files would appear with the right owner.
>
>On Wed, Oct 30, 2013 at 1:49 AM, Bill Sparks <js...@cray.com> wrote:
>>
>> I have a strange use case and I'm looking for some debugging help.
>>
>>
>> Use Case:
>>
>> If I run the hadoop mapped example wordcount program and write the
>>output
>> to HDFS, the output directory has the correct ownership.
>>
>> E.g.
>>
>> hadoop jar
>> /usr/lib/hadoop-mapreduce/hadoop-mapreduce-examples-2.0.5-alpha.jar
>> wordcount /user/jdoe/simple/HF.txt /users/jdoe/simple/outtest1
>>
>> hdfs dfs -ls simple
>> Found 3 items
>> drwxr-xr-x - jdoe supergroup 0 2013-10-25 21:26 simple/HF.out
>> -rw-r--r-- 1 jdoe supergroup 610157 2013-10-25 21:21 simple/HF.txt
>> drwxr-xr-x - jdoe supergroup 0 2013-10-29 14:50 simple/outtest1
>>
>> Where as if I write to a global filesystem my output directory is owned
>>by
>> yarn
>>
>>
>> E.g.
>>
>> hadoop jar
>> /usr/lib/hadoop-mapreduce/hadoop-mapreduce-examples-2.0.5-alpha.jar
>> wordcount /user/jdoe/simple/HF.txt file:///scratch/jdoe/outtest1
>> ls -l /scratch/jdoe
>> total 8
>> drwxr-xr-x 2 root root 4096 Oct 28 23:26 logs
>> drwxr-xr-x 2 yarn yarn 4096 Oct 28 23:23 outtest1
>>
>>
>>
>> I've looked at the container log files, and saw no errors. The only
>>thing
>> I can think of, is the user authentication mode is "files:ldap" and the
>> nodemanager nodes do not have access to the corporate LDAP server so
>>it's
>> working of local /etc/shadow which does not have my credentials - so it
>> might just default to "yarn".
>>
>> I did find the following warning:
>>
>> 2013-10-29 14:58:52,184 INFO
>> org.apache.hadoop.yarn.server.nodemanager.NMAuditLogger: USER=jdoe
>> OPERATION=Container Finished -
>> Succeeded       TARGET=ContainerImpl    RESULT=SUCCESS
>>APPID=application_13830201365
>> 44_0005 CONTAINERID=container_1383020136544_0005_01_000001
>> ...
>> 2013-10-29 14:58:53,062 WARN
>> 
>>org.apache.hadoop.yarn.server.nodemanager.containermanager.ContainerManag
>>er
>> Impl: Trying to stop unknown container
>> container_1383020136544_0005_01_000001
>> 2013-10-29 14:58:53,062 WARN
>> org.apache.hadoop.yarn.server.nodemanager.NMAuditLogger:
>> USER=UnknownUser        IP=10.128.0.17  OPERATION=Stop Container
>> Request TARGET=ContainerManagerImpl     RESULT=FAILURE
>>DESCRIPTION=Trying to
>> stop unknown
>> container!      APPID=application_1383020136544_0005
>>CONTAINERID=container_13830
>> 20136544_0005_01_000001
>>
>>
>>
>>
>> Thanks,
>>    John
>>
>
>
>
>-- 
>Harsh J


Re: Why is my output directory owned by yarn?

Posted by Bill Sparks <js...@cray.com>.
We'll I thought I've set all this up correctly and on the NodeManager
nodes can change to my user id, so general user authentication is working.
But still the output is written as yarn. I guess my question is how to
enable secure mode - I thought that was the default mode.

When the containers are written they contain the correct user name
(included).

cat 
/tmp/hadoop-yarn/cache/yarn/nm-local-dir/usercache/jdoe/appcache/applicatio
n_1383247324024_0005/container_1383247324024_0005_01_000001/launch_containe
r.sh 
#!/bin/bash

export 
YARN_LOCAL_DIRS="/tmp/hadoop-yarn/cache/yarn/nm-local-dir/usercache/jdoe/ap
pcache/application_1383247324024_0005"
export NM_HTTP_PORT="8042"
export HADOOP_COMMON_HOME="/usr/lib/hadoop"
export JAVA_HOME="/opt/java/jdk1.6.0_20"
export HADOOP_YARN_HOME="/usr/lib/hadoop-yarn"
export NM_HOST="nid00031"
export 
CLASSPATH="$PWD:$HADOOP_CONF_DIR:$HADOOP_COMMON_HOME/*:$HADOOP_COMMON_HOME/
lib/*:$HADOOP_HDFS_HOME/*:$HADOOP_HDFS_HOME/lib/*:$HADOOP_MAPRED_HOME/*:$HA
DOOP_MAPRED_HOME/lib/*:$HADOOP_YARN_HOME/*:$HADOOP_YARN_HOME/lib/*:$HADOOP_
MAPRED_HOME/share/hadoop/mapreduce/*:$HADOOP_MAPRED_HOME/share/hadoop/mapre
duce/lib/*:job.jar/job.jar:job.jar/classes/:job.jar/lib/*:$PWD/*"
export 
HADOOP_TOKEN_FILE_LOCATION="/tmp/hadoop-yarn/cache/yarn/nm-local-dir/userca
che/jdoe/appcache/application_1383247324024_0005/container_1383247324024_00
05_01_000001/container_tokens"
export APPLICATION_WEB_PROXY_BASE="/proxy/application_1383247324024_0005"
export JVM_PID="$$"
export USER="jdoe"
export HADOOP_HDFS_HOME="/usr/lib/hadoop-hdfs"
export 
PWD="/tmp/hadoop-yarn/cache/yarn/nm-local-dir/usercache/jdoe/appcache/appli
cation_1383247324024_0005/container_1383247324024_0005_01_000001"
export NM_PORT="36276"
export HOME="/home/"
export LOGNAME="jdoe"
export APP_SUBMIT_TIME_ENV="1383312862021"
export HADOOP_CONF_DIR="/etc/hadoop/conf"
export MALLOC_ARENA_MAX="4"
export AM_CONTAINER_ID="container_1383247324024_0005_01_000001"
ln -sf 
"/tmp/hadoop-yarn/cache/yarn/nm-local-dir/usercache/jdoe/appcache/applicati
on_1383247324024_0005/filecache/-300930022458385182/job.jar" "job.jar"
mkdir -p jobSubmitDir
ln -sf 
"/tmp/hadoop-yarn/cache/yarn/nm-local-dir/usercache/jdoe/appcache/applicati
on_1383247324024_0005/filecache/-4297161085730400838/job.splitmetainfo"
"jobSubmitDir/job.splitmetainfo"
mkdir -p jobSubmitDir
ln -sf 
"/tmp/hadoop-yarn/cache/yarn/nm-local-dir/usercache/jdoe/appcache/applicati
on_1383247324024_0005/filecache/-3754219748389402012/job.split"
"jobSubmitDir/job.split"
ln -sf 
"/tmp/hadoop-yarn/cache/yarn/nm-local-dir/usercache/jdoe/appcache/applicati
on_1383247324024_0005/filecache/233482461420248540/job.xml" "job.xml"
mkdir -p jobSubmitDir
ln -sf 
"/tmp/hadoop-yarn/cache/yarn/nm-local-dir/usercache/jdoe/appcache/applicati
on_1383247324024_0005/filecache/-8903348211231085224/appTokens"
"jobSubmitDir/appTokens"
exec /bin/bash -c "$JAVA_HOME/bin/java
-Dlog4j.configuration=container-log4j.properties
-Dyarn.app.mapreduce.container.log.dir=/tmp/hadoop-yarn/containers/applicat
ion_1383247324024_0005/container_1383247324024_0005_01_000001
-Dyarn.app.mapreduce.container.log.filesize=0
-Dhadoop.root.logger=INFO,CLA  -Xmx1024m
org.apache.hadoop.mapreduce.v2.app.MRAppMaster
1>/tmp/hadoop-yarn/containers/application_1383247324024_0005/container_1383
247324024_0005_01_000001/stdout
2>/tmp/hadoop-yarn/containers/application_1383247324024_0005/container_1383
247324024_0005_01_000001/stderr  "

# cat 
/tmp/hadoop-yarn/cache/yarn/nm-local-dir/usercache/jdoe/appcache/applicatio
n_1383247324024_0005/container_1383247324024_0005_01_000001/default_contain
er_executor.sh 
#!/bin/bash

echo $$ > 
/tmp/hadoop-yarn/cache/yarn/nm-local-dir/nmPrivate/container_1383247324024_
0005_01_000001.pid.tmp
/bin/mv -f 
/tmp/hadoop-yarn/cache/yarn/nm-local-dir/nmPrivate/container_1383247324024_
0005_01_000001.pid.tmp
/tmp/hadoop-yarn/cache/yarn/nm-local-dir/nmPrivate/container_1383247324024_
0005_01_000001.pid
exec setsid /bin/bash
"/tmp/hadoop-yarn/cache/yarn/nm-local-dir/usercache/jdoe/appcache/applicati
on_1383247324024_0005/container_1383247324024_0005_01_000001/launch_contain
er.sh"



yarn-site.xml
...
<property>
   <name>yarn.nodemanager.container-executor.class</name>
   
<value>org.apache.hadoop.yarn.server.nodemanager.DefaultContainerExecutor</
value>
  </property>
  <property>
   <name>yarn.nodemanager.linux-container-executor.group</name>
   <value>hadoop</value>
  </property>

hdfs-site.conf
...
<property>
  <name>dfs.permissions</name>
  <value>true</value>
</property>


-- 
Jonathan (Bill) Sparks
Software Architecture
Cray Inc.





On 10/31/13 6:12 AM, "Harsh J" <ha...@cloudera.com> wrote:

>In insecure mode the containers run as the daemon's owner, i.e.
>"yarn". Since the LocalFileSystem implementation has no way to
>impersonate any users (we don't run as root/etc.) it can create files
>only as the "yarn" user. On HDFS, we can send the right username in as
>a form of authentication, and its reflected on the created files.
>
>If you enable the LinuxContainerExecutor (or generally enable
>security) then the containers run after being setuid'd to the
>submitting user, and your files would appear with the right owner.
>
>On Wed, Oct 30, 2013 at 1:49 AM, Bill Sparks <js...@cray.com> wrote:
>>
>> I have a strange use case and I'm looking for some debugging help.
>>
>>
>> Use Case:
>>
>> If I run the hadoop mapped example wordcount program and write the
>>output
>> to HDFS, the output directory has the correct ownership.
>>
>> E.g.
>>
>> hadoop jar
>> /usr/lib/hadoop-mapreduce/hadoop-mapreduce-examples-2.0.5-alpha.jar
>> wordcount /user/jdoe/simple/HF.txt /users/jdoe/simple/outtest1
>>
>> hdfs dfs -ls simple
>> Found 3 items
>> drwxr-xr-x - jdoe supergroup 0 2013-10-25 21:26 simple/HF.out
>> -rw-r--r-- 1 jdoe supergroup 610157 2013-10-25 21:21 simple/HF.txt
>> drwxr-xr-x - jdoe supergroup 0 2013-10-29 14:50 simple/outtest1
>>
>> Where as if I write to a global filesystem my output directory is owned
>>by
>> yarn
>>
>>
>> E.g.
>>
>> hadoop jar
>> /usr/lib/hadoop-mapreduce/hadoop-mapreduce-examples-2.0.5-alpha.jar
>> wordcount /user/jdoe/simple/HF.txt file:///scratch/jdoe/outtest1
>> ls -l /scratch/jdoe
>> total 8
>> drwxr-xr-x 2 root root 4096 Oct 28 23:26 logs
>> drwxr-xr-x 2 yarn yarn 4096 Oct 28 23:23 outtest1
>>
>>
>>
>> I've looked at the container log files, and saw no errors. The only
>>thing
>> I can think of, is the user authentication mode is "files:ldap" and the
>> nodemanager nodes do not have access to the corporate LDAP server so
>>it's
>> working of local /etc/shadow which does not have my credentials - so it
>> might just default to "yarn".
>>
>> I did find the following warning:
>>
>> 2013-10-29 14:58:52,184 INFO
>> org.apache.hadoop.yarn.server.nodemanager.NMAuditLogger: USER=jdoe
>> OPERATION=Container Finished -
>> Succeeded       TARGET=ContainerImpl    RESULT=SUCCESS
>>APPID=application_13830201365
>> 44_0005 CONTAINERID=container_1383020136544_0005_01_000001
>> ...
>> 2013-10-29 14:58:53,062 WARN
>> 
>>org.apache.hadoop.yarn.server.nodemanager.containermanager.ContainerManag
>>er
>> Impl: Trying to stop unknown container
>> container_1383020136544_0005_01_000001
>> 2013-10-29 14:58:53,062 WARN
>> org.apache.hadoop.yarn.server.nodemanager.NMAuditLogger:
>> USER=UnknownUser        IP=10.128.0.17  OPERATION=Stop Container
>> Request TARGET=ContainerManagerImpl     RESULT=FAILURE
>>DESCRIPTION=Trying to
>> stop unknown
>> container!      APPID=application_1383020136544_0005
>>CONTAINERID=container_13830
>> 20136544_0005_01_000001
>>
>>
>>
>>
>> Thanks,
>>    John
>>
>
>
>
>-- 
>Harsh J


Re: Why is my output directory owned by yarn?

Posted by Harsh J <ha...@cloudera.com>.
In insecure mode the containers run as the daemon's owner, i.e.
"yarn". Since the LocalFileSystem implementation has no way to
impersonate any users (we don't run as root/etc.) it can create files
only as the "yarn" user. On HDFS, we can send the right username in as
a form of authentication, and it's reflected in the created files.

If you enable the LinuxContainerExecutor (or generally enable
security) then the containers run after being setuid'd to the
submitting user, and your files would appear with the right owner.
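
For concreteness, a minimal sketch of the yarn-site.xml change this implies
on each NodeManager. The property names below are the standard ones; it is
an assumption here that the setuid container-executor binary shipped with
the distribution is installed, owned by root and the group named below,
and paired with a matching container-executor.cfg, as the distribution's
security documentation describes:

<property>
  <name>yarn.nodemanager.container-executor.class</name>
  <value>org.apache.hadoop.yarn.server.nodemanager.LinuxContainerExecutor</value>
</property>
<property>
  <name>yarn.nodemanager.linux-container-executor.group</name>
  <value>hadoop</value>
</property>

A quick way to verify the binary's permissions (the path is an assumption
based on the /usr/lib/hadoop-yarn layout seen elsewhere in this thread;
it should show root ownership, the configured group, and the setuid bit):

ls -l /usr/lib/hadoop-yarn/bin/container-executor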

On Wed, Oct 30, 2013 at 1:49 AM, Bill Sparks <js...@cray.com> wrote:
>
> I have a strange use case and I'm looking for some debugging help.
>
>
> Use Case:
>
> If I run the hadoop mapped example wordcount program and write the output
> to HDFS, the output directory has the correct ownership.
>
> E.g.
>
> hadoop jar
> /usr/lib/hadoop-mapreduce/hadoop-mapreduce-examples-2.0.5-alpha.jar
> wordcount /user/jdoe/simple/HF.txt /users/jdoe/simple/outtest1
>
> hdfs dfs -ls simple
> Found 3 items
> drwxr-xr-x - jdoe supergroup 0 2013-10-25 21:26 simple/HF.out
> -rw-r--r-- 1 jdoe supergroup 610157 2013-10-25 21:21 simple/HF.txt
> drwxr-xr-x - jdoe supergroup 0 2013-10-29 14:50 simple/outtest1
>
> Where as if I write to a global filesystem my output directory is owned by
> yarn
>
>
> E.g.
>
> hadoop jar
> /usr/lib/hadoop-mapreduce/hadoop-mapreduce-examples-2.0.5-alpha.jar
> wordcount /user/jdoe/simple/HF.txt file:///scratch/jdoe/outtest1
> ls -l /scratch/jdoe
> total 8
> drwxr-xr-x 2 root root 4096 Oct 28 23:26 logs
> drwxr-xr-x 2 yarn yarn 4096 Oct 28 23:23 outtest1
>
>
>
> I've looked at the container log files, and saw no errors. The only thing
> I can think of, is the user authentication mode is "files:ldap" and the
> nodemanager nodes do not have access to the corporate LDAP server so it's
> working of local /etc/shadow which does not have my credentials - so it
> might just default to "yarn".
>
> I did find the following warning:
>
> 2013-10-29 14:58:52,184 INFO
> org.apache.hadoop.yarn.server.nodemanager.NMAuditLogger: USER=jdoe
> OPERATION=Container Finished -
> Succeeded       TARGET=ContainerImpl    RESULT=SUCCESS  APPID=application_13830201365
> 44_0005 CONTAINERID=container_1383020136544_0005_01_000001
> ...
> 2013-10-29 14:58:53,062 WARN
> org.apache.hadoop.yarn.server.nodemanager.containermanager.ContainerManager
> Impl: Trying to stop unknown container
> container_1383020136544_0005_01_000001
> 2013-10-29 14:58:53,062 WARN
> org.apache.hadoop.yarn.server.nodemanager.NMAuditLogger:
> USER=UnknownUser        IP=10.128.0.17  OPERATION=Stop Container
> Request TARGET=ContainerManagerImpl     RESULT=FAILURE  DESCRIPTION=Trying to
> stop unknown
> container!      APPID=application_1383020136544_0005    CONTAINERID=container_13830
> 20136544_0005_01_000001
>
>
>
>
> Thanks,
>    John
>



-- 
Harsh J
