Posted to users@zeppelin.apache.org by Manuel Sopena Ballesteros <ma...@garvan.org.au> on 2019/09/20 05:44:42 UTC

question about spark impersonation

Dear Zeppelin user community,

I am trying to set up impersonation with AD integration. I followed this documentation: https://github.com/sudheer0553/zeppelin-user-impersonation

Impersonation in the shell interpreter works, but for some reason when I print the username through Python using the Spark interpreter, it shows zeppelin as the user.

[inline image omitted from the plain-text archive]

However YARN shows the right user

[inline image omitted from the plain-text archive]

Why does my Python script show zeppelin as the user while YARN shows mansop?

Thank you very much

Manuel Sopena Ballesteros

Big Data Engineer | Kinghorn Centre for Clinical Genomics

 [cid:image001.png@01D4C835.ED3C2230] <https://www.garvan.org.au/>

a: 384 Victoria Street, Darlinghurst NSW 2010
p: +61 2 9355 5760  |  +61 4 12 123 123
e: manuel.sb@garvan.org.au<ma...@garvan.org.au>

Like us on Facebook<http://www.facebook.com/garvaninstitute> | Follow us on Twitter<http://twitter.com/GarvanInstitute> and LinkedIn<http://www.linkedin.com/company/garvan-institute-of-medical-research>

NOTICE
Please consider the environment before printing this email. This message and any attachments are intended for the addressee named and may contain legally privileged/confidential/copyright information. If you are not the intended recipient, you should not read, use, disclose, copy or distribute this communication. If you have received this message in error please notify us at once by return email and then delete both messages. We accept no liability for the distribution of viruses or similar in electronic communications. This notice should not be removed.

Re: question about spark impersonation

Posted by Jeff Zhang <zj...@gmail.com>.
Right, as far as I know there is no security issue, including with file permissions.
Regarding accessing /home/mansop, I believe you can still access it via
its absolute file path.
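
As a rough illustration of that point (a sketch, not something posted in the thread; `describe_access` is a hypothetical helper name): file access is decided by the operating system from the UID of the running process, not from the USER environment variable, so overriding USER inside a notebook paragraph should not change what the process can read. The path /home/mansop is the one from the thread.

```python
import os


def describe_access(path):
    """Report what the current process may do with path.

    os.access() checks against the real UID/GID of the process,
    independent of any environment variable.
    """
    return {
        "exists": os.path.exists(path),
        "readable": os.access(path, os.R_OK),
        # Listing a directory needs both read and execute (search) permission.
        "listable": os.path.isdir(path) and os.access(path, os.R_OK | os.X_OK),
    }


if __name__ == "__main__":
    path = "/home/mansop"  # path taken from the thread
    before = describe_access(path)
    os.environ["USER"] = "somebody-else"  # assumption: an arbitrary override
    after = describe_access(path)
    print(before)
    print("unchanged after overriding USER:", before == after)
```

If the two reports match, the value of USER is only a label inherited from the parent process and plays no part in the permission check.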




-- 
Best Regards

Jeff Zhang

RE: question about spark impersonation

Posted by Manuel Sopena Ballesteros <ma...@garvan.org.au>.
I will need to check because I don’t have enough experience to predict whether there is any impact or not.

At first I thought this would affect file permissions, but it looks like I was wrong.

For instance:

%spark2.pyspark

import os

path = '/home/mansop'

# next(os.walk(path)) yields (dirpath, dirnames, filenames); take the directory names
dirs = next(os.walk(path))[1]

for d in dirs:
    print(d)

# output:

data
.opera
.oracle_jre_usage
.cache
.config
.gradle
http
.ssh
public_html
.eclipse
.idlerc
.mozilla
.pki
.t_coffee
.conda
.local
anaconda2


So my next question is: how come I can access /home/mansop if the user in Python is zeppelin?

Manuel


Re: question about spark impersonation

Posted by Jeff Zhang <zj...@gmail.com>.
I believe this is an impersonation issue: currently Zeppelin passes all of
its environment variables to the interpreter process.
Is the USER environment variable critical for you? I don't
think it will cause a security issue.
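
For anyone hitting the same confusion, here is a minimal sketch (not from the thread; `effective_user` and `identity_report` are made-up names for illustration) that prints the user as seen through the environment next to the user the operating system actually runs the process as. It assumes a POSIX system. `os.environ['USER']` and `getpass.getuser()` only reflect inherited environment variables (`getpass.getuser()` consults LOGNAME, USER, LNAME and USERNAME before falling back to the password database), whereas `os.geteuid()` together with the `pwd` module reports the real process owner.

```python
import getpass
import os
import pwd


def effective_user():
    """Name of the user that owns this process according to the OS."""
    uid = os.geteuid()
    try:
        return pwd.getpwuid(uid).pw_name
    except KeyError:
        # No passwd entry for this UID; fall back to the number.
        return str(uid)


def identity_report():
    """Collect the user name from several sources (POSIX only)."""
    return {
        # Inherited from the parent process; can be stale under impersonation.
        "env_USER": os.environ.get("USER"),
        # Also environment-driven: checks LOGNAME, USER, LNAME, USERNAME first.
        "getpass": getpass.getuser(),
        # Independent of the environment.
        "effective_user": effective_user(),
    }


if __name__ == "__main__":
    for source, name in identity_report().items():
        print("{}: {}".format(source, name))
```

Run in a %spark2.pyspark paragraph, a mismatch between `env_USER` and `effective_user` would suggest that only the environment variable is stale while the process itself runs as the impersonated user.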



-- 
Best Regards

Jeff Zhang

RE: question about spark impersonation

Posted by Manuel Sopena Ballesteros <ma...@garvan.org.au>.
OK, I have the impression that the Spark and shell interpreters don’t see the same environment variables.

For instance:

%spark2.pyspark
import os

# USER as inherited by the PySpark interpreter process
print(os.environ['USER'])

# Output:
zeppelin

And:

%shell
printenv

# Output:
XDG_SESSION_ID=c20724
MASTER=yarn-client
HOSTNAME=zama-mlx.mlx
SHELL=/bin/bash
TERM=unknown
HISTSIZE=1000
QTDIR=/usr/lib64/qt-3.3
KERBEROS_REFRESH_INTERVAL=1d
USER=mansop
SUDO_USER=zeppelin
SUDO_UID=1012
ZEPPELIN_PID_DIR=/var/run/zeppelin
USERNAME=mansop
ZEPPELIN_IMPERSONATE_CMD=sudo -H -u ${ZEPPELIN_IMPERSONATE_USER} bash -c
NLSPATH=/usr/dt/lib/nls/msg/%L/%N.cat
MAIL=/var/spool/mail/zeppelin
PATH=/sbin:/bin:/usr/sbin:/usr/bin
_=/bin/printenv
PWD=/home/zeppelin
JAVA_HOME=/usr/jdk64/jdk1.8.0_112
LANG=en_AU.UTF-8
HADOOP_CONF_DIR=/etc/hadoop/conf
XFILESEARCHPATH=/usr/dt/app-defaults/%L/Dt
ZEPPELIN_LOG_DIR=/var/log/zeppelin
KINIT_FAIL_THRESHOLD=5
HOME=/home/mansop
SUDO_COMMAND=/bin/bash -c  source /usr/hdp/current/zeppelin-server/conf/zeppelin-env.sh; /usr/jdk64/jdk1.8.0_112/bin/java -Dfile.encoding=UTF-8 -Dlog4j.configuration=file:///usr/hdp/current/zeppelin-server/conf/log4j.properties -Dzeppelin.log.file=/var/log/zeppelin/zeppelin-interpreter-shell-mansop-zeppelin-zama-mlx.mlx.log -Xms1024m -Xmx1024m -XX:MaxPermSize=512m -cp /etc/zeppelin/conf/external-dependency-conf:/usr/hdp/current/zeppelin-server/interpreter/sh/*:/usr/hdp/current/zeppelin-server/lib/interpreter/*: org.apache.zeppelin.interpreter.remote.RemoteInterpreterServer 10.0.11.18 42949 :
SHLVL=2
ZEPPELIN_INTP_CLASSPATH_OVERRIDES=/etc/zeppelin/conf/external-dependency-conf
LOGNAME=mansop
SUDO_GID=1005


So the environment variable ‘USER’ does not have the same value in the %shell and %spark2.pyspark interpreters?
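
One way to make that comparison concrete is to capture the environment from each interpreter and diff only the identity-related variables. The sketch below is illustrative and not from the thread: `parse_env` and `diff_identity` are hypothetical helper names, and the list of variables in `IDENTITY_VARS` is an assumption about which ones matter here.

```python
IDENTITY_VARS = ("USER", "LOGNAME", "USERNAME", "HOME", "SUDO_USER")


def parse_env(text):
    """Turn printenv-style 'NAME=value' lines into a dict."""
    env = {}
    for line in text.splitlines():
        # Split on the first '=' only, since values may contain '=' themselves.
        name, sep, value = line.partition("=")
        if sep and name:
            env[name] = value
    return env


def diff_identity(env_a, env_b, names=IDENTITY_VARS):
    """Return {name: (value_a, value_b)} for the variables that differ."""
    return {
        n: (env_a.get(n), env_b.get(n))
        for n in names
        if env_a.get(n) != env_b.get(n)
    }


if __name__ == "__main__":
    # Values copied from the %shell output above.
    shell_env = parse_env(
        "USER=mansop\nLOGNAME=mansop\nUSERNAME=mansop\n"
        "HOME=/home/mansop\nSUDO_USER=zeppelin"
    )
    # Only USER is known for the PySpark side from the thread.
    pyspark_env = {"USER": "zeppelin"}
    print(diff_identity(shell_env, pyspark_env))
```

Inside a %spark2.pyspark paragraph, `dict(os.environ)` could be passed as the second argument instead of the hand-written dictionary.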

Thank you

Manuel

From: Manuel Sopena Ballesteros [mailto:manuel.sb@garvan.org.au]
Sent: Monday, September 23, 2019 10:43 AM
To: users@zeppelin.apache.org
Subject: RE: question about spark impersonation

Thank you for your prompt response Jeff,

May I ask which environment variable I should fix?

These are the ones I can see:

XDG_SESSION_ID=c20724
MASTER=yarn-client
HOSTNAME=zama-mlx.mlx
SHELL=/bin/bash
TERM=unknown
HISTSIZE=1000
QTDIR=/usr/lib64/qt-3.3
KERBEROS_REFRESH_INTERVAL=1d
USER=mansop
SUDO_USER=zeppelin
SUDO_UID=1012
ZEPPELIN_PID_DIR=/var/run/zeppelin
USERNAME=mansop
ZEPPELIN_IMPERSONATE_CMD=sudo -H -u ${ZEPPELIN_IMPERSONATE_USER} bash -c
NLSPATH=/usr/dt/lib/nls/msg/%L/%N.cat
MAIL=/var/spool/mail/zeppelin
PATH=/sbin:/bin:/usr/sbin:/usr/bin
_=/bin/printenv
PWD=/home/zeppelin
JAVA_HOME=/usr/jdk64/jdk1.8.0_112
LANG=en_AU.UTF-8
HADOOP_CONF_DIR=/etc/hadoop/conf
XFILESEARCHPATH=/usr/dt/app-defaults/%L/Dt
ZEPPELIN_LOG_DIR=/var/log/zeppelin
KINIT_FAIL_THRESHOLD=5
HOME=/home/mansop
SUDO_COMMAND=/bin/bash -c  source /usr/hdp/current/zeppelin-server/conf/zeppelin-env.sh; /usr/jdk64/jdk1.8.0_112/bin/java -Dfile.encoding=UTF-8 -Dlog4j.configuration=file:///usr/hdp/current/zeppelin-server/conf/log4j.properties -Dzeppelin.log.file=/var/log/zeppelin/zeppelin-interpreter-shell-mansop-zeppelin-zama-mlx.mlx.log -Xms1024m -Xmx1024m -XX:MaxPermSize=512m -cp /etc/zeppelin/conf/external-dependency-conf:/usr/hdp/current/zeppelin-server/interpreter/sh/*:/usr/hdp/current/zeppelin-server/lib/interpreter/*: org.apache.zeppelin.interpreter.remote.RemoteInterpreterServer 10.0.11.18 42949 :
SHLVL=2
ZEPPELIN_INTP_CLASSPATH_OVERRIDES=/etc/zeppelin/conf/external-dependency-conf
LOGNAME=mansop
SUDO_GID=1005

Thank you

Manuel

From: Jeff Zhang [mailto:zjffdu@gmail.com]
Sent: Friday, September 20, 2019 3:56 PM
To: users
Subject: Re: question about spark impersonation

Actually, the impersonation works; what you see in the Python interpreter is due to an environment variable being passed incorrectly.

Manuel Sopena Ballesteros <ma...@garvan.org.au>> 于2019年9月20日周五 下午1:45写道:
Dear Zeppelin user community,

I am trying to setup impersonation with AD integration. I followed this documentation https://github.com/sudheer0553/zeppelin-user-impersonation

And impersonation in shell interpreter works but for some reason when I print username through python using spark interpreter it won’t work as it shows zeppelin as the user.

[cid:image001.png@01D57200.03939DF0]

However YARN shows the right user

[cid:image002.png@01D57200.03939DF0]

Why my python script shows zeppelin user but YARN shows mansop?

Thank you very much

Manuel Sopena Ballesteros



--
Best Regards

Jeff Zhang

RE: question about spark impersonation

Posted by Manuel Sopena Ballesteros <ma...@garvan.org.au>.
Thank you for your prompt response, Jeff.

May I ask which environment variable I should fix?

These are the ones I can see:

XDG_SESSION_ID=c20724
MASTER=yarn-client
HOSTNAME=zama-mlx.mlx
SHELL=/bin/bash
TERM=unknown
HISTSIZE=1000
QTDIR=/usr/lib64/qt-3.3
KERBEROS_REFRESH_INTERVAL=1d
USER=mansop
SUDO_USER=zeppelin
SUDO_UID=1012
ZEPPELIN_PID_DIR=/var/run/zeppelin
USERNAME=mansop
ZEPPELIN_IMPERSONATE_CMD=sudo -H -u ${ZEPPELIN_IMPERSONATE_USER} bash -c
NLSPATH=/usr/dt/lib/nls/msg/%L/%N.cat
MAIL=/var/spool/mail/zeppelin
PATH=/sbin:/bin:/usr/sbin:/usr/bin
_=/bin/printenv
PWD=/home/zeppelin
JAVA_HOME=/usr/jdk64/jdk1.8.0_112
LANG=en_AU.UTF-8
HADOOP_CONF_DIR=/etc/hadoop/conf
XFILESEARCHPATH=/usr/dt/app-defaults/%L/Dt
ZEPPELIN_LOG_DIR=/var/log/zeppelin
KINIT_FAIL_THRESHOLD=5
HOME=/home/mansop
SUDO_COMMAND=/bin/bash -c  source /usr/hdp/current/zeppelin-server/conf/zeppelin-env.sh; /usr/jdk64/jdk1.8.0_112/bin/java -Dfile.encoding=UTF-8 -Dlog4j.configuration=file:///usr/hdp/current/zeppelin-server/conf/log4j.properties -Dzeppelin.log.file=/var/log/zeppelin/zeppelin-interpreter-shell-mansop-zeppelin-zama-mlx.mlx.log -Xms1024m -Xmx1024m -XX:MaxPermSize=512m -cp /etc/zeppelin/conf/external-dependency-conf:/usr/hdp/current/zeppelin-server/interpreter/sh/*:/usr/hdp/current/zeppelin-server/lib/interpreter/*: org.apache.zeppelin.interpreter.remote.RemoteInterpreterServer 10.0.11.18 42949 :
SHLVL=2
ZEPPELIN_INTP_CLASSPATH_OVERRIDES=/etc/zeppelin/conf/external-dependency-conf
LOGNAME=mansop
SUDO_GID=1005
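
Editorial sketch: in the listing above, the login-name variables (USER, USERNAME, LOGNAME, HOME) already point to mansop, while SUDO_USER, MAIL and PWD still point to zeppelin. The helper below is a hypothetical illustration (the function name and the SAMPLE excerpt are not part of Zeppelin) that groups a printenv dump by the account each value mentions, which may make such mismatches easier to spot.

```python
def group_by_account(printenv_text, accounts):
    """Map each account name to the variables whose value mentions it.

    A value "mentions" an account when the account name equals the whole
    value or one of its '/'-separated path components.
    """
    hits = {account: [] for account in accounts}
    for line in printenv_text.splitlines():
        name, sep, value = line.partition("=")
        if not sep:
            continue  # not a NAME=value line
        components = value.split("/")
        for account in accounts:
            if account in components:
                hits[account].append(name)
    return hits


# Excerpt of the identity-related variables copied from the listing above.
SAMPLE = """\
USER=mansop
SUDO_USER=zeppelin
USERNAME=mansop
MAIL=/var/spool/mail/zeppelin
PWD=/home/zeppelin
HOME=/home/mansop
LOGNAME=mansop
"""

if __name__ == "__main__":
    for account, names in group_by_account(SAMPLE, ["mansop", "zeppelin"]).items():
        print(account, names)
```

Note that the log file name in SUDO_COMMAND suggests this dump comes from the shell interpreter process, so the Spark interpreter process may see a different environment.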

Thank you

Manuel

From: Jeff Zhang [mailto:zjffdu@gmail.com]
Sent: Friday, September 20, 2019 3:56 PM
To: users
Subject: Re: question about spark impersonation

Actually, the impersonation works. What you see in the Python interpreter is caused by an environment variable being passed incorrectly.

Manuel Sopena Ballesteros <ma...@garvan.org.au> wrote on Friday, 20 September 2019 at 1:45 PM:
Dear Zeppelin user community,

I am trying to set up impersonation with AD integration. I followed this documentation: https://github.com/sudheer0553/zeppelin-user-impersonation

Impersonation works in the shell interpreter, but when I print the username from Python in the Spark interpreter, it shows zeppelin as the user.

[cid:image001.png@01D571FB.2933FAA0]

However, YARN shows the right user:

[cid:image002.png@01D571FB.2933FAA0]

Why does my Python script show the zeppelin user while YARN shows mansop?

Thank you very much

Manuel Sopena Ballesteros



--
Best Regards

Jeff Zhang

Re: question about spark impersonation

Posted by Jeff Zhang <zj...@gmail.com>.
Actually, the impersonation works. What you see in the Python interpreter is
caused by an environment variable being passed incorrectly.
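
Editorial sketch: the screenshot of the Python paragraph is not available, so it is unclear how the script obtained the user name. Assuming it read an environment variable or called getpass.getuser() (which checks LOGNAME, USER, LNAME and USERNAME before falling back to the password database), a stale variable could make it print zeppelin even though the process runs as the impersonated user. The hypothetical helper below compares those sources with the account that owns the process; it assumes a Unix host.

```python
import getpass
import os
import pwd  # Unix only


def identity_report():
    """Collect the user name as seen from several independent sources."""
    euid = os.geteuid()  # effective UID of this Python process
    try:
        # Looked up in the password database; environment variables are ignored.
        account = pwd.getpwuid(euid).pw_name
    except KeyError:
        account = None  # UID has no passwd entry
    try:
        # Prefers LOGNAME, USER, LNAME, USERNAME from the environment.
        getpass_name = getpass.getuser()
    except (KeyError, OSError):
        getpass_name = None
    env = {name: os.environ.get(name)
           for name in ("USER", "LOGNAME", "USERNAME", "SUDO_USER", "HOME")}
    return {"euid": euid, "account": account,
            "getpass": getpass_name, "env": env}


if __name__ == "__main__":
    for key, value in identity_report().items():
        print(key, value)
```

If "account" shows the impersonated user while the "env" entries show zeppelin, the process identity is correct and only the inherited environment is misleading.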

Manuel Sopena Ballesteros <ma...@garvan.org.au> wrote on Friday, 20 September 2019 at 1:45 PM:

> Dear Zeppelin user community,
>
> I am trying to set up impersonation with AD integration. I followed this
> documentation: https://github.com/sudheer0553/zeppelin-user-impersonation
>
> Impersonation works in the shell interpreter, but when I print the
> username from Python in the Spark interpreter, it shows zeppelin as the
> user.
>
> However, YARN shows the right user.
>
> Why does my Python script show the zeppelin user while YARN shows mansop?
>
>
>
> Thank you very much
>
>
>
> Manuel Sopena Ballesteros
>
>


-- 
Best Regards

Jeff Zhang