Posted to user@livy.apache.org by "Rabe, Jens" <je...@iwes.fraunhofer.de> on 2018/09/13 07:28:58 UTC
PYSPARK_GATEWAY_SECRET error when running Livy with Spark on YARN, Cluster mode and using PySpark
Hello folks!
I am running CDH 5.15.0 with parcels and installed Spark 2 as per https://www.cloudera.com/documentation/spark2/latest/topics/spark2_installing.html
I also updated the alternatives so that spark-shell, spark-submit and pyspark call spark2-shell, spark2-submit and pyspark2. I have not installed Spark 1.6.
I installed Livy manually on one of the nodes, and I can successfully use it with Scala code.
Now I am trying to use it with pyspark. My first test was using Postman and sending requests directly. I sent:
{
  "kind": "pyspark",
  "proxyUser": "spark"
}
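(For readers reproducing this without Postman: the same request can be sketched in Python. The host is an assumption for this cluster; 8998 is Livy's default server port.)

```python
import json

# Sketch of the session-creation request sent via Postman above.
# The Livy host is an assumption; 8998 is Livy's default server port.
LIVY_SESSIONS_URL = "http://controller.lama.nuc:8998/sessions"

payload = {"kind": "pyspark", "proxyUser": "spark"}
body = json.dumps(payload)
print("POST", LIVY_SESSIONS_URL)
print(body)

# With the third-party 'requests' package installed, the actual call would be:
#   import requests
#   r = requests.post(LIVY_SESSIONS_URL, data=body,
#                     headers={"Content-Type": "application/json"})
#   print(r.json())  # contains the new session's id and state
```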
The session starts up fine on the cluster and I can eventually see the Spark UI coming up, but the log contains:
18/09/12 15:52:59 INFO driver.RSCDriver: Connecting to: controller.lama.nuc:10001
18/09/12 15:52:59 INFO driver.RSCDriver: Starting RPC server...
18/09/12 15:53:00 INFO rpc.RpcServer: Connected to the port 10000
18/09/12 15:53:00 WARN rsc.RSCConf: Your hostname, worker005.lama.nuc, resolves to a loopback address, but we couldn't find any external IP address!
18/09/12 15:53:00 WARN rsc.RSCConf: Set livy.rsc.rpc.server.address if you need to bind to another address.
18/09/12 15:53:00 INFO driver.RSCDriver: Received job request 4db9216d-355d-4a41-9365-32968f05e0a7
18/09/12 15:53:00 INFO driver.RSCDriver: SparkContext not yet up, queueing job request.
18/09/12 15:53:00 ERROR repl.PythonInterpreter: Process has died with 1
18/09/12 15:53:00 ERROR repl.PythonInterpreter: Traceback (most recent call last):
  File "/yarn/nm/usercache/livy/appcache/application_1535188013308_0051/container_1535188013308_0051_01_000001/tmp/3015653701235928503", line 643, in <module>
    sys.exit(main())
  File "/yarn/nm/usercache/livy/appcache/application_1535188013308_0051/container_1535188013308_0051_01_000001/tmp/3015653701235928503", line 533, in main
    exec('from pyspark.shell import sc', global_dict)
  File "<string>", line 1, in <module>
  File "/opt/cloudera/parcels/SPARK2-2.3.0.cloudera3-1.cdh5.13.3.p0.458809/lib/spark2/python/lib/pyspark.zip/pyspark/shell.py", line 38, in <module>
  File "/opt/cloudera/parcels/SPARK2-2.3.0.cloudera3-1.cdh5.13.3.p0.458809/lib/spark2/python/lib/pyspark.zip/pyspark/context.py", line 292, in _ensure_initialized
  File "/opt/cloudera/parcels/SPARK2-2.3.0.cloudera3-1.cdh5.13.3.p0.458809/lib/spark2/python/lib/pyspark.zip/pyspark/java_gateway.py", line 47, in launch_gateway
  File "/usr/lib64/python2.7/UserDict.py", line 23, in __getitem__
    raise KeyError(key)
KeyError: 'PYSPARK_GATEWAY_SECRET'
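(Editor's note: the failure mode in the traceback is simply PySpark's launch_gateway reading an environment variable that the process launching the Python interpreter never exported. A minimal sketch, with plain dicts standing in for os.environ:)

```python
# Minimal sketch of the failure: after the CVE-2018-1334 fix, PySpark's
# launch_gateway looks up a shared secret that whatever starts the Python
# interpreter must export first. A plain dict stands in for os.environ here.
def read_gateway_secret(env):
    return env["PYSPARK_GATEWAY_SECRET"]  # KeyError if the launcher never set it

# A launcher that predates the fix starts Python without the secret:
try:
    read_gateway_secret({"SPARK_HOME": "/opt/spark"})
except KeyError as err:
    print("missing env var:", err)

# A patched launcher exports it before starting Python:
print(read_gateway_secret({"PYSPARK_GATEWAY_SECRET": "s3cret"}))
```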
My livy-env.sh contains:
export JAVA_HOME=/usr/java/jdk1.8.0_181-amd64
export HADOOP_HOME=/opt/cloudera/parcels/CDH-5.15.0-1.cdh5.15.0.p0.21/lib/hadoop
export HADOOP_CONF_DIR=/opt/cloudera/parcels/SPARK2-2.3.0.cloudera3-1.cdh5.13.3.p0.458809/lib/spark2/conf/yarn-conf
export SPARK_HOME=/opt/cloudera/parcels/SPARK2-2.3.0.cloudera3-1.cdh5.13.3.p0.458809/lib/spark2
export SPARK_CONF_DIR=/opt/cloudera/parcels/SPARK2-2.3.0.cloudera3-1.cdh5.13.3.p0.458809/lib/spark2/conf
export LIVY_LOG_DIR=/var/log/livy
My livy.conf contains:
livy.repl.jars = hdfs:///livy/repl/commons-codec-1.9.jar,hdfs:///livy/repl/livy-core_2.11-0.4.0-SNAPSHOT.jar,hdfs:///livy/repl/livy-repl_2.11-0.4.0-SNAPSHOT.jar,hdfs:///some/custom/library-i-wrote-myself.jar
livy.rsc.jars = hdfs:///livy/rsc/livy-api-0.4.0-SNAPSHOT.jar,hdfs:///livy/rsc/livy-rsc-0.4.0-SNAPSHOT.jar,hdfs:///livy/rsc/netty-all-4.0.29.Final.jar
livy.rsc.rpc.server.address = 192.168.42.200
livy.server.recovery.state-store.url = worker001.lama.nuc:2181,worker002.lama.nuc:2181,worker003.lama.nuc:2181
livy.server.recovery.state-store = zookeeper
livy.spark.deploy-mode = cluster
livy.spark.master = yarn
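(Editor's note: with that configuration, the session state and driver log, where the traceback above shows up, can be polled over the same REST API. A hedged sketch; the base URL and session id are placeholders:)

```python
import json

# Helpers for polling a Livy session created with the conf above.
# Base URL and session id are placeholders for this cluster.
def state_url(base_url, session_id):
    # GET /sessions/{id}/state returns e.g. {"id": 0, "state": "idle"}
    return "%s/sessions/%d/state" % (base_url, session_id)

def is_ready(state_body):
    # "idle" means the SparkContext is up and statements can be submitted;
    # a session whose interpreter died ends up in "error" or "dead".
    return json.loads(state_body).get("state") == "idle"

print(state_url("http://controller.lama.nuc:8998", 0))
print(is_ready('{"id": 0, "state": "idle"}'))
```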
What am I missing?
Regards
Jens
RE: PYSPARK_GATEWAY_SECRET error when running Livy with Spark on YARN, Cluster mode and using PySpark
Posted by "Rabe, Jens" <je...@iwes.fraunhofer.de>.
Hello,
thanks for clarifying this. Fortunately, we don't need pyspark with Livy that badly; Scala Spark is sufficient for us. We only use pyspark with spark-submit directly or in Zeppelin, so I can wait for the next release.
-----Original Message-----
From: Marcelo Vanzin <va...@cloudera.com>
Sent: Thursday, 13 September 2018 18:42
To: user@livy.incubator.apache.org
Subject: Re: PYSPARK_GATEWAY_SECRET error when running Livy with Spark on YARN, Cluster mode and using PySpark
That requires a Livy fix that's currently only in the master branch.
Another option is to use the previous version of the Cloudera parcel (which does not have the fix for CVE-2018-1334, which introduced this incompatibility in Livy).
Re: PYSPARK_GATEWAY_SECRET error when running Livy with Spark on YARN, Cluster mode and using PySpark
Posted by Marcelo Vanzin <va...@cloudera.com>.
That requires a Livy fix that's currently only in the master branch.
Another option is to use the previous version of the Cloudera parcel
(which does not have the fix for CVE-2018-1334, which introduced this
incompatibility in Livy).
--
Marcelo