Posted to dev@zeppelin.apache.org by "Rose Aysina (Jira)" <ji...@apache.org> on 2023/06/09 09:01:00 UTC

[jira] [Created] (ZEPPELIN-5928) Spark Interpreter fails with "has no attribute '_wrapped'" in Kubernetes with Zeppelin

Rose Aysina created ZEPPELIN-5928:
-------------------------------------

             Summary: Spark Interpreter fails with "has no attribute '_wrapped'" in Kubernetes with Zeppelin
                 Key: ZEPPELIN-5928
                 URL: https://issues.apache.org/jira/browse/ZEPPELIN-5928
             Project: Zeppelin
          Issue Type: Bug
          Components: Kubernetes, pySpark, spark
    Affects Versions: 0.10.1
            Reporter: Rose Aysina
             Fix For: 0.11.0
         Attachments: Dockerfile, Screenshot 2023-06-09 at 11.39.01 AM.png, Screenshot 2023-06-09 at 11.39.34 AM.png, log.txt

Hello!

We discovered a bug while deploying Zeppelin on Kubernetes with the Spark Interpreter also running in Kubernetes mode (the full log is attached to this issue):

 
{noformat}
INFO [2023-06-09 07:59:49,053] ({FIFOScheduler-interpreter_537057292-Worker-1} PythonInterpreter.java[bootstrapInterpreter]:562) - Bootstrap interpreter via python/zeppelin_pyspark.py
ERROR [2023-06-09 07:59:50,416] ({FIFOScheduler-interpreter_537057292-Worker-1} PySparkInterpreter.java[open]:104) - Fail to bootstrap pyspark
java.io.IOException: Fail to run bootstrap script: python/zeppelin_pyspark.py
%text Fail to execute line 54:   sqlc = __zSqlc__ = __zSpark__._wrapped
Traceback (most recent call last):
  File "/tmp/python129188975973677791/zeppelin_python.py", line 162, in <module>
    exec(code, _zcUserQueryNameSpace)
  File "<stdin>", line 54, in <module>
AttributeError: 'SparkSession' object has no attribute '_wrapped'

	at org.apache.zeppelin.python.PythonInterpreter.bootstrapInterpreter(PythonInterpreter.java:579)
	at org.apache.zeppelin.spark.PySparkInterpreter.open(PySparkInterpreter.java:102)
	at org.apache.zeppelin.interpreter.LazyOpenInterpreter.open(LazyOpenInterpreter.java:70)
	at org.apache.zeppelin.interpreter.remote.RemoteInterpreterServer$InterpretJob.jobRun(RemoteInterpreterServer.java:844)
	at org.apache.zeppelin.interpreter.remote.RemoteInterpreterServer$InterpretJob.jobRun(RemoteInterpreterServer.java:752)
	at org.apache.zeppelin.scheduler.Job.run(Job.java:172)
	at org.apache.zeppelin.scheduler.AbstractScheduler.runJob(AbstractScheduler.java:132)
	at org.apache.zeppelin.scheduler.FIFOScheduler.lambda$runJobInScheduler$0(FIFOScheduler.java:42)
	at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149)
	at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624)
	at java.lang.Thread.run(Thread.java:750)
 INFO [2023-06-09 07:59:50,418] ({FIFOScheduler-interpreter_537057292-Worker-1} PySparkInterpreter.java[close]:112) - Close PySparkInterpreter
 INFO [2023-06-09 07:59:50,418] ({FIFOScheduler-interpreter_537057292-Worker-1} PythonInterpreter.java[close]:259) - Kill python process
 INFO [2023-06-09 07:59:50,423] ({FIFOScheduler-interpreter_537057292-Worker-1} AbstractScheduler.java[runJob]:154) - Job 20230605-095507_490419065 finished by scheduler interpreter_537057292 with status ERROR
 WARN [2023-06-09 07:59:50,425] ({Exec Default Executor} ProcessLauncher.java[onProcessFailed]:134) - Process with cmd [python, /tmp/python129188975973677791/zeppelin_python.py, 10.165.178.231, 41565] is failed due to
org.apache.commons.exec.ExecuteException: Process exited with an error: 143 (Exit value: 143)
	at org.apache.commons.exec.DefaultExecutor.executeInternal(DefaultExecutor.java:404)
	at org.apache.commons.exec.DefaultExecutor.access$200(DefaultExecutor.java:48)
	at org.apache.commons.exec.DefaultExecutor$1.run(DefaultExecutor.java:200)
	at java.lang.Thread.run(Thread.java:750)
 INFO [2023-06-09 07:59:50,426] ({Exec Default Executor} ProcessLauncher.java[transition]:109) - Process state is transitioned to TERMINATED
{noformat}
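
For context, the failing line 54 of Zeppelin 0.10.1's bootstrap script obtains the SQLContext through the private {{SparkSession._wrapped}} attribute, which Spark 3.4 no longer exposes. The sketch below only illustrates the incompatibility (it is not the actual 0.11.0 patch), assuming the deprecated public {{SQLContext}} constructor is still available in your Spark build:
{code:python}
from pyspark.sql import SQLContext

# __zSpark__ is the SparkSession that zeppelin_pyspark.py has already created.
if hasattr(__zSpark__, "_wrapped"):
    # Older Spark releases still expose the private attribute that 0.10.1 assumes.
    sqlc = __zSqlc__ = __zSpark__._wrapped
else:
    # Newer Spark (e.g. 3.4): fall back to the public, deprecated SQLContext constructor.
    sqlc = __zSqlc__ = SQLContext(sparkContext=__zSpark__.sparkContext,
                                  sparkSession=__zSpark__)
{code}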
 

!Screenshot 2023-06-09 at 11.39.01 AM.png|width=383,height=209!
 

*The exact same problem reported on Stack Overflow by another user:* [https://stackoverflow.com/q/75949679]

 

How I configure Spark in the notebook on the Zeppelin server (see also the attached screenshots):
{code:java}
%spark.conf

spark.executor.instances  5
spark.kubernetes.container.image.pullSecrets docker-registry
spark.app.name SparkTESTSparkTESTSparkTESTSparkTESTSparkTESTSparkTESTSparkTESTSparkTESTSparkTEST
spark.jars.ivy /tmp
{code}
 

The Spark Interpreter starts, it launches the Spark executors (all of them in K8S), and then it raises the error above.
The executors do not fail after the error; it is just that the Zeppelin server cannot connect to the interpreter and prints the same error in the notebook cell.
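
To confirm that the interpreter image itself (rather than the Kubernetes setup) is what triggers the error, a quick check like the following can be run in a plain PySpark shell inside the interpreter container; this is a hypothetical diagnostic for illustration, not something taken from the attached logs:
{code:python}
import pyspark
from pyspark.sql import SparkSession

# Build a throwaway local session only to inspect the installed Spark.
spark = SparkSession.builder.master("local[1]").getOrCreate()

print(pyspark.__version__)          # e.g. 3.4.0 in our image
print(hasattr(spark, "_wrapped"))   # False on Spark 3.4, so the 0.10.1 bootstrap fails
{code}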

!Screenshot 2023-06-09 at 11.39.34 AM.png|width=383,height=209!

 

*Environment:*
 * Zeppelin version: *0.10.1*
 * Spark version: *3.4.0*
 * Python version: *3.11*
 * Dockerfile for Zeppelin Interpreter is attached.

 

*What we do now:* we use version *0.11.0-SNAPSHOT* and build Docker images from the Zeppelin sources on the latest master branch on GitHub. {color:#de350b}This bug does not occur in that newer version!{color}

 

*So the questions are:*
 - Will a patch for this (and similar) issues be released? If so, when?
 - Can the issue be fixed in the current version?

 

Thank you! Any help would be appreciated.

 



--
This message was sent by Atlassian Jira
(v8.20.10#820010)