Posted to issues@spark.apache.org by "Anirudh Ramanathan (JIRA)" <ji...@apache.org> on 2018/06/21 00:12:00 UTC

[jira] [Resolved] (SPARK-24547) Spark on K8s docker-image-tool.sh improvements

     [ https://issues.apache.org/jira/browse/SPARK-24547?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Anirudh Ramanathan resolved SPARK-24547.
----------------------------------------
       Resolution: Fixed
    Fix Version/s: 2.4.0

> Spark on K8s docker-image-tool.sh improvements
> ----------------------------------------------
>
>                 Key: SPARK-24547
>                 URL: https://issues.apache.org/jira/browse/SPARK-24547
>             Project: Spark
>          Issue Type: Improvement
>          Components: Kubernetes
>    Affects Versions: 2.4.0
>            Reporter: Ray Burgemeestre
>            Priority: Minor
>              Labels: docker, kubernetes, spark
>             Fix For: 2.4.0
>
>
> *Context*
> PySpark support for Spark on k8s was merged a few days ago with [https://github.com/apache/spark/pull/21092/files].
> There is a helper script that can be used to create Docker images for running Java and now also Python jobs. It works like this:
> {{/path/to/docker-image-tool.sh -r node001:5000/brightcomputing -t v2.4.0 build}}
>  {{/path/to/docker-image-tool.sh -r node001:5000/brightcomputing -t v2.4.0 push}}
> *Problem*
> I ran into three issues. The first time I generated images for 2.4.0, Docker was using its cache, so when running jobs, old jars were still in the Docker image. This produces errors like the following in the executors:
> {code:java}
> 2018-06-13 10:27:52 INFO NettyBlockTransferService:54 - Server created on 172.29.3.4:44877
> 2018-06-13 10:27:52 INFO BlockManager:54 - Using org.apache.spark.storage.RandomBlockReplicationPolicy for block replication policy
> 2018-06-13 10:27:52 INFO BlockManagerMaster:54 - Registering BlockManager BlockManagerId(1, 172.29.3.4, 44877, None)
> 2018-06-13 10:27:52 ERROR CoarseGrainedExecutorBackend:91 - Executor self-exiting due to : Unable to create executor due to Exception thrown in awaitResult:
> org.apache.spark.SparkException: Exception thrown in awaitResult:
>     at org.apache.spark.util.ThreadUtils$.awaitResult(ThreadUtils.scala:205)
>     at org.apache.spark.rpc.RpcTimeout.awaitResult(RpcTimeout.scala:75)
>     at org.apache.spark.rpc.RpcEndpointRef.askSync(RpcEndpointRef.scala:92)
>     at org.apache.spark.rpc.RpcEndpointRef.askSync(RpcEndpointRef.scala:76)
>     at org.apache.spark.storage.BlockManagerMaster.registerBlockManager(BlockManagerMaster.scala:64)
>     at org.apache.spark.storage.BlockManager.initialize(BlockManager.scala:241)
>     at org.apache.spark.executor.Executor.<init>(Executor.scala:116)
>     at org.apache.spark.executor.CoarseGrainedExecutorBackend$$anonfun$receive$1.applyOrElse(CoarseGrainedExecutorBackend.scala:83)
>     at org.apache.spark.rpc.netty.Inbox$$anonfun$process$1.apply$mcV$sp(Inbox.scala:117)
>     at org.apache.spark.rpc.netty.Inbox.safelyCall(Inbox.scala:205)
>     at org.apache.spark.rpc.netty.Inbox.process(Inbox.scala:101)
>     at org.apache.spark.rpc.netty.Dispatcher$MessageLoop.run(Dispatcher.scala:221)
>     at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149)
>     at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624)
>     at java.lang.Thread.run(Thread.java:748)
> Caused by: java.lang.RuntimeException: java.io.InvalidClassException: org.apache.spark.storage.BlockManagerId; local class incompatible: stream classdesc serialVersionUID = 6155820641931972169, local class serialVersionUID = -3720498261147521051
>     at java.io.ObjectStreamClass.initNonProxy(ObjectStreamClass.java:687)
>     at java.io.ObjectInputStream.readNonProxyDesc(ObjectInputStream.java:1880)
>     at java.io.ObjectInputStream.readClassDesc(ObjectInputStream.java:1746)
> {code}
> To avoid that, Docker has to build without its cache; the problem only occurs if you have built images for an older version in the past, as sketched below.
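> For illustration, here is a minimal sketch (not the actual docker-image-tool.sh code; the image name, tag, and Dockerfile path are placeholders) of the difference between a cached and an uncached build:
> {code:bash}
> # Cached build: Docker may reuse layers from an earlier build, so a layer that
> # copied the old 2.3.x jars can survive into an image tagged v2.4.0.
> docker build -t myrepo/spark:v2.4.0 -f path/to/spark/Dockerfile .
>
> # Uncached build: every layer is rebuilt, so the jars in the image always
> # match the current build directory.
> docker build --no-cache -t myrepo/spark:v2.4.0 -f path/to/spark/Dockerfile .
> {code}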
> The second problem was that the spark image was pushed, but the spark-py image wasn't yet. This was simply forgotten in the initial PR.
> (A third problem I also ran into, because I had an older Docker version, is addressed in [https://github.com/apache/spark/pull/21551], so I have not included a fix for it in this ticket.)
> Other than that it works great!
> *Solution*
> I've added an extra flag so it's possible to call build with `-n` for `--no-cache`.
> And I've added the extra push for the spark-py image.
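> A rough bash sketch of the idea, assuming a getopts-style parser like the one already in docker-image-tool.sh (variable and Dockerfile names here are illustrative, not the actual patch):
> {code:bash}
> NOCACHEARG=
> while getopts f:p:r:t:n option; do
>   case "$option" in
>     f) BASEDOCKERFILE=$OPTARG ;;
>     p) PYDOCKERFILE=$OPTARG ;;
>     r) REPO=$OPTARG ;;
>     t) TAG=$OPTARG ;;
>     n) NOCACHEARG="--no-cache" ;;   # new flag: force a full rebuild
>   esac
> done
>
> # Pass the flag through to docker build for both images.
> docker build $NOCACHEARG -t "$REPO/spark:$TAG" -f "$BASEDOCKERFILE" .
> docker build $NOCACHEARG -t "$REPO/spark-py:$TAG" -f "$PYDOCKERFILE" .
>
> # Push the JVM image and the previously forgotten PySpark image.
> docker push "$REPO/spark:$TAG"
> docker push "$REPO/spark-py:$TAG"
> {code}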
> *Example*
> {{./bin/docker-image-tool.sh -r docker.io/myrepo -t v2.3.0 -n build}}
> Snippet from the help output:
> {code:java}
> Options:
>   -f file   Dockerfile to build for JVM based Jobs. By default builds the Dockerfile shipped with Spark.
>   -p file   Dockerfile with Python baked in. By default builds the Dockerfile shipped with Spark.
>   -r repo   Repository address.
>   -t tag    Tag to apply to the built image, or to identify the image to be pushed.
>   -m        Use minikube's Docker daemon.
>   -n        Build docker image with --no-cache
> {code}
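> Putting it together, a hedged end-to-end example (repository address and tag taken from the commands above) would be to build with the new `-n` flag and then push both images:
> {code:bash}
> ./bin/docker-image-tool.sh -r node001:5000/brightcomputing -t v2.4.0 -n build
> ./bin/docker-image-tool.sh -r node001:5000/brightcomputing -t v2.4.0 push
> {code}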



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

---------------------------------------------------------------------
To unsubscribe, e-mail: issues-unsubscribe@spark.apache.org
For additional commands, e-mail: issues-help@spark.apache.org