Posted to reviews@spark.apache.org by foxish <gi...@git.apache.org> on 2017/12/21 00:26:42 UTC

[GitHub] spark pull request #19946: [SPARK-22648] [K8S] Spark on Kubernetes - Documen...

Github user foxish commented on a diff in the pull request:

    https://github.com/apache/spark/pull/19946#discussion_r158170697
  
    --- Diff: docs/running-on-kubernetes.md ---
    @@ -0,0 +1,573 @@
    +---
    +layout: global
    +title: Running Spark on Kubernetes
    +---
    +* This will become a table of contents (this text will be scraped).
    +{:toc}
    +
    +Spark can run on clusters managed by [Kubernetes](https://kubernetes.io). This feature makes use of the native
    +Kubernetes scheduler that has been added to Spark.
    +
    +# Prerequisites
    +
    +* A runnable distribution of Spark 2.3 or above.
    +* A running Kubernetes cluster at version >= 1.6 with access configured to it using
    +[kubectl](https://kubernetes.io/docs/user-guide/prereqs/).  If you do not already have a working Kubernetes cluster,
    +you may set up a test cluster on your local machine using
    +[minikube](https://kubernetes.io/docs/getting-started-guides/minikube/).
    +  * We recommend using the latest release of minikube with the DNS addon enabled.
    +* You must have appropriate permissions to list, create, edit and delete
    +[pods](https://kubernetes.io/docs/user-guide/pods/) in your cluster. You can verify that you have these permissions
    +by running `kubectl auth can-i <list|create|edit|delete> pods`, as shown in the example after this list.
    +  * The service account credentials used by the driver pods must be allowed to create pods, services and configmaps.
    +* You must have [Kubernetes DNS](https://kubernetes.io/docs/concepts/services-networking/dns-pod-service/) configured in your cluster.
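    +
    +For example, a minimal check of the prerequisites above might look like the following (the exact output depends
    +on how your cluster is set up):
    +
    +{% highlight bash %}
    +# Verify that kubectl is configured and can reach the cluster.
    +$ kubectl cluster-info
    +
    +# Verify that the current credentials can manage pods; each command prints "yes" or "no".
    +$ kubectl auth can-i create pods
    +$ kubectl auth can-i list pods
    +$ kubectl auth can-i edit pods
    +$ kubectl auth can-i delete pods
    +{% endhighlight %}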
    +
    +# How it works
    +
    +<p style="text-align: center;">
    +  <img src="img/k8s-cluster-mode.png" title="Spark cluster components" alt="Spark cluster components" />
    +</p>
    +
    +<code>spark-submit</code> can be directly used to submit a Spark application to a Kubernetes cluster.
    +The submission mechanism works as follows:
    +
    +* Spark creates a Spark driver running within a [Kubernetes pod](https://kubernetes.io/docs/concepts/workloads/pods/pod/).
    +* The driver creates executors, which also run within Kubernetes pods, connects to them, and executes application code.
    +* When the application completes, the executor pods terminate and are cleaned up, but the driver pod persists
    +logs and remains in "completed" state in the Kubernetes API until it's eventually garbage collected or manually cleaned up.
    +
    +Note that in the completed state, the driver pod does *not* use any computational or memory resources.
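    +
    +For example, once an application completes, the driver logs can still be retrieved and the pod removed using
    +standard `kubectl` commands (the pod name below is a placeholder for the actual driver pod name):
    +
    +{% highlight bash %}
    +# List the pods created for the application.
    +$ kubectl get pods
    +
    +# Inspect the logs persisted by the completed driver pod.
    +$ kubectl logs <driver-pod-name>
    +
    +# Manually clean up the completed driver pod.
    +$ kubectl delete pod <driver-pod-name>
    +{% endhighlight %}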
    +
    +The driver and executor pod scheduling is handled by Kubernetes. In a future release, it will be possible to influence
    +Kubernetes scheduling decisions for driver and executor pods using advanced primitives like
    +[node selectors](https://kubernetes.io/docs/concepts/configuration/assign-pod-node/#nodeselector)
    +and [node/pod affinities](https://kubernetes.io/docs/concepts/configuration/assign-pod-node/#affinity-and-anti-affinity).
    +
    +# Submitting Applications to Kubernetes
    +
    +## Docker Images
    +
    +Kubernetes requires users to supply images that can be deployed into containers within pods. The images are built to
    +be run in a container runtime environment that Kubernetes supports. Docker is a container runtime environment that is
    +frequently used with Kubernetes. Starting with Spark 2.3, Dockerfiles are provided in the runnable distribution
    +that can be customized and built for your specific usage.
    +
    +You may build these Docker images from source.
    +There is a script, `sbin/build-push-docker-images.sh`, that you can use to build and push
    +customized Spark distribution images consisting of all the above components.
    +
    +Example usage is:
    +
    +    ./sbin/build-push-docker-images.sh -r <repo> -t my-tag build
    +    ./sbin/build-push-docker-images.sh -r <repo> -t my-tag push
    +
    +The Dockerfiles are under the `kubernetes/dockerfiles/` directory and can be customized further before
    +building, either using the supplied script or manually.
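    +
    +For example, when testing against a minikube cluster, one possible workflow (a sketch, not required) is to build
    +the images directly against minikube's Docker daemon so that no push to a remote registry is needed:
    +
    +{% highlight bash %}
    +# Point the local Docker client at minikube's Docker daemon.
    +$ eval $(minikube docker-env)
    +
    +# Build the images; they become visible to the minikube cluster without a push.
    +$ ./sbin/build-push-docker-images.sh -r <repo> -t my-tag build
    +{% endhighlight %}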
    +
    +## Cluster Mode
    +
    +To launch Spark Pi in cluster mode,
    +
    +{% highlight bash %}
    +$ bin/spark-submit \
    +    --master k8s://https://<k8s-apiserver-host>:<k8s-apiserver-port> \
    +    --deploy-mode cluster \
    +    --name spark-pi \
    --- End diff --
    
    Good point, will update with caveat.

