Posted to reviews@yunikorn.apache.org by GitBox <gi...@apache.org> on 2022/10/06 21:26:41 UTC

[GitHub] [yunikorn-site] wilfred-s commented on a diff in pull request #191: [YUNIKORN-1345] Updating the Spark tutorial in yunikorn website

wilfred-s commented on code in PR #191:
URL: https://github.com/apache/yunikorn-site/pull/191#discussion_r989483896


##########
docs/user_guide/workloads/run_spark.md:
##########
@@ -75,6 +78,12 @@ rules:
 - apiGroups: [""]
   resources: ["configmaps"]
   verbs: ["get", "create", "delete"]
+- apiGroups: [""]
+  resources: ["services"]
+  verbs: ["get", "create", "delete"]
+- apiGroups: [""]
+  resources: ["persistentvolumeclaims"]
+  verbs: ["get", "create", "delete"]

Review Comment:
   For the default simple example runs we should not need this. I am not even sure the `configmaps` rule is needed either.
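   
   For reference, a minimal sketch of the RBAC a plain SparkPi run usually gets by with, using the `spark` service account in the `spark-test` namespace from this guide (role and binding names are illustrative):
   ```shell script
   # Hypothetical minimal RBAC for the simple SparkPi example; the services and
   # persistentvolumeclaims rules added in this diff are intentionally omitted.
   kubectl create role spark-role -n spark-test \
     --verb=get,list,watch,create,delete \
     --resource=pods --resource=configmaps
   kubectl create rolebinding spark-role-binding -n spark-test \
     --role=spark-role --serviceaccount=spark-test:spark
   ```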



##########
docs/user_guide/workloads/run_spark.md:
##########
@@ -37,8 +37,11 @@ team, or 2) build one from scratch. If you want to build your own Spark docker i
 * Download a Spark version that has Kubernetes support, URL: https://github.com/apache/spark
 * Build spark with Kubernetes support:
 ```shell script
-mvn -Pyarn -Phadoop-2.7 -Dhadoop.version=2.7.4 -Phive -Pkubernetes -Phive-thriftserver -DskipTests package
+git clone https://github.com/apache/spark.git
+cd spark

Review Comment:
   We should not force people to clone git. The Spark build page also offers the option of building a runnable distribution, so fixing line 37 to use a download should be included as well.
   https://spark.apache.org/docs/latest/building-spark.html#building-a-runnable-distribution
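   
   For example, a sketch of that approach, assuming the 3.3.0 source release (the download URL and version are illustrative):
   ```shell script
   # Download a Spark source release instead of cloning the git repository
   # (mirror URL and version are examples, not a recommendation).
   curl -LO https://archive.apache.org/dist/spark/spark-3.3.0/spark-3.3.0.tgz
   tar -xzf spark-3.3.0.tgz && cd spark-3.3.0
   # Build a runnable distribution with Kubernetes support, as described on the
   # Spark build page linked above.
   ./dev/make-distribution.sh --name spark-k8s --tgz -Pkubernetes -Phive -Phive-thriftserver
   ```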



##########
docs/user_guide/workloads/run_spark.md:
##########
@@ -106,26 +115,36 @@ kubectl proxy
 
 Run a simple SparkPi job (this assumes that the Spark binaries are installed to `/usr/local` directory).
 ```shell script
-export SPARK_HOME=/usr/local/spark-2.4.4-bin-hadoop2.7/
+export SPARK_HOME=/usr/local/spark/
 ${SPARK_HOME}/bin/spark-submit --master k8s://http://localhost:8001 --deploy-mode cluster --name spark-pi \
    --master k8s://http://localhost:8001 --deploy-mode cluster --name spark-pi \
    --class org.apache.spark.examples.SparkPi \
    --conf spark.executor.instances=1 \
    --conf spark.kubernetes.namespace=spark-test \
    --conf spark.kubernetes.executor.request.cores=1 \
-   --conf spark.kubernetes.container.image=apache/yunikorn:spark-2.4.4 \
-   --conf spark.kubernetes.authenticate.driver.serviceAccountName=spark-test:spark \
-   local:///opt/spark/examples/jars/spark-examples_2.11-2.4.4.jar
+   --conf spark.kubernetes.container.image=docker.io/apache/spark:v3.3.0 \

Review Comment:
   If we use the Docker-hosted image we do not need to build anything. The build instructions were only for a locally built image; that needs to be made clear.
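   
   With the Docker Hub hosted image referenced in this diff, pulling it is all that is needed (a sketch):
   ```shell script
   # No local build is required when using the published image from the diff;
   # the build steps earlier in the doc only apply to a custom local image.
   docker pull docker.io/apache/spark:v3.3.0
   ```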



##########
docs/user_guide/workloads/run_spark.md:
##########
@@ -106,26 +115,36 @@ kubectl proxy
 
 Run a simple SparkPi job (this assumes that the Spark binaries are installed to `/usr/local` directory).
 ```shell script
-export SPARK_HOME=/usr/local/spark-2.4.4-bin-hadoop2.7/
+export SPARK_HOME=/usr/local/spark/
 ${SPARK_HOME}/bin/spark-submit --master k8s://http://localhost:8001 --deploy-mode cluster --name spark-pi \
    --master k8s://http://localhost:8001 --deploy-mode cluster --name spark-pi \
    --class org.apache.spark.examples.SparkPi \
    --conf spark.executor.instances=1 \
    --conf spark.kubernetes.namespace=spark-test \
    --conf spark.kubernetes.executor.request.cores=1 \
-   --conf spark.kubernetes.container.image=apache/yunikorn:spark-2.4.4 \
-   --conf spark.kubernetes.authenticate.driver.serviceAccountName=spark-test:spark \
-   local:///opt/spark/examples/jars/spark-examples_2.11-2.4.4.jar
+   --conf spark.kubernetes.container.image=docker.io/apache/spark:v3.3.0 \
+   --conf spark.kubernetes.authenticate.driver.serviceAccountName=spark \
+   local:///opt/spark/examples/jars/spark-examples_2.12-3.3.0.jar
 ```
 
+:::note
+There are more options for setting the driver and executor in the [spark](https://spark.apache.org/docs/latest/running-on-kubernetes.html#configuration).
+Assigning the applicationId and the queue path are possible.
+```
+--conf spark.kubernetes.executor.label.applicationId=application-spark-0001
+--conf spark.kubernetes.driver.label.applicationId=application-spark-0001  
+--conf spark.kubernetes.executor.label.queue=default.root.sandbox
+--conf spark.kubernetes.driver.label.queue=default.root.sandbox
+```
+:::
+
 You'll see Spark driver and executors been created on Kubernetes:
 
-![spark-pods](./../../assets/spark-pods.png)
+![spark-pods](./../../assets/RunningSparkOnK8s.png)

Review Comment:
   If this replaces the current image, you need to remove the unused image.
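   
   A sketch of that cleanup, assuming the screenshot lives under `docs/assets/` in this repository:
   ```shell script
   # Remove the asset that is no longer referenced (path assumed from the relative link).
   git rm docs/assets/spark-pods.png
   ```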



##########
docs/user_guide/workloads/run_spark.md:
##########
@@ -106,26 +115,36 @@ kubectl proxy
 
 Run a simple SparkPi job (this assumes that the Spark binaries are installed to `/usr/local` directory).
 ```shell script
-export SPARK_HOME=/usr/local/spark-2.4.4-bin-hadoop2.7/
+export SPARK_HOME=/usr/local/spark/
 ${SPARK_HOME}/bin/spark-submit --master k8s://http://localhost:8001 --deploy-mode cluster --name spark-pi \
    --master k8s://http://localhost:8001 --deploy-mode cluster --name spark-pi \
    --class org.apache.spark.examples.SparkPi \
    --conf spark.executor.instances=1 \
    --conf spark.kubernetes.namespace=spark-test \
    --conf spark.kubernetes.executor.request.cores=1 \
-   --conf spark.kubernetes.container.image=apache/yunikorn:spark-2.4.4 \
-   --conf spark.kubernetes.authenticate.driver.serviceAccountName=spark-test:spark \
-   local:///opt/spark/examples/jars/spark-examples_2.11-2.4.4.jar
+   --conf spark.kubernetes.container.image=docker.io/apache/spark:v3.3.0 \
+   --conf spark.kubernetes.authenticate.driver.serviceAccountName=spark \
+   local:///opt/spark/examples/jars/spark-examples_2.12-3.3.0.jar
 ```
 
+:::note
+There are more options for setting the driver and executor in the [spark](https://spark.apache.org/docs/latest/running-on-kubernetes.html#configuration).
+Assigning the applicationId and the queue path are possible.
+```
+--conf spark.kubernetes.executor.label.applicationId=application-spark-0001
+--conf spark.kubernetes.driver.label.applicationId=application-spark-0001  
+--conf spark.kubernetes.executor.label.queue=default.root.sandbox
+--conf spark.kubernetes.driver.label.queue=default.root.sandbox
+```
+:::
+
 You'll see Spark driver and executors been created on Kubernetes:
 
-![spark-pods](./../../assets/spark-pods.png)
+![spark-pods](./../../assets/RunningSparkOnK8s.png)
 
-You can also view the job info from YuniKorn UI. If you do not know how to access the YuniKorn UI, please read the document
-[here](../../get_started/get_started.md#access-the-web-ui).
+The spark-pi result is in the driver pod.
 
-![spark-jobs-on-ui](./../../assets/spark-jobs-on-ui.png)
+![spark-pods](./../../assets/sparkResult.png)

Review Comment:
   If this replaces the current image, you need to remove the unused image.
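   
   Same cleanup sketch for the second replaced screenshot (path assumed as above):
   ```shell script
   # Remove the other screenshot that is no longer referenced.
   git rm docs/assets/spark-jobs-on-ui.png
   ```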


