Posted to user@spark.apache.org by Chetan Khatri <ch...@gmail.com> on 2022/05/08 16:36:48 UTC

Need help on migrating Spark on Hortonworks to Kubernetes Cluster

Hi Everyone, I need help with my Airflow DAG, which runs a Spark job via
spark-submit. I have moved from a Hortonworks (Linux, YARN-based) distributed
Spark cluster to a Kubernetes cluster. My existing spark-submit runs through a
BashOperator as below:

calculation1 = (
    '/usr/hdp/2.6.5.0-292/spark2/bin/spark-submit '
    '--conf spark.yarn.maxAppAttempts=1 '
    '--conf spark.dynamicAllocation.executorAllocationRatio=1 '
    '--conf spark.executor.heartbeatInterval=30s '
    '--conf spark.dynamicAllocation.executorIdleTimeout=60s '
    '--conf spark.dynamicAllocation.sustainedSchedulerBacklogTimeout=15s '
    '--conf spark.network.timeout=800s '
    '--conf spark.dynamicAllocation.schedulerBacklogTimeout=15s '
    '--conf spark.shuffle.service.enabled=true '
    '--conf spark.dynamicAllocation.enabled=true '
    '--conf spark.dynamicAllocation.minExecutors=4 '
    '--conf spark.dynamicAllocation.initialExecutors=4 '
    '--conf spark.dynamicAllocation.maxExecutors=8 '
    '--conf "spark.driver.extraJavaOptions=-Djava.util.logging.config.file=/opt/airflow/dags/logging.properties" '
    '--executor-cores 4 --executor-memory 8g --driver-memory 12g '
    '--master yarn '
    '--class com.wkelms.phoenix.incremental.invoice.Calculations '
    '/opt/airflow/dags/nextgen-phoenix-incremental-assembly-0.1.jar '
    '1 "Incremental" "/opt/airflow/dags/load_batch_configuration.json"'
)
tCalculateBatch1 = BashOperator(
    task_id="calculate_batch_1",
    dag=dag,
    trigger_rule="all_success",
    bash_command=calculation1,
)

But now I have a Kubernetes cluster in which SparkMaster, SparkWorker, and
Airflow all run as pods, so how should the submit be written/designed? From the
airflow-scheduler pod, how can I submit the Spark job to the spark workers?
*Kubernetes Pods are as below*

[root@spark-phoenix ~]# kubectl get pods -A
NAMESPACE     NAME                                      READY   STATUS      RESTARTS   AGE
kube-system   helm-install-traefik-crd-dn82j            0/1     Completed   0          37d
kube-system   helm-install-traefik-vrcz8                0/1     Completed   1          37d
kube-system   local-path-provisioner-5ff76fc89d-mrgzd   1/1     Running     16         37d
kube-system   coredns-7448499f4d-92xhx                  1/1     Running     11         37d
airflow       airflow-statsd-7586f9998-j29h7            1/1     Running     1          2d10h
kube-system   metrics-server-86cbb8457f-q9tt2           1/1     Running     11         37d
kube-system   svclb-traefik-vt9xw                       2/2     Running     22         37d
airflow       airflow-postgresql-0                      1/1     Running     1          2d10h
kube-system   traefik-6b84f7cbc-csffr                   1/1     Running     11         37d
spark         spark-worker-0                            1/1     Running     11         37d
spark         spark-master-0                            1/1     Running     11         37d
spark         spark-worker-1                            1/1     Running     11         37d
airflow       airflow-triggerer-6cc8c54495-w4jzz        1/1     Running     1          2d10h
airflow       airflow-scheduler-7694ccf55-5r9kw         2/2     Running     2          2d10h
airflow       airflow-webserver-68655785c7-lmgzg        1/1     Running     0          21h