Posted to user@spark.apache.org by Chetan Khatri <ch...@gmail.com> on 2022/05/08 16:36:48 UTC
Need help migrating Spark from Hortonworks to a Kubernetes cluster
Hi Everyone, I need help with my Airflow DAG, which runs a spark-submit. I now
have a Kubernetes cluster instead of the Hortonworks Linux distributed Spark
cluster. My existing spark-submit goes through a BashOperator, as below:
calculation1 = (
    "/usr/hdp/2.6.5.0-292/spark2/bin/spark-submit "
    "--conf spark.yarn.maxAppAttempts=1 "
    "--conf spark.dynamicAllocation.executorAllocationRatio=1 "
    "--conf spark.executor.heartbeatInterval=30s "
    "--conf spark.dynamicAllocation.executorIdleTimeout=60s "
    "--conf spark.dynamicAllocation.sustainedSchedulerBacklogTimeout=15s "
    "--conf spark.network.timeout=800s "
    "--conf spark.dynamicAllocation.schedulerBacklogTimeout=15s "
    "--conf spark.shuffle.service.enabled=true "
    "--conf spark.dynamicAllocation.enabled=true "
    "--conf spark.dynamicAllocation.minExecutors=4 "
    "--conf spark.dynamicAllocation.initialExecutors=4 "
    "--conf spark.dynamicAllocation.maxExecutors=8 "
    '--conf "spark.driver.extraJavaOptions=-Djava.util.logging.config.file=/opt/airflow/dags/logging.properties" '
    "--executor-cores 4 --executor-memory 8g --driver-memory 12g "
    "--master yarn "
    "--class com.wkelms.phoenix.incremental.invoice.Calculations "
    "/opt/airflow/dags/nextgen-phoenix-incremental-assembly-0.1.jar "
    '1 "Incremental" "/opt/airflow/dags/load_batch_configuration.json"'
)
tCalculateBatch1 = BashOperator(
task_id="calculate_batch_1",
dag=dag,
trigger_rule="all_success",
bash_command=calculation1,
)
But now I have a Kubernetes cluster, and the Spark master, Spark workers, and
Airflow all run as pods. How should this be written/designed? From the
airflow-scheduler pod, how can I submit the Spark job to the spark-worker pods?
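For what it's worth, one low-friction option is to keep the BashOperator and point --master at the standalone Spark master instead of YARN. A minimal sketch follows; the spark-submit path inside the Airflow image and the master Service DNS name (spark-master in the spark namespace, port 7077) are assumptions to verify against the actual cluster. YARN-only settings (spark.yarn.*, dynamic allocation backed by the external shuffle service) are dropped, and executor capacity is sized with standalone-mode flags instead:

```python
# Sketch only: SPARK_SUBMIT path, Service DNS name, and port are assumptions,
# not values confirmed by the cluster in this thread.
SPARK_SUBMIT = "/opt/spark/bin/spark-submit"  # path inside the Airflow pod (assumed)
MASTER = "spark://spark-master.spark.svc.cluster.local:7077"  # standalone master Service (assumed)

calculation1 = (
    f"{SPARK_SUBMIT} "
    f"--master {MASTER} "
    "--deploy-mode client "
    "--total-executor-cores 8 --executor-cores 4 "
    "--executor-memory 8g --driver-memory 12g "
    "--conf spark.network.timeout=800s "
    "--conf spark.executor.heartbeatInterval=30s "
    '--conf "spark.driver.extraJavaOptions=-Djava.util.logging.config.file=/opt/airflow/dags/logging.properties" '
    "--class com.wkelms.phoenix.incremental.invoice.Calculations "
    "/opt/airflow/dags/nextgen-phoenix-incremental-assembly-0.1.jar "
    '1 "Incremental" "/opt/airflow/dags/load_batch_configuration.json"'
)

# The BashOperator task itself would be unchanged:
# tCalculateBatch1 = BashOperator(task_id="calculate_batch_1", dag=dag,
#                                 trigger_rule="all_success",
#                                 bash_command=calculation1)
```

The jar and JSON paths are kept from the original command, which assumes they are mounted into (or baked into) the Airflow image the same way as before.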
*Kubernetes Pods are as below*
[root@spark-phoenix ~]# kubectl get pods -A
NAMESPACE     NAME                                      READY   STATUS      RESTARTS   AGE
kube-system   helm-install-traefik-crd-dn82j            0/1     Completed   0          37d
kube-system   helm-install-traefik-vrcz8                0/1     Completed   1          37d
kube-system   local-path-provisioner-5ff76fc89d-mrgzd   1/1     Running     16         37d
kube-system   coredns-7448499f4d-92xhx                  1/1     Running     11         37d
airflow       airflow-statsd-7586f9998-j29h7            1/1     Running     1          2d10h
kube-system   metrics-server-86cbb8457f-q9tt2           1/1     Running     11         37d
kube-system   svclb-traefik-vt9xw                       2/2     Running     22         37d
airflow       airflow-postgresql-0                      1/1     Running     1          2d10h
kube-system   traefik-6b84f7cbc-csffr                   1/1     Running     11         37d
spark         spark-worker-0                            1/1     Running     11         37d
spark         spark-master-0                            1/1     Running     11         37d
spark         spark-worker-1                            1/1     Running     11         37d
airflow       airflow-triggerer-6cc8c54495-w4jzz        1/1     Running     1          2d10h
airflow       airflow-scheduler-7694ccf55-5r9kw         2/2     Running     2          2d10h
airflow       airflow-webserver-68655785c7-lmgzg        1/1     Running     0          21h
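Note that this pod list shows Spark running as a standalone master/worker deployment, not using Kubernetes itself as the Spark scheduler. A different option, if the standalone master/worker pods are ever dropped, is spark-submit's native Kubernetes mode, where executors are launched as pods directly. A hedged sketch; the API server address, container image, service account, and in-image jar path below are all placeholders, not values from this cluster:

```
# Native Kubernetes scheduling -- every <...> placeholder and the
# local:// jar path are assumptions to replace with real values.
spark-submit \
  --master k8s://https://<api-server-host>:6443 \
  --deploy-mode cluster \
  --name calculate-batch-1 \
  --class com.wkelms.phoenix.incremental.invoice.Calculations \
  --conf spark.kubernetes.namespace=spark \
  --conf spark.kubernetes.container.image=<your-spark-image> \
  --conf spark.kubernetes.authenticate.driver.serviceAccountName=spark \
  --conf spark.executor.instances=4 \
  --executor-cores 4 --executor-memory 8g --driver-memory 12g \
  local:///path/in/image/nextgen-phoenix-incremental-assembly-0.1.jar \
  1 "Incremental" "/opt/airflow/dags/load_batch_configuration.json"
```

In cluster deploy mode the jar must be reachable from inside the driver pod (hence the local:// scheme pointing into the image), which is a different packaging model from mounting it under /opt/airflow/dags.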