Posted to commits@airflow.apache.org by GitBox <gi...@apache.org> on 2022/10/20 08:22:59 UTC

[GitHub] [airflow] tirkarthi commented on issue #27147: SparkKubernetesOperator: Dag fails when application_file sent as a ".yaml" file

tirkarthi commented on issue #27147:
URL: https://github.com/apache/airflow/issues/27147#issuecomment-1285131240

   Can you please add a sample DAG file to reproduce this? I tried the code below with "config.yaml" placed relative to the DAG file, and added a print statement in the Airflow code just before the YAML parsing at https://github.com/apache/airflow/blob/b9e133e40c2848b0d555051a99bf8d2816fd28a7/airflow/providers/cncf/kubernetes/hooks/kubernetes.py#L281-L284 (sketched after the DAG below). I was able to see the YAML content, so the file is being read and rendered correctly on my side.
   
   ```python
   import datetime
   
   from airflow.decorators import dag
   from airflow.providers.cncf.kubernetes.operators.spark_kubernetes import SparkKubernetesOperator
   
   @dag(start_date=datetime.datetime(2021, 1, 1))
   def mydag():
       # "config.yaml" ends with a templated extension (.yaml), so Airflow
       # resolves it relative to the DAG's folder during templating and passes
       # the rendered file contents on to the operator.
       op = SparkKubernetesOperator(
           application_file="config.yaml",
           kubernetes_conn_id='kubernetes_with_namespace',
           task_id='test_task_id',
       )
   
   mydag()
   ```
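   
   For reference, the debug change I used looked roughly like this (a sketch from memory of the linked hook code; only the `print()` is my local edit, not provider code):
   
   ```python
   # airflow/providers/cncf/kubernetes/hooks/kubernetes.py (paraphrased)
   import yaml
   
   from airflow.exceptions import AirflowException
   
   def _load_body_to_dict(body):
       print(body)  # debug: dump the rendered application_file before parsing
       try:
           body_dict = yaml.safe_load(body)
       except yaml.YAMLError as e:
           raise AirflowException(f"Exception when loading resource definition: {e}\n")
       return body_dict
   ```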
   
   ```
   airflow dags test mydag
   [2022-10-20 08:18:58,772] {dagbag.py:537} INFO - Filling up the DagBag from /files/dags
   [2022-10-20 08:18:58,864] {dag.py:3654} INFO - dagrun id: mydag
   /opt/airflow/airflow/models/dag.py:3669 RemovedInAirflow3Warning: Calling `DAG.create_dagrun()` without an explicit data interval is deprecated
   [2022-10-20 08:18:58,893] {dag.py:3671} INFO - created dagrun <DagRun mydag @ 2022-10-20T08:18:58.772022+00:00: manual__2022-10-20T08:18:58.772022+00:00, state:running, queued_at: None. externally triggered: False>
   [2022-10-20 08:18:58,905] {dag.py:3621} INFO - *****************************************************
   [2022-10-20 08:18:58,905] {dag.py:3625} INFO - Running task test_task_id
   [2022-10-20 08:18:59,323] {taskinstance.py:1587} INFO - Exporting the following env vars:
   AIRFLOW_CTX_DAG_OWNER=airflow
   AIRFLOW_CTX_DAG_ID=mydag
   AIRFLOW_CTX_TASK_ID=test_task_id
   AIRFLOW_CTX_EXECUTION_DATE=2022-10-20T08:18:58.772022+00:00
   AIRFLOW_CTX_TRY_NUMBER=1
   AIRFLOW_CTX_DAG_RUN_ID=manual__2022-10-20T08:18:58.772022+00:00
   [2022-10-20 08:18:59,323] {spark_kubernetes.py:70} INFO - Creating sparkApplication
   apiVersion: "sparkoperator.k8s.io/v1beta2"
   kind: SparkApplication
   metadata:
     name: spark-pi
     namespace: default
   spec:
     type: Scala
     mode: cluster
     image: "gcr.io/spark-operator/spark:v2.4.5"
     imagePullPolicy: Always
     mainClass: org.apache.spark.examples.SparkPi
     mainApplicationFile: "local:///opt/spark/examples/jars/spark-examples_2.11-2.4.5.jar"
     sparkVersion: "2.4.5"
     restartPolicy:
       type: Never
     volumes:
       - name: "test-volume"
         hostPath:
           path: "/tmp"
           type: Directory
     driver:
       cores: 1
       coreLimit: "1200m"
       memory: "512m"
       labels:
         version: 2.4.5
       serviceAccount: spark
       volumeMounts:
         - name: "test-volume"
           mountPath: "/tmp"
     executor:
       cores: 1
       instances: 1
       memory: "512m"
       labels:
         version: 2.4.5
       volumeMounts:
         - name: "test-volume"
           mountPath: "/tmp"
   ```
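   
   For completeness, "config.yaml" sits next to the DAG file in my setup; templated relative paths are looked up from the DAG's folder, so the layout was:
   
   ```
   /files/dags/
   ├── mydag.py      # the DAG above
   └── config.yaml   # the SparkApplication manifest shown in the log
   ```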

