Posted to commits@airflow.apache.org by GitBox <gi...@apache.org> on 2021/08/09 11:16:15 UTC

[GitHub] [airflow] mgorsk1 commented on issue #17238: Unexpected skipped state for tasks run with KubernetesExecutor

mgorsk1 commented on issue #17238:
URL: https://github.com/apache/airflow/issues/17238#issuecomment-895140926


   OK, I think we have more or less figured out the cause. I am curious what you think about our findings:
   1. Our DAG code contained `dagrun_timeout=timedelta(minutes=60)`. This code had been running in prod for 2 years on Airflow `1.10+`.
   2. According to the docs (https://airflow.apache.org/docs/apache-airflow/stable/_api/airflow/models/index.html), this option should not take effect if a DAG has `schedule_interval=None`, which was the case for all of our DAGs.
   ```
   dagrun_timeout (datetime.timedelta) – specify how long a DagRun should be up before timing out / failing, so that new DagRuns can be created. **The timeout is only enforced for scheduled DagRuns.**
   ```
   3. After migrating to Airflow 2, our DAGs, which previously took more than 1 hour, ran significantly faster, so reaching the 60-minute timeout became highly unlikely. Whenever it did happen, though, the tasks were marked as skipped and subsequently as failed.
   The code below shows this inconsistency between the docs and the actual behavior (the timeout should not apply, but tasks are actually killed mid-run):
   ```
   import time
   from datetime import datetime, timedelta
   
   from airflow import DAG
   from airflow.operators.dummy import DummyOperator
   from airflow.operators.python import PythonOperator
   
   default_args = {"start_date": datetime(2021, 7, 30)}
   
   with DAG("dagrun_timeout_error", default_args=default_args,
            schedule_interval=None,
            dagrun_timeout=timedelta(seconds=60)) as dag:
       start = DummyOperator(task_id="start")
       end = DummyOperator(task_id="end")
       for i in range(5):
           prev = start
           for j in range(3):
               t = PythonOperator(
                   task_id=f"t-{i}-{j}", python_callable=lambda: time.sleep(120)
               )
               prev >> t
               prev = t
           t >> end
   ```
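
To make the discrepancy concrete, here is a minimal plain-Python sketch of the two rules. It is not Airflow's scheduler code; `run_times_out` and its `enforce_for_manual` flag are hypothetical names introduced only for illustration. It assumes that a DAG with `schedule_interval=None` only ever gets manually triggered (non-scheduled) runs, and that each chain in the DAG above needs at least 3 x 120 seconds to finish.

```python
from datetime import timedelta


def run_times_out(run_duration, dagrun_timeout, is_scheduled, enforce_for_manual):
    """Hypothetical model: would a DagRun of this duration be timed out?

    enforce_for_manual=False models the documented rule (timeout applies to
    scheduled runs only); enforce_for_manual=True models what we observed.
    """
    if dagrun_timeout is None:
        return False
    if not is_scheduled and not enforce_for_manual:
        return False
    return run_duration > dagrun_timeout


# Each chain in the repro DAG has 3 sequential tasks sleeping 120 s each.
min_duration = timedelta(seconds=3 * 120)
timeout = timedelta(seconds=60)

documented = run_times_out(min_duration, timeout,
                           is_scheduled=False, enforce_for_manual=False)
observed = run_times_out(min_duration, timeout,
                         is_scheduled=False, enforce_for_manual=True)
print(documented, observed)
```

Under the documented rule the run would never be timed out, while under the observed behavior it is, because the 360-second minimum duration exceeds the 60-second timeout.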
   So as I understand it, this is either a change in behavior that wasn't documented properly, or a bug. Let me know what you think.
   
   Thanks to @dechoma for debugging this together.
   
   cc @jedcunningham @ephraimbuddy  

