Posted to commits@airflow.apache.org by GitBox <gi...@apache.org> on 2021/11/26 17:11:19 UTC

[GitHub] [airflow] CSammy opened a new issue #19844: catchup=False is ignored to some extent, backfill happens

CSammy opened a new issue #19844:
URL: https://github.com/apache/airflow/issues/19844


   ### Apache Airflow version
   
   2.2.2 (latest released)
   
   ### Operating System
   
   Debian GNU/Linux 10 (buster) / official Airflow Docker image
   
   ### Versions of Apache Airflow Providers
   
   apache-airflow-providers-amazon==2.4.0
   apache-airflow-providers-celery==2.1.0
   apache-airflow-providers-cncf-kubernetes==2.1.0
   apache-airflow-providers-docker==2.3.0
   apache-airflow-providers-elasticsearch==2.1.0
   apache-airflow-providers-ftp==2.0.1
   apache-airflow-providers-google==6.1.0
   apache-airflow-providers-grpc==2.0.1
   apache-airflow-providers-hashicorp==2.1.1
   apache-airflow-providers-http==2.0.1
   apache-airflow-providers-imap==2.0.1
   apache-airflow-providers-microsoft-azure==3.3.0
   apache-airflow-providers-mysql==2.1.1
   apache-airflow-providers-odbc==2.0.1
   apache-airflow-providers-postgres==2.3.0
   apache-airflow-providers-redis==2.0.1
   apache-airflow-providers-sendgrid==2.0.1
   apache-airflow-providers-sftp==2.2.0
   apache-airflow-providers-slack==4.1.0
   apache-airflow-providers-sqlite==2.0.1
   apache-airflow-providers-ssh==2.3.0
   
   ### Deployment
   
   Official Apache Airflow Helm Chart
   
   ### Deployment details
   
   Deployment via Helm chart on GKE. Helm chart v1.3.0, with the Docker tag pinned to `2.2.2-python3.9`. Isolated namespace on Kubernetes 1.16.
   
   Customization:
   - git-sync activated
   
   I can provide the full output of `airflow info` if desired.
   
   Since the question arose in a previous conversation: the executor is the `CeleryExecutor`.
   
   ### What happened
   
   In a DAG with KubernetesPodOperators, the following settings were used:
   ```python
       schedule_interval="0 0 * * 6",
       start_date=datetime.datetime(2021, 11, 1),
       catchup=False,
   ```
   
   When running the DAG via the Airflow UI, backfill jobs for the dates `2021-11-13` and `2021-11-20` are created and run.
   
   ### What you expected to happen
   
   I expected a single job for today to be created and run, with no backfill jobs.
   
   ### How to reproduce
   
   Complete DAG file:
   
   ```python
   import datetime
   import os
   
   from airflow import DAG
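   # Provider import path; the airflow.contrib path is deprecated in Airflow 2.
   from airflow.providers.cncf.kubernetes.operators.kubernetes_pod import KubernetesPodOperator
   
   # Placeholder: the report uses `default_args` without showing its definition.
   default_args = {}
   
   with DAG(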
       dag_id="debug_dag",
       # Saturday midnight
       schedule_interval="0 0 * * 6",
       start_date=datetime.datetime(2021, 11, 1),
       catchup=False,
       tags=["debug dag for catchup tests"],
       default_args=default_args,
   ) as dag:
       gcp_test_task = KubernetesPodOperator(
           # Task name in Airflow 2 UI
           task_id="gcp-test-task",
           # Pod name
           name="task-gcp-test-task",
           image="google/cloud-sdk:slim",
           cmds=["sleep", "300"],
           namespace=os.environ["K8S_NAMESPACE"],
           # K8s service account linked to the GCP service account
           service_account_name="airflow2-dag-default",
           image_pull_policy="Always",
           get_logs=True,
       )
   
   gcp_test_task
   ```
   
   Click on the "Run" button to see backfill jobs being created.
   
   ### Anything else
   
   This behaviour has been reproducible with multiple DAGs having this `schedule_interval` and `start_date`.
   
   However, it is not reproducible in the same way with `schedule_interval="10 3 * * *", start_date=datetime.datetime(2021, 11, 1), catchup=False`. For that DAG, the UI shows "Next Run: 2021-11-25 03:10:00", which is still not what I expected, but it does not backfill the entire month.
   
   Possibly this is a misunderstanding about scheduling and/or backfill on my part.
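
For comparison, here is a minimal sketch of the interval arithmetic for the two schedules mentioned above. It does not call Airflow. It assumes the documented Airflow 2.2 convention that a scheduled run is labelled with the start of its data interval and is created only once that interval has ended, and that with `catchup=False` only the most recent complete interval is scheduled. The helper name `latest_complete_interval`, the naive UTC timestamps, and the choice of "now" (the time this report was posted) are assumptions made for illustration. The sketch does not try to account for every run observed in the report.

```python
import datetime


def latest_complete_interval(now, first_tick, period):
    """Return (start, end) of the most recent fixed-length interval ending at or before `now`.

    `first_tick` is the first schedule tick at or after the DAG's start_date.
    Returns None if no full interval has elapsed yet.
    """
    if now < first_tick + period:
        return None
    # Number of whole periods that have elapsed since the first tick.
    elapsed_periods = (now - first_tick) // period
    end = first_tick + elapsed_periods * period
    return end - period, end


# Assumed "now": the timestamp of this report (2021-11-26 17:11 UTC).
now = datetime.datetime(2021, 11, 26, 17, 11)

# "0 0 * * 6": the first Saturday midnight on or after 2021-11-01 is 2021-11-06.
weekly = latest_complete_interval(
    now, datetime.datetime(2021, 11, 6), datetime.timedelta(weeks=1)
)

# "10 3 * * *": the first 03:10 tick on or after 2021-11-01 is 2021-11-01 03:10.
daily = latest_complete_interval(
    now, datetime.datetime(2021, 11, 1, 3, 10), datetime.timedelta(days=1)
)

print("weekly:", weekly)
print("daily:", daily)
```

Under these assumptions, the weekly schedule's latest complete interval starts on 2021-11-13 and ends on 2021-11-20, and the daily schedule's starts on 2021-11-25 03:10, which matches the "Next Run" value quoted above.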
   
   ### Are you willing to submit PR?
   
   - [ ] Yes I am willing to submit a PR!
   
   ### Code of Conduct
   
   - [X] I agree to follow this project's [Code of Conduct](https://github.com/apache/airflow/blob/main/CODE_OF_CONDUCT.md)
   


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: commits-unsubscribe@airflow.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



[GitHub] [airflow] boring-cyborg[bot] commented on issue #19844: catchup=False is ignored to some extent, backfill happens

Posted by GitBox <gi...@apache.org>.
boring-cyborg[bot] commented on issue #19844:
URL: https://github.com/apache/airflow/issues/19844#issuecomment-980163993


   Thanks for opening your first issue here! Be sure to follow the issue template!
   

