You are viewing a plain text version of this content. The canonical link for it is here.
Posted to commits@airflow.apache.org by GitBox <gi...@apache.org> on 2022/08/27 01:00:59 UTC

[GitHub] [airflow] wolfier opened a new issue, #26000: `start_date` for an existing dagrun is not set when ran with backfill flags ` --reset-dagruns --yes`

wolfier opened a new issue, #26000:
URL: https://github.com/apache/airflow/issues/26000

   ### Apache Airflow version
   
   2.3.4
   
   ### What happened
   
   When the dagrun already exists and is backfilled with the flags `--reset-dagruns --yes`, the dag run will not have a start date. This is because reset_dagruns calls [clear_task_instances](https://github.com/apache/airflow/blob/main/airflow/models/dag.py#L2020) which [sets the dagrun start date to None](https://github.com/apache/airflow/blob/main/airflow/models/taskinstance.py#L286-L291).
   
   Since the dagrun goes into running via BackfillJob rather than the SchedulerJob, the start date is not set. This doesn't happen to a new dagrun created by a BackfillJob because the [start date is determined at creation](https://github.com/apache/airflow/blob/main/airflow/jobs/backfill_job.py#L310-L320).
   
   Here is a recreation of the behaviour.
   
   First run of the backfill dagrun. No odd warnings and start date exists for Airflow to calculate the duration.
   
   ```
   astro@75512ab5e882:/usr/local/airflow$ airflow dags backfill -s 2021-12-01 -e 2021-12-01 test_module
   /usr/local/lib/python3.9/site-packages/airflow/configuration.py:528: DeprecationWarning: The sql_alchemy_conn option in [core] has been moved to the sql_alchemy_conn option in [database] - the old setting has been used, but please update your config.
     option = self._get_environment_variables(deprecated_key, deprecated_section, key, section)
   /usr/local/lib/python3.9/site-packages/airflow/configuration.py:528 DeprecationWarning: The sql_alchemy_conn option in [core] has been moved to the sql_alchemy_conn option in [database] - the old setting has been used, but please update your config.
   /usr/local/lib/python3.9/site-packages/airflow/cli/commands/dag_command.py:57 PendingDeprecationWarning: --ignore-first-depends-on-past is deprecated as the value is always set to True
   [2022-08-25 21:29:55,574] {dagbag.py:508} INFO - Filling up the DagBag from /usr/local/airflow/dags
   /usr/local/lib/python3.9/site-packages/airflow/configuration.py:528 DeprecationWarning: The sql_alchemy_conn option in [core] has been moved to the sql_alchemy_conn option in [database] - the old setting has been used, but please update your config.
   Nothing to clear.
   [2022-08-25 21:29:55,650] {executor_loader.py:105} INFO - Loaded executor: LocalExecutor
   [2022-08-25 21:29:55,896] {base_executor.py:91} INFO - Adding to queue: ['airflow', 'tasks', 'run', 'test_module', 'run_python', 'backfill__2021-12-01T00:00:00+00:00', '--ignore-depends-on-past', '--local', '--pool', 'default_pool', '--subdir', 'DAGS_FOLDER/test_module.py', '--cfg-path', '/tmp/tmp_nuoic9m']
   [2022-08-25 21:30:00,665] {local_executor.py:79} INFO - QueuedLocalWorker running ['airflow', 'tasks', 'run', 'test_module', 'run_python', 'backfill__2021-12-01T00:00:00+00:00', '--ignore-depends-on-past', '--local', '--pool', 'default_pool', '--subdir', 'DAGS_FOLDER/test_module.py', '--cfg-path', '/tmp/tmp_nuoic9m']
   [2022-08-25 21:30:00,679] {backfill_job.py:367} INFO - [backfill progress] | finished run 0 of 1 | tasks waiting: 1 | succeeded: 0 | running: 1 | failed: 0 | skipped: 0 | deadlocked: 0 | not ready: 1
   [2022-08-25 21:30:00,695] {dagbag.py:508} INFO - Filling up the DagBag from /usr/local/airflow/dags/test_module.py
   [2022-08-25 21:30:00,759] {task_command.py:371} INFO - Running <TaskInstance: test_module.run_python backfill__2021-12-01T00:00:00+00:00 [queued]> on host 75512ab5e882
   [2022-08-25 21:30:05,686] {backfill_job.py:367} INFO - [backfill progress] | finished run 0 of 1 | tasks waiting: 1 | succeeded: 1 | running: 0 | failed: 0 | skipped: 0 | deadlocked: 0 | not ready: 1
   [2022-08-25 21:30:05,709] {base_executor.py:91} INFO - Adding to queue: ['airflow', 'tasks', 'run', 'test_module', 'test', 'backfill__2021-12-01T00:00:00+00:00', '--ignore-depends-on-past', '--local', '--pool', 'default_pool', '--subdir', 'DAGS_FOLDER/test_module.py', '--cfg-path', '/tmp/tmp3w9pm1jj']
   [2022-08-25 21:30:10,659] {local_executor.py:79} INFO - QueuedLocalWorker running ['airflow', 'tasks', 'run', 'test_module', 'test', 'backfill__2021-12-01T00:00:00+00:00', '--ignore-depends-on-past', '--local', '--pool', 'default_pool', '--subdir', 'DAGS_FOLDER/test_module.py', '--cfg-path', '/tmp/tmp3w9pm1jj']
   [2022-08-25 21:30:10,668] {backfill_job.py:367} INFO - [backfill progress] | finished run 0 of 1 | tasks waiting: 0 | succeeded: 1 | running: 1 | failed: 0 | skipped: 0 | deadlocked: 0 | not ready: 0
   [2022-08-25 21:30:10,693] {dagbag.py:508} INFO - Filling up the DagBag from /usr/local/airflow/dags/test_module.py
   [2022-08-25 21:30:10,765] {task_command.py:371} INFO - Running <TaskInstance: test_module.test backfill__2021-12-01T00:00:00+00:00 [queued]> on host 75512ab5e882
   [2022-08-25 21:30:15,678] {dagrun.py:564} INFO - Marking run <DagRun test_module @ 2021-12-01T00:00:00+00:00: backfill__2021-12-01T00:00:00+00:00, externally triggered: False> successful
   [2022-08-25 21:30:15,679] {dagrun.py:609} INFO - DagRun Finished: dag_id=test_module, execution_date=2021-12-01T00:00:00+00:00, run_id=backfill__2021-12-01T00:00:00+00:00, run_start_date=2022-08-25 21:29:55.815199+00:00, run_end_date=2022-08-25 21:30:15.679256+00:00, run_duration=19.864057, state=success, external_trigger=False, run_type=backfill, data_interval_start=2021-12-01T00:00:00+00:00, data_interval_end=2021-12-02T00:00:00+00:00, dag_hash=None
   [2022-08-25 21:30:15,680] {backfill_job.py:367} INFO - [backfill progress] | finished run 1 of 1 | tasks waiting: 0 | succeeded: 2 | running: 0 | failed: 0 | skipped: 0 | deadlocked: 0 | not ready: 0
   [2022-08-25 21:30:15,684] {local_executor.py:390} INFO - Shutting down LocalExecutor; waiting for running tasks to finish.  Signal again if you don't want to wait.
   [2022-08-25 21:30:15,829] {backfill_job.py:879} INFO - Backfill done. Exiting.
   ```
   
   Second run of the backfill dagrun with the flags `--reset-dagruns --yes`. There is a warning about start_date is not set.
   
   ```
   astro@75512ab5e882:/usr/local/airflow$ airflow dags backfill -s 2021-12-01 -e 2021-12-01 --reset-dagruns --yes test_module
   /usr/local/lib/python3.9/site-packages/airflow/configuration.py:528: DeprecationWarning: The sql_alchemy_conn option in [core] has been moved to the sql_alchemy_conn option in [database] - the old setting has been used, but please update your config.
     option = self._get_environment_variables(deprecated_key, deprecated_section, key, section)
   /usr/local/lib/python3.9/site-packages/airflow/configuration.py:528 DeprecationWarning: The sql_alchemy_conn option in [core] has been moved to the sql_alchemy_conn option in [database] - the old setting has been used, but please update your config.
   /usr/local/lib/python3.9/site-packages/airflow/cli/commands/dag_command.py:57 PendingDeprecationWarning: --ignore-first-depends-on-past is deprecated as the value is always set to True
   [2022-08-25 21:30:46,895] {dagbag.py:508} INFO - Filling up the DagBag from /usr/local/airflow/dag
   /usr/local/lib/python3.9/site-packages/airflow/configuration.py:528 DeprecationWarning: The sql_alchemy_conn option in [core] has been moved to the sql_alchemy_conn option in [database] - the old setting has been used, but please update your config.
   [2022-08-25 21:30:46,996] {executor_loader.py:105} INFO - Loaded executor: LocalExecutor
   [2022-08-25 21:30:47,275] {base_executor.py:91} INFO - Adding to queue: ['airflow', 'tasks', 'run', 'test_module', 'run_python', 'backfill__2021-12-01T00:00:00+00:00', '--ignore-depends-on-past', '--local', '--pool', 'default_pool', '--subdir', 'DAGS_FOLDER/test_module.py', '--cfg-path', '/tmp/tmp3s_3bn80']
   [2022-08-25 21:30:52,010] {local_executor.py:79} INFO - QueuedLocalWorker running ['airflow', 'tasks', 'run', 'test_module', 'run_python', 'backfill__2021-12-01T00:00:00+00:00', '--ignore-depends-on-past', '--local', '--pool', 'default_pool', '--subdir', 'DAGS_FOLDER/test_module.py', '--cfg-path', '/tmp/tmp3s_3bn80']
   [2022-08-25 21:30:52,029] {backfill_job.py:367} INFO - [backfill progress] | finished run 0 of 1 | tasks waiting: 1 | succeeded: 0 | running: 1 | failed: 0 | skipped: 0 | deadlocked: 0 | not ready: 1
   [2022-08-25 21:30:52,045] {dagbag.py:508} INFO - Filling up the DagBag from /usr/local/airflow/dags/test_module.py
   [2022-08-25 21:30:52,140] {task_command.py:371} INFO - Running <TaskInstance: test_module.run_python backfill__2021-12-01T00:00:00+00:00 [queued]> on host 75512ab5e882
   [2022-08-25 21:30:57,028] {backfill_job.py:367} INFO - [backfill progress] | finished run 0 of 1 | tasks waiting: 1 | succeeded: 1 | running: 0 | failed: 0 | skipped: 0 | deadlocked: 0 | not ready: 1
   [2022-08-25 21:30:57,048] {base_executor.py:91} INFO - Adding to queue: ['airflow', 'tasks', 'run', 'test_module', 'test', 'backfill__2021-12-01T00:00:00+00:00', '--ignore-depends-on-past', '--local', '--pool', 'default_pool', '--subdir', 'DAGS_FOLDER/test_module.py', '--cfg-path', '/tmp/tmprxg7g5o8']
   [2022-08-25 21:31:02,024] {local_executor.py:79} INFO - QueuedLocalWorker running ['airflow', 'tasks', 'run', 'test_module', 'test', 'backfill__2021-12-01T00:00:00+00:00', '--ignore-depends-on-past', '--local', '--pool', 'default_pool', '--subdir', 'DAGS_FOLDER/test_module.py', '--cfg-path', '/tmp/tmprxg7g5o8']
   [2022-08-25 21:31:02,032] {backfill_job.py:367} INFO - [backfill progress] | finished run 0 of 1 | tasks waiting: 0 | succeeded: 1 | running: 1 | failed: 0 | skipped: 0 | deadlocked: 0 | not ready: 0
   [2022-08-25 21:31:02,085] {dagbag.py:508} INFO - Filling up the DagBag from /usr/local/airflow/dags/test_module.py
   [2022-08-25 21:31:02,178] {task_command.py:371} INFO - Running <TaskInstance: test_module.test backfill__2021-12-01T00:00:00+00:00 [queued]> on host 75512ab5e882
   [2022-08-25 21:31:07,039] {dagrun.py:564} INFO - Marking run <DagRun test_module @ 2021-12-01 00:00:00+00:00: backfill__2021-12-01T00:00:00+00:00, externally triggered: False> successful
   [2022-08-25 21:31:07,039] {dagrun.py:609} INFO - DagRun Finished: dag_id=test_module, execution_date=2021-12-01 00:00:00+00:00, run_id=backfill__2021-12-01T00:00:00+00:00, run_start_date=None, run_end_date=2022-08-25 21:31:07.039737+00:00, run_duration=None, state=success, external_trigger=False, run_type=backfill, data_interval_start=2021-12-01 00:00:00+00:00, data_interval_end=2021-12-02 00:00:00+00:00, dag_hash=None
   [2022-08-25 21:31:07,040] {dagrun.py:795} WARNING - Failed to record duration of <DagRun test_module @ 2021-12-01 00:00:00+00:00: backfill__2021-12-01T00:00:00+00:00, externally triggered: False>: start_date is not set.
   [2022-08-25 21:31:07,040] {backfill_job.py:367} INFO - [backfill progress] | finished run 1 of 1 | tasks waiting: 0 | succeeded: 2 | running: 0 | failed: 0 | skipped: 0 | deadlocked: 0 | not ready: 0
   [2022-08-25 21:31:07,043] {local_executor.py:390} INFO - Shutting down LocalExecutor; waiting for running tasks to finish.  Signal again if you don't want to wait.
   [2022-08-25 21:31:07,177] {backfill_job.py:879} INFO - Backfill done. Exiting.
   ```
   
   ### What you think should happen instead
   
   When the BackfillJob fetches the dagrun, it will also need to set the start date.
   
   It can be done right after setting the run variable. ([source](https://github.com/apache/airflow/blob/main/airflow/jobs/backfill_job.py#L310-L320))
   
   ### How to reproduce
   
   Run the backfill command first without `--reset-dagruns --yes` flags.
   
   ```
   airflow dags backfill -s 2021-12-01 -e 2021-12-01 test_module
   ```
   
   Run the backfill command with the `--reset-dagruns --yes` flags.
   
   ```
   airflow dags backfill -s 2021-12-01 -e 2021-12-01 --reset-dagruns --yes test_module
   ```
   
   ### Operating System
   
   N/A
   
   ### Versions of Apache Airflow Providers
   
   _No response_
   
   ### Deployment
   
   Astronomer
   
   ### Deployment details
   
   _No response_
   
   ### Anything else
   
   _No response_
   
   ### Are you willing to submit PR?
   
   - [X] Yes I am willing to submit a PR!
   
   ### Code of Conduct
   
   - [X] I agree to follow this project's [Code of Conduct](https://github.com/apache/airflow/blob/main/CODE_OF_CONDUCT.md)
   


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: commits-unsubscribe@airflow.apache.org.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


[GitHub] [airflow] ephraimbuddy commented on issue #26000: `start_date` for an existing dagrun is not set when ran with backfill flags ` --reset-dagruns --yes`

Posted by GitBox <gi...@apache.org>.
ephraimbuddy commented on issue #26000:
URL: https://github.com/apache/airflow/issues/26000#issuecomment-1232692194

   > It can be done right after setting the run variable. ([source](https://github.com/apache/airflow/blob/main/airflow/jobs/backfill_job.py#L310-L320))
   
   I would prefer setting the start_date after the `run.conf` here: https://github.com/apache/airflow/blob/c982080ca1c824dd26c452bcb420df0f3da1afa8/airflow/jobs/backfill_job.py#L301 Let me know if you are working on this


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: commits-unsubscribe@airflow.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


[GitHub] [airflow] potiuk commented on issue #26000: `start_date` for an existing dagrun is not set when ran with backfill flags ` --reset-dagruns --yes`

Posted by GitBox <gi...@apache.org>.
potiuk commented on issue #26000:
URL: https://github.com/apache/airflow/issues/26000#issuecomment-1229084975

   Feel free to fix :)


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: commits-unsubscribe@airflow.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


[GitHub] [airflow] ephraimbuddy closed issue #26000: `start_date` for an existing dagrun is not set when ran with backfill flags ` --reset-dagruns --yes`

Posted by GitBox <gi...@apache.org>.
ephraimbuddy closed issue #26000: `start_date` for an existing dagrun is not set when ran with backfill flags ` --reset-dagruns --yes`
URL: https://github.com/apache/airflow/issues/26000


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: commits-unsubscribe@airflow.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org