Posted to commits@airflow.apache.org by GitBox <gi...@apache.org> on 2022/10/11 13:43:03 UTC

[GitHub] [airflow] kcphila opened a new issue, #26991: Jobs scheduled with cron-syntax for monthly or weekly execution are not triggering

kcphila opened a new issue, #26991:
URL: https://github.com/apache/airflow/issues/26991

   ### Apache Airflow version
   
   2.4.1
   
   ### What happened
   
   On our Airflow 2.4.1 instance, we have one weekly report (schedule `17 10 * * 1`) and one monthly report (schedule `30 0 9 * *`), and no other weekly or monthly DAGs. Neither DAG had its schedule changed, so this does not appear to be the 1.10 issue, which seems to be related to schedule changes. Both are defined in Python files that contain other DAGs, most of which run daily.
   
   Neither of them executes. Today (2022-10-11), the weekly DAG still reports `Next Run: 2022-10-10, 10:17:00`.
   
   The monthly DAG, which should have run on the 9th, has advanced its next run to next month. I pinpointed the moment at which it rescheduled the next execution to the following month: exactly 24 hours after the scheduled runtime. I therefore suspect that the weekly DAG will also be rescheduled later today.
   
   ```conf
   [2022-10-11T00:29:40.561-0400] {logging_mixin.py:117} INFO - [2022-10-11T00:29:40.561-0400] {dag.py:3324} INFO - Setting next_dagrun for crons_update_data_dictionary to 2022-10-09T04:30:00+00:00, run_after=2022-11-09T05:30:00+00:00
   [2022-10-11T00:30:11.563-0400] {processor.py:768} INFO - DAG(s) dict_keys(['crons_update_airflow', 'crons_heartbeat', 'crons_update_markdown_documentation', 'crons_update_data_dictionary']) retrieved from /srv/local/git/airflow/dags/crons_system.py
   ... after the completion of the normal processing of the crons_system.py when nothing is run ...
   [2022-10-11T00:30:11.754-0400] {logging_mixin.py:117} INFO - [2022-10-11T00:30:11.754-0400] {dag.py:3324} INFO - Setting next_dagrun for crons_update_data_dictionary to 2022-11-09T05:30:00+00:00, run_after=2022-12-09T05:30:00+00:00
   [2022-10-11T00:30:42.756-0400] {processor.py:768} INFO - DAG(s) dict_keys(['crons_update_markdown_documentation', 'crons_heartbeat', 'crons_update_airflow', 'crons_update_data_dictionary']) retrieved from /srv/local/git/airflow/dags/crons_system.py
   ```
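For context on the log lines above: in Airflow 2.x, a run of a cron-scheduled DAG is triggered at the end of its data interval, so `next_dagrun` is the interval's start (the logical date) and `run_after` is one schedule period later. The following is a minimal stdlib sketch of that relationship for a monthly schedule, not Airflow's implementation; the helper names are hypothetical, and it works in UTC, so it ignores the daylight-saving change that makes the log show 05:30 rather than 04:30 for the November date.

```python
from datetime import datetime, timezone


def add_months(dt: datetime, months: int) -> datetime:
    """Shift dt by whole calendar months.

    Assumes dt.day exists in the target month, which holds for day 9.
    """
    index = dt.year * 12 + (dt.month - 1) + months
    return dt.replace(year=index // 12, month=index % 12 + 1)


def monthly_data_interval(logical_date: datetime) -> tuple[datetime, datetime]:
    """Return (interval start, run_after) for a monthly cron run.

    Simplified model: the interval starts at the logical date and the run
    becomes due one month later, when the interval ends.
    """
    return logical_date, add_months(logical_date, 1)


# next_dagrun from the first log line, in UTC.
start, run_after = monthly_data_interval(
    datetime(2022, 10, 9, 4, 30, tzinfo=timezone.utc)
)
print(start.isoformat(), run_after.isoformat())
# prints 2022-10-09T04:30:00+00:00 2022-11-09T04:30:00+00:00
```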
   
   
   ### What you think should happen instead
   
   The DAGs should execute according to their schedules.
   
   ### How to reproduce
   
   Our Airflow distribution is in a Git repository and can easily be deployed to temporary development servers. The behavior reproduces on duplicate deployments, so I do not believe it is specific to our instance.
   
   ### Operating System
   
   Ubuntu 22.04 running on AWS
   
   ### Versions of Apache Airflow Providers
   
   apache-airflow-providers-celery==3.0.0
   apache-airflow-providers-common-sql==1.2.0
   apache-airflow-providers-ftp==3.1.0
   apache-airflow-providers-http==4.0.0
   apache-airflow-providers-imap==3.0.0
   apache-airflow-providers-postgres==5.2.1
   apache-airflow-providers-sqlite==3.2.1
   apache-airflow-providers-ssh==3.1.0
   
   ### Deployment
   
   Other
   
   ### Deployment details
   
   We have a production instance that primarily serves as our Airflow hub, so this is a pip-based global install.
   
   ### Anything else
   
   I'd be happy to submit a PR if we can identify the issue.
   
   ### Are you willing to submit PR?
   
   - [ ] Yes I am willing to submit a PR!
   
   ### Code of Conduct
   
   - [X] I agree to follow this project's [Code of Conduct](https://github.com/apache/airflow/blob/main/CODE_OF_CONDUCT.md)
   


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: commits-unsubscribe@airflow.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


[GitHub] [airflow] kcphila closed issue #26991: Jobs scheduled with cron-syntax for monthly or weekly execution are not triggering

Posted by GitBox <gi...@apache.org>.
kcphila closed issue #26991: Jobs scheduled with cron-syntax for monthly or weekly execution are not triggering
URL: https://github.com/apache/airflow/issues/26991


[GitHub] [airflow] kcphila commented on issue #26991: Jobs scheduled with cron-syntax for monthly or weekly execution are not triggering

Posted by GitBox <gi...@apache.org>.
kcphila commented on issue #26991:
URL: https://github.com/apache/airflow/issues/26991#issuecomment-1277945251

   @ephraimbuddy, Aha! 
   
   I had `start_date` set to `datetime.datetime.now(tz=localtz) - datetime.timedelta(days=2)` and `catchup = False`. I changed `start_date` to a couple of months ago, reset the cron schedule so that it would run, and it ran.
   
   I've generally not used `start_date` for anything meaningful, and we don't have tasks that should backfill, so those were standard parameters (so standard that I have a separate constructor that sets these defaults when they are not explicitly set). Should I infer from this that `start_date` must be some time before the last full interval that is being run?
   
   Is there any use for `start_date` other than in conjunction with `catchup`, to backfill iterative tasks? If it's not otherwise used, would it be better to set the start date to, say, the start of the epoch to avoid issues like this, or would that have other side effects?


[GitHub] [airflow] kcphila commented on issue #26991: Jobs scheduled with cron-syntax for monthly or weekly execution are not triggering

Posted by GitBox <gi...@apache.org>.
kcphila commented on issue #26991:
URL: https://github.com/apache/airflow/issues/26991#issuecomment-1277958478

   Thank you for your insight, @ephraimbuddy! It's nice to have this mystery solved so easily.


[GitHub] [airflow] ephraimbuddy commented on issue #26991: Jobs scheduled with cron-syntax for monthly or weekly execution are not triggering

Posted by GitBox <gi...@apache.org>.
ephraimbuddy commented on issue #26991:
URL: https://github.com/apache/airflow/issues/26991#issuecomment-1276530538

   Can you share the DAG code? I suspect that the DAG's `start_date` is dynamic. If so, consider setting `start_date` to a fixed date in the past and setting `catchup=False`.
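Applied to a defaults constructor like the one mentioned earlier in the thread, that suggestion means defaulting to a fixed date instead of `now() - timedelta(days=2)`. The sketch below is hypothetical: `dag_kwargs_with_defaults` and `DEFAULT_START_DATE` are illustrative names, and the returned dict is assumed to be passed to the `DAG` constructor in the DAG file (for example `DAG(**kwargs)`; `schedule` is the argument name introduced in Airflow 2.4). The point is only that the default `start_date` evaluates to the same value every time the file is parsed.

```python
from datetime import datetime, timezone
from typing import Any

# Hypothetical default: a fixed, timezone-aware date in the past.
# Unlike now() - timedelta(days=2), it is identical on every parse.
DEFAULT_START_DATE = datetime(2022, 1, 1, tzinfo=timezone.utc)


def dag_kwargs_with_defaults(**overrides: Any) -> dict[str, Any]:
    """Hypothetical helper: merge explicit DAG arguments over static defaults."""
    kwargs: dict[str, Any] = {
        "start_date": DEFAULT_START_DATE,
        "catchup": False,
    }
    kwargs.update(overrides)  # explicit arguments win over the defaults
    return kwargs


kwargs = dag_kwargs_with_defaults(
    dag_id="crons_update_data_dictionary",
    schedule="30 0 9 * *",
)
```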


[GitHub] [airflow] ephraimbuddy commented on issue #26991: Jobs scheduled with cron-syntax for monthly or weekly execution are not triggering

Posted by GitBox <gi...@apache.org>.
ephraimbuddy commented on issue #26991:
URL: https://github.com/apache/airflow/issues/26991#issuecomment-1277956129

   > @ephraimbuddy, Aha!
   > 
   > I had the `start_date` set to ` datetime.datetime.now(tz=localtz) - datetime.timedelta(days=2)` and `catchup = False`. I changed the start_date to a couple months ago and reset the crontab to run and it ran.
   > 
   > I've generally not used the start_date for anything meaningful and we don't have tasks that should backfill, and so those were standard parameters (so standard that I have a separate constructor that sets the defaults if not explicitly set). Should I infer from this that the `start_date` must be some time before the last full interval that is being run?
   > 
   > Is there any utility to `start_date` except in conjunction with `catchup` in order to backfill iterative tasks (or to hard-code a future start date)? Would it be better to set the start date to, say, the start of epoch if it's not used to avoid issues like this, or would that have other side effects?
   
   `start_date` is very important to Airflow: it's what is used to calculate when your DAG should run. Whether or not you need backfilling, it's important to set `start_date` to a date in the past. It isn't necessary to set it to the start of the epoch; the important thing is that it's static and in the past.
   Due to how databases handle date and time, I wouldn't recommend using the start of the epoch. You can use the start of a past year instead, e.g. `datetime(2022, 1, 1)`.
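To see why the dynamic `start_date` broke the monthly DAG, here is a simplified, stdlib-only model of how the next run is chosen for a `catchup=False` cron schedule: take the latest data interval that has already ended, but never start an interval before `start_date`. This is an approximation written for illustration, not Airflow's code; the function names are hypothetical, and it evaluates `30 0 9 * *` at a fixed UTC-4 offset (04:30 UTC), ignoring daylight saving time.

```python
from datetime import datetime, timedelta, timezone

# Simplified model of the monthly cron "30 0 9 * *" at a fixed UTC-4 offset,
# i.e. 04:30 UTC on the 9th of each month (DST is ignored).
DAY, HOUR, MINUTE = 9, 4, 30


def boundary(year: int, month: int) -> datetime:
    """Schedule boundary (cron fire time) in the given month."""
    return datetime(year, month, DAY, HOUR, MINUTE, tzinfo=timezone.utc)


def shift(b: datetime, months: int) -> datetime:
    """Boundary that lies `months` schedule periods away from boundary b."""
    index = b.year * 12 + (b.month - 1) + months
    return boundary(index // 12, index % 12 + 1)


def align_to_next(t: datetime) -> datetime:
    """Earliest boundary at or after t."""
    b = boundary(t.year, t.month)
    return b if b >= t else shift(b, 1)


def latest_complete_start(now: datetime) -> datetime:
    """Start of the most recent data interval that has already ended."""
    b = boundary(now.year, now.month)
    while shift(b, 1) > now:  # interval [b, b + 1 month) still open
        b = shift(b, -1)
    return b


def next_run(start_date: datetime, now: datetime) -> tuple[datetime, datetime]:
    """(logical date, run_after) for the next run of a catchup=False DAG.

    The interval may not start before start_date. With start_date =
    now - 2 days, the interval starts no earlier than two days ago, so its
    month-long end is always still in the future and the run never fires.
    """
    start = max(latest_complete_start(now), align_to_next(start_date))
    return start, shift(start, 1)


now = datetime(2022, 10, 11, 4, 29, 40, tzinfo=timezone.utc)  # 00:29:40 -0400
dynamic = next_run(now - timedelta(days=2), now)  # start_date = now - 2 days
static = next_run(datetime(2022, 1, 1, tzinfo=timezone.utc), now)
print("dynamic run due:", dynamic[1] <= now)  # prints: dynamic run due: False
print("static run due:", static[1] <= now)  # prints: static run due: True
```

With the dynamic `start_date`, the model yields a logical date of 2022-10-09 and a `run_after` of 2022-11-09; 31 seconds later, `start_date` has passed 04:30 UTC on 2022-10-09 and the model jumps to 2022-11-09 and 2022-12-09. This mirrors the two `Setting next_dagrun` log lines in the original report, apart from the DST hour. The static `datetime(2022, 1, 1)` instead yields the 2022-09-09 interval, which has already ended, so the run is due.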

