Posted to commits@airflow.apache.org by GitBox <gi...@apache.org> on 2021/01/20 13:20:28 UTC

[GitHub] [airflow] raj-manvar opened a new issue #13788: Add support for "H" syntax in cron scheduling

raj-manvar opened a new issue #13788:
URL: https://github.com/apache/airflow/issues/13788


   **Description**
   
   It could be beneficial for Airflow to support Jenkins' "H" cron syntax in Airflow scheduling. The reason for this is to mitigate a stampede of tasks at the top of every hour (or of some other interval), which can currently strain resources, depending on the application.
   
   The "H" syntax tells the scheduler to run the DAG at some point within a window of time, which lets it spread jobs out based on a hash value. For instance, the syntax "H(0-15) * * * *" means to schedule at any time in the first 15 minutes of the hour, while "H * * * *" means to schedule during any minute of the hour.
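
   As a rough illustration of the idea, the sketch below resolves "H" and "H(a-b)" fields into concrete values derived from a hash of a key such as the DAG id. It is not Airflow or Jenkins code: `resolve_h`, `FIELD_BOUNDS` and the choice of SHA-256 are assumptions made for this example, as are the inclusive range bounds and the 1-28 cap on day of month.

```python
import hashlib
import re

# Inclusive (low, high) bounds for the five cron fields:
# minute, hour, day of month, month, day of week.
# Day of month is capped at 28 here so the value exists in every month (assumption).
FIELD_BOUNDS = [(0, 59), (0, 23), (1, 28), (1, 12), (0, 6)]

# Matches a bare "H" or a ranged "H(a-b)" field.
H_TOKEN = re.compile(r"^H(?:\((\d+)-(\d+)\))?$")


def resolve_h(expr: str, key: str) -> str:
    """Replace each H / H(a-b) field with a stable value derived from `key`."""
    fields = expr.split()
    if len(fields) != len(FIELD_BOUNDS):
        raise ValueError("expected a five-field cron expression")

    resolved = []
    for index, (field, (low, high)) in enumerate(zip(fields, FIELD_BOUNDS)):
        match = H_TOKEN.match(field)
        if match is None:
            # Ordinary cron field: keep it unchanged.
            resolved.append(field)
            continue
        if match.group(1) is not None:
            low, high = int(match.group(1)), int(match.group(2))
        if low > high:
            raise ValueError(f"invalid range in field {field!r}")
        # Salt the hash with the field position so that different fields
        # of the same expression do not all receive the same offset.
        digest = hashlib.sha256(f"{key}:{index}".encode()).digest()
        offset = int.from_bytes(digest[:8], "big") % (high - low + 1)
        resolved.append(str(low + offset))
    return " ".join(resolved)


if __name__ == "__main__":
    # The same key always maps to the same minute; different keys spread out.
    for dag_id in ("example_dag_a", "example_dag_b", "example_dag_c"):
        print(dag_id, "->", resolve_h("H(0-15) * * * *", dag_id))
```

   Because the value depends only on the key, a given DAG keeps the same slot from run to run, while different DAGs land on different minutes of the window.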
   
   **Use case / motivation**
   
   The aim is to resolve the stampede of tasks occurring at a particular hour of the day, or at midnight on a particular day of the month.
   Currently we need to reserve more resources for Airflow to handle the resulting peaks, when many tasks try to be scheduled at once.
   The H syntax would help distribute the load more evenly over time and save resources.
   
   **Are you willing to submit a PR?**
   
   Yup. From some code digging, it looks like Airflow does the crontab scheduling using a Python library. If the library already supports the H syntax, it would be simpler; if not, I would need some more guidance / research support.
   
   **Related Issues**
   
   


----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



[GitHub] [airflow] eladkal commented on issue #13788: Add support for "H" syntax in cron scheduling

Posted by GitBox <gi...@apache.org>.
eladkal commented on issue #13788:
URL: https://github.com/apache/airflow/issues/13788#issuecomment-804786325


   @pgrandjean you might want to take a look at [AIP-39 Richer scheduler_interval](https://cwiki.apache.org/confluence/display/AIRFLOW/AIP-39+Richer+scheduler_interval)





[GitHub] [airflow] pgrandjean commented on issue #13788: Add support for "H" syntax in cron scheduling

Posted by GitBox <gi...@apache.org>.
pgrandjean commented on issue #13788:
URL: https://github.com/apache/airflow/issues/13788#issuecomment-783707086


   What would be the impact of replacing "date intervals" with a CRON-like behaviour?





[GitHub] [airflow] pgrandjean edited a comment on issue #13788: Add support for "H" syntax in cron scheduling

Posted by GitBox <gi...@apache.org>.
pgrandjean edited a comment on issue #13788:
URL: https://github.com/apache/airflow/issues/13788#issuecomment-783707086


   What would be the impact of replacing "date intervals" with a CRON-like behaviour? What important features of Airflow would be lost?





[GitHub] [airflow] boring-cyborg[bot] commented on issue #13788: Add support for "H" syntax in cron scheduling

Posted by GitBox <gi...@apache.org>.
boring-cyborg[bot] commented on issue #13788:
URL: https://github.com/apache/airflow/issues/13788#issuecomment-763601243


   Thanks for opening your first issue here! Be sure to follow the issue template!
   





[GitHub] [airflow] potiuk commented on issue #13788: Add support for "H" syntax in cron scheduling

Posted by GitBox <gi...@apache.org>.
potiuk commented on issue #13788:
URL: https://github.com/apache/airflow/issues/13788#issuecomment-763609646


   I think this one might be difficult before switching to (or rather enabling) support for "regular" cron behaviour. Airflow does NOT work like cron even if the specification is cron-like. Airflow works on "data intervals" rather than on a CRON schedule. This means that the time specified in the DAG is not the schedule, but rather an indication of which data interval Airflow should work on: it marks the "beginning" of the data interval each run should process. So if you want to process Monday's data, Airflow starts the run at the midnight that ends Monday.
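
   To make the data-interval behaviour concrete, here is a minimal sketch in plain Python for a daily schedule. The helper `daily_run` is hypothetical and does not call any Airflow API; it only reproduces the arithmetic described above.

```python
from datetime import datetime, timedelta


def daily_run(interval_start: datetime) -> dict:
    """Describe one run of a daily DAG whose data interval begins at `interval_start`."""
    interval_end = interval_start + timedelta(days=1)
    return {
        "data_interval_start": interval_start,
        "data_interval_end": interval_end,
        # The run can only start once the whole interval has elapsed.
        "earliest_start": interval_end,
    }


if __name__ == "__main__":
    # 2021-01-18 is a Monday.
    run = daily_run(datetime(2021, 1, 18))
    for name, value in run.items():
        print(f"{name}: {value:%A %Y-%m-%d %H:%M}")
```

   For the Monday interval, the earliest start is Tuesday at 00:00, which is the "midnight finishing Monday" mentioned above.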
   
   This is not intuitive, and we are discussing whether running "cron jobs" on a regular schedule should be allowed. While I can imagine H() being used there as well, it would be even more confusing, as the data interval should still cover the full day until midnight even if the job starts 15 minutes later. For me, having a "random delay" in the DAG definition as a separate parameter would be better.
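
   One way to picture the "separate parameter" idea is the sketch below. It is only an assumption about what such a parameter could look like: `start_delay`, `plan_daily_run` and `max_delay` are hypothetical names, not existing Airflow options, and the delay is derived from a hash of the DAG id so that it is stable rather than truly random.

```python
import hashlib
from datetime import datetime, timedelta


def start_delay(dag_id: str, max_delay: timedelta) -> timedelta:
    """Return a stable delay between zero and `max_delay` (inclusive), derived from `dag_id`."""
    digest = hashlib.sha256(dag_id.encode()).digest()
    max_seconds = int(max_delay.total_seconds())
    return timedelta(seconds=int.from_bytes(digest[:8], "big") % (max_seconds + 1))


def plan_daily_run(dag_id: str, interval_start: datetime, max_delay: timedelta) -> dict:
    """Keep the full-day data interval and shift only the moment the run starts."""
    interval_end = interval_start + timedelta(days=1)
    return {
        # The data interval still covers the whole day, midnight to midnight.
        "data_interval": (interval_start, interval_end),
        # Only the start of the run is pushed back by the per-DAG delay.
        "start_at": interval_end + start_delay(dag_id, max_delay),
    }


if __name__ == "__main__":
    for dag_id in ("example_dag_a", "example_dag_b"):
        plan = plan_daily_run(dag_id, datetime(2021, 1, 18), timedelta(minutes=15))
        print(dag_id, "starts at", plan["start_at"])
```

   Keeping the delay separate from the cron expression leaves the meaning of the schedule unchanged: each run still processes a full day of data, and only its start time moves within the allowed window.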
   
   

