Posted to commits@airflow.apache.org by GitBox <gi...@apache.org> on 2020/04/30 15:35:51 UTC

[GitHub] [airflow] jeffolsi opened a new issue #8649: Add support for more than 1 cron exp per DAG

jeffolsi opened a new issue #8649:
URL: https://github.com/apache/airflow/issues/8649


   
   **Description**
   Allow a DAG to accept a list of cron expressions and schedule its runs according to all of them,
   similar to how this can be done with a cron job.
   
   **Use case / motivation**
   Some schedules, such as every 10 minutes between 16:30 and 18:10, cannot be expressed with a single cron expression. The idea is that a DAG could be scheduled by more than one cron expression without duplicating the DAG code or the DAG entry in the UI.
   
   
   Even a simple schedule that is common for ETL, bi-weekly, cannot be expressed with a single cron expression: https://serverfault.com/questions/404398/how-to-schedule-a-biweekly-cronjob
   


----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



[GitHub] [airflow] ashb commented on issue #8649: Add support for more than 1 cron exp per DAG

Posted by GitBox <gi...@apache.org>.
ashb commented on issue #8649:
URL: https://github.com/apache/airflow/issues/8649#issuecomment-763720363


   I've started a discussion thread on the dev mailing list to scope out what a solution to this will look like: https://lists.apache.org/thread.html/rb4e004e68574e5fb77ee5b51f4fd5bfb4b3392d884c178bc767681bf%40%3Cdev.airflow.apache.org%3E
   
   Use cases posted there would be ace (as would feedback once we come up with a design).





[GitHub] [airflow] BasPH commented on issue #8649: Add support for more than 1 cron exp per DAG

Posted by GitBox <gi...@apache.org>.
BasPH commented on issue #8649:
URL: https://github.com/apache/airflow/issues/8649#issuecomment-622068812


   Can you provide an example (screenshot/code/whatever) where that happens? As far as I know, the next execution date is always computed with the `start_date` and `schedule_interval`, not the execution date of the last DAG run.





[GitHub] [airflow] jeffolsi commented on issue #8649: Add support for more than 1 cron exp per DAG

Posted by GitBox <gi...@apache.org>.
jeffolsi commented on issue #8649:
URL: https://github.com/apache/airflow/issues/8649#issuecomment-623064361


   @BasPH 
   This is the DAG definition:
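   ```
   from datetime import timedelta

   from airflow import DAG

   # DAG_NAME and default_args are defined elsewhere in the DAG file.
   with DAG(
       dag_id=DAG_NAME,
       default_args=default_args,
       schedule_interval=timedelta(minutes=60),  # fixed interval, not a cron expression
       max_active_runs=1,
       catchup=False,
   ) as dag:
       ...  # tasks omitted
   ```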
   
   This is an example for the execution times:
   ![delay](https://user-images.githubusercontent.com/64190742/80907885-3cb3a780-8d23-11ea-8f96-a3d646aefa3e.jpg)
   
   As you can see, this DAG runs hourly via `timedelta(minutes=60)`, but that is not the same as specifying `@hourly` or `0 * * * *`. You can also see the gap in times (marked in red) when Airflow was down. When it came back up, it gave a "new" timestamp to the execution_date.
   
   I'm sure you can understand that there is no business logic behind a timestamp of `XX:46:10.998426`.
   
   So, as said before, `timedelta(minutes=60)` **is not equivalent to** `@hourly` or a cron expression.





[GitHub] [airflow] jeffolsi commented on issue #8649: Add support for more than 1 cron exp per DAG

Posted by GitBox <gi...@apache.org>.
jeffolsi commented on issue #8649:
URL: https://github.com/apache/airflow/issues/8649#issuecomment-623311926


   @BasPH I'm running 1.10.3.
   I'm not sure what exactly to report in the new issue. I don't consider this a bug, but maybe I'm wrong. I just wanted to explain why the suggestion to use `timedelta()` does not solve this issue, so Airflow needs to support multiple cron expressions for a single DAG.
   
   I think this is a very important feature for Airflow. 





[GitHub] [airflow] BasPH commented on issue #8649: Add support for more than 1 cron exp per DAG

Posted by GitBox <gi...@apache.org>.
BasPH commented on issue #8649:
URL: https://github.com/apache/airflow/issues/8649#issuecomment-622061008


   An immediate solution to your last sentence is to use timedelta. This is also supported: `schedule_interval=timedelta(weeks=2)`.





[GitHub] [airflow] BasPH commented on issue #8649: Add support for more than 1 cron exp per DAG

Posted by GitBox <gi...@apache.org>.
BasPH commented on issue #8649:
URL: https://github.com/apache/airflow/issues/8649#issuecomment-623074363


   Thanks for pointing this out @jeffolsi, that indeed makes no sense and seems like a fundamental error which should be fixed. What version are you running on? Let's make a separate issue for it.
   
   Regarding the multiple cron expressions, I've seen the request multiple times and think it would be a good addition. The [apscheduler library](https://apscheduler.readthedocs.io/en/stable/index.html) has something for combining intervals: https://apscheduler.readthedocs.io/en/stable/modules/triggers/combining.html. I think similar behaviour would be nice to integrate in Airflow too.





[GitHub] [airflow] tambulkar commented on issue #8649: Add support for more than 1 cron exp per DAG

Posted by GitBox <gi...@apache.org>.
tambulkar commented on issue #8649:
URL: https://github.com/apache/airflow/issues/8649#issuecomment-661879550


   Is there any update on this?





[GitHub] [airflow] mdediana commented on issue #8649: Add support for more than 1 cron exp per DAG

Posted by GitBox <gi...@apache.org>.
mdediana commented on issue #8649:
URL: https://github.com/apache/airflow/issues/8649#issuecomment-634780827


   @mik-laj Sure, I will do that, thanks.





[GitHub] [airflow] sarit-si edited a comment on issue #8649: Add support for more than 1 cron exp per DAG

Posted by GitBox <gi...@apache.org>.
sarit-si edited a comment on issue #8649:
URL: https://github.com/apache/airflow/issues/8649#issuecomment-668283438


   @mdediana 
   > I would like to work on this.
   > 
   > The idea would be to allow a list of cron expressions as a `schedule_interval`. For example, the scheduling in the description would be defined as `schedule_interval = ['30/10 16 * * *', '*/10 17 * * *', '0,10 18 * * *']`. Do you think this is the way to go?
   
   This will be of great help. Instead of creating separate DAGs for the same job (as I am currently doing), this would reduce it to a single DAG taking care of multiple schedules. One workaround right now, if the cron times are not strict, is to tweak the schedules so that they all share the same minute field, for example `45 0,8,13 * * *`, which runs at 00:45, 08:45 and 13:45.
   Unfortunately, the times in my case are strict (01:00, 08:15 and 13:30), so I have to create 3 separate DAGs. 
   Enabling `schedule_interval` to accept a list of cron expressions would be very helpful :) 👍 





[GitHub] [airflow] eladkal closed issue #8649: Add support for more than 1 cron exp per DAG

Posted by GitBox <gi...@apache.org>.
eladkal closed issue #8649:
URL: https://github.com/apache/airflow/issues/8649


   


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: commits-unsubscribe@airflow.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



[GitHub] [airflow] eladkal commented on issue #8649: Add support for more than 1 cron exp per DAG

Posted by GitBox <gi...@apache.org>.
eladkal commented on issue #8649:
URL: https://github.com/apache/airflow/issues/8649#issuecomment-1015838739


   I think the request as described here (a bi-weekly job) is already fully covered by AIP-39 using Timetables:
   https://airflow.apache.org/docs/apache-airflow/stable/concepts/timetable.html
   
   Closing the issue as solved.
   





[GitHub] [airflow] mik-laj commented on issue #8649: Add support for more than 1 cron exp per DAG

Posted by GitBox <gi...@apache.org>.
mik-laj commented on issue #8649:
URL: https://github.com/apache/airflow/issues/8649#issuecomment-634531019


   @mdediana We have had long discussions about whether to support multiple schedule intervals. Many people think that this can affect the presentation and readability of the collected data. It can also complicate the scheduler logic. Can you describe your idea on the mailing list?





[GitHub] [airflow] themantalope commented on issue #8649: Add support for more than 1 cron exp per DAG

Posted by GitBox <gi...@apache.org>.
themantalope commented on issue #8649:
URL: https://github.com/apache/airflow/issues/8649#issuecomment-634766517


   @mik-laj 
   
   I would recommend that the user be allowed to supply a list of cron strings, or cron strings separated by commas. I would then implement an object that has internal logic like [this](https://github.com/kiorky/croniter/pull/23#issuecomment-555828306) implementation of scheduling with multiple `croniter` objects. The object should also have a `get_next()` function similar to the one [currently used by the `DAG` object (see `following` implementation)](https://github.com/apache/airflow/blob/738667082d32d3ef93ec2cd6c3735ff3691ba1cc/airflow/models/dag.py#L488). If just one cron string is supplied, then the DAG uses the `croniter` object as is currently implemented. 
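
A minimal sketch of the wrapper idea in the comment above, under stated assumptions: `MultiSchedule` and `daily_at` are hypothetical names, and each `croniter` object is replaced by a plain callable that returns the next fire time strictly after a given moment, so the sketch runs with the standard library alone. It is not Airflow's or croniter's implementation; it only shows that a combined `get_next()` can be the earliest next time among its members.

```python
from datetime import datetime, timedelta
from typing import Callable, List

# A schedule is modelled as "given a moment, return the next fire time after it".
NextFn = Callable[[datetime], datetime]


def daily_at(hour: int, minute: int) -> NextFn:
    """Hypothetical stand-in for one cron expression: fires daily at hour:minute."""

    def next_after(moment: datetime) -> datetime:
        candidate = moment.replace(hour=hour, minute=minute, second=0, microsecond=0)
        if candidate <= moment:
            candidate += timedelta(days=1)
        return candidate

    return next_after


class MultiSchedule:
    """Hypothetical wrapper: the next run is the earliest next run of any member."""

    def __init__(self, schedules: List[NextFn]) -> None:
        if not schedules:
            raise ValueError("at least one schedule is required")
        self._schedules = schedules

    def get_next(self, moment: datetime) -> datetime:
        return min(schedule(moment) for schedule in self._schedules)


# Three strict daily times that share neither hour nor minute.
combined = MultiSchedule([daily_at(1, 0), daily_at(8, 15), daily_at(13, 30)])

moment = datetime(2020, 5, 1, 0, 0)
runs = []
for _ in range(4):
    moment = combined.get_next(moment)
    runs.append(moment)

for run in runs:
    print(run)
```

With a single member the wrapper behaves exactly like that member, which matches the suggestion that one cron string should keep the current behaviour.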





[GitHub] [airflow] mdediana commented on issue #8649: Add support for more than 1 cron exp per DAG

Posted by GitBox <gi...@apache.org>.
mdediana commented on issue #8649:
URL: https://github.com/apache/airflow/issues/8649#issuecomment-633981061


   I would like to work on this.
   
   The idea would be to allow a list of cron expressions as a `schedule_interval`. For example, the scheduling in the description would be defined as `schedule_interval = ['30/10 16 * * *', '*/10 17 * * *', '0,10 18 * * *']`. Do you think this is the way to go?
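
To make the proposed list concrete, here is a rough standard-library sketch that expands the three expressions into the times of day they would cover together. The helper names `expand_field` and `daily_times` are made up for this illustration, only the minute and hour fields are handled, and `30/10` is assumed to mean `30-59/10`, which may not hold for every cron implementation.

```python
from datetime import time
from typing import List, Set


def expand_field(field: str, low: int, high: int) -> Set[int]:
    """Expand one cron field. Supports '*', 'a', 'a-b', comma lists and '/step'."""
    values: Set[int] = set()
    for part in field.split(","):
        base, _, step_text = part.partition("/")
        step = int(step_text) if step_text else 1
        if base == "*":
            start, end = low, high
        elif "-" in base:
            start_text, end_text = base.split("-")
            start, end = int(start_text), int(end_text)
        else:
            start = int(base)
            # Assumption: 'a/step' means 'a-high/step'; a bare 'a' is one value.
            end = high if step_text else start
        values.update(range(start, end + 1, step))
    return values


def daily_times(expressions: List[str]) -> List[time]:
    """Union of the times of day matched by several daily cron expressions."""
    matched: Set[time] = set()
    for expression in expressions:
        minute, hour, *rest = expression.split()
        if rest != ["*", "*", "*"]:
            raise ValueError("this sketch only handles daily expressions")
        for h in expand_field(hour, 0, 23):
            for m in expand_field(minute, 0, 59):
                matched.add(time(h, m))
    return sorted(matched)


schedule = ["30/10 16 * * *", "*/10 17 * * *", "0,10 18 * * *"]
times = daily_times(schedule)
print([t.strftime("%H:%M") for t in times])
```

Under these assumptions the union is 11 runs per day, every 10 minutes from 16:30 to 18:10, which is the schedule from the issue description.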
   





[GitHub] [airflow] jeffolsi commented on issue #8649: Add support for more than 1 cron exp per DAG

Posted by GitBox <gi...@apache.org>.
jeffolsi commented on issue #8649:
URL: https://github.com/apache/airflow/issues/8649#issuecomment-622065472


   > An immediate solution to your last sentence is to use timedelta. This is also supported: `schedule_interval=timedelta(weeks=2)`.
   
   It's not the same. When you specify a cron expression, you guarantee that tasks will be fired when the scheduled time comes. If you use `timedelta(weeks=2)`, you risk that a delay in running one task will delay the following ones, because it always looks for a 2-week difference from the last task.
   
   To explain, let's use a daily schedule for simplicity.
   Starting from 2020-04-28 with `0 0 * * *`, the DAG will run every day:
   
   2020-04-29 00:00:00
   2020-04-30 00:00:00
   
   Now let's say that Airflow was down and the run of 2020-04-29 00:00:00 only started at 2020-04-29 04:00:00. The next run will still be at 2020-04-30 00:00:00.
   
   
   On the other hand, starting from 2020-04-28 with `timedelta(days=1)`:
   if the run of 2020-04-29 00:00:00 only started at 2020-04-29 04:00:00, the next run will be at **2020-04-30 04:00:00**. The whole schedule is shifted because of the delay!
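
The difference described in this comment can be sketched with plain `datetime` arithmetic. The function names `next_anchored` and `next_relative` are hypothetical, and the sketch only illustrates the two behaviours as the comment describes them; it makes no claim about how Airflow's scheduler actually computes the next run.

```python
from datetime import datetime, timedelta


def next_anchored(last_scheduled: datetime, interval: timedelta) -> datetime:
    """Cron-like: the next run is derived from the previous *scheduled* time."""
    return last_scheduled + interval


def next_relative(last_actual_start: datetime, interval: timedelta) -> datetime:
    """Drifting: the next run is derived from when the last run *actually started*."""
    return last_actual_start + interval


day = timedelta(days=1)
scheduled = datetime(2020, 4, 29, 0, 0)
actual_start = scheduled + timedelta(hours=4)  # the scheduler was down for four hours

print(next_anchored(scheduled, day))     # 2020-04-30 00:00:00
print(next_relative(actual_start, day))  # 2020-04-30 04:00:00
```

In the anchored case a late start does not move later runs, while in the relative case every delay carries over into all following runs.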





[GitHub] [airflow] sarit-si commented on issue #8649: Add support for more than 1 cron exp per DAG

Posted by GitBox <gi...@apache.org>.
sarit-si commented on issue #8649:
URL: https://github.com/apache/airflow/issues/8649#issuecomment-668283438


   > I would like to work on this.
   > 
   > The idea would be to allow a list of cron expressions as a `schedule_interval`. For example, the scheduling in the description would be defined as `schedule_interval = ['30/10 16 * * *', '*/10 17 * * *', '0,10 18 * * *']`. Do you think this is the way to go?
   
   This will be of great help. Instead of creating separate DAGs for the same job (as I am currently doing), this would reduce it to a single DAG taking care of multiple schedules. One workaround right now, if the cron times are not strict, is to tweak the schedules so that they all share the same minute field, for example `45 0,8,13 * * *`, which runs at 00:45, 08:45 and 13:45.
   Unfortunately, the times in my case are strict (01:00, 08:15 and 13:30), so I have to create 3 separate DAGs. 
   Enabling `schedule_interval` to accept a list of cron expressions would be very helpful :) 👍 
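
A small sketch of why the strict times above do not fit into one expression: a single cron expression matches every listed minute in every listed hour, so its minute and hour lists combine as a Cartesian product. The helper name `single_expression_times` is made up for this illustration, and only the minute and hour fields are modelled.

```python
from itertools import product


def single_expression_times(minutes, hours):
    """(hour, minute) pairs matched by one cron expression with these field lists."""
    return sorted(product(hours, minutes))


# The workaround: one shared minute, so the product has exactly three entries.
print(single_expression_times([45], [0, 8, 13]))

# The strict case: 01:00, 08:15 and 13:30 share neither hour nor minute.
wanted = [(1, 0), (8, 15), (13, 30)]
hours = sorted({h for h, _ in wanted})
minutes = sorted({m for _, m in wanted})

# Equivalent to '0,15,30 1,8,13 * * *', which matches more than what is wanted.
matched = single_expression_times(minutes, hours)
extra = [t for t in matched if t not in wanted]
print(len(matched), "matched,", len(extra), "unwanted")
```

Merging the three times into one expression would add six unwanted runs per day, which is why separate expressions (or, today, separate DAGs) are needed.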

