You are viewing a plain text version of this content. The canonical link for it is here.
Posted to commits@airflow.apache.org by GitBox <gi...@apache.org> on 2021/11/02 07:26:56 UTC

[GitHub] [airflow] malthe opened a new pull request #19361: Implement task-level include/exclude time filter

malthe opened a new pull request #19361:
URL: https://github.com/apache/airflow/pull/19361


   This adds support for including or excluding a task execution during a DAG run based on a filtering mechanism that matches on the execution date.
   
   The changeset adds two optional parameters `include_date` and `exclude_date` to the base operator class – along with a new method `include_in_dagrun(execution_date)` which determines whether or not to _schedule_ a task instance or immediately mark it as _skipped_.
   
   The time filters are instances of classes under `airflow.timefilters` and can be combined using the bitwise and/or operators.
   
   Currently implemented time filters:
   
   - `airflow.timefilters.cron.CronTimeFilter` – matches on a cron expression


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: commits-unsubscribe@airflow.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



[GitHub] [airflow] eladkal commented on pull request #19361: Implement task-level include/exclude time filter

Posted by GitBox <gi...@apache.org>.
eladkal commented on pull request #19361:
URL: https://github.com/apache/airflow/pull/19361#issuecomment-958992438






-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: commits-unsubscribe@airflow.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



[GitHub] [airflow] ashb edited a comment on pull request #19361: Implement task-level include/exclude time filter

Posted by GitBox <gi...@apache.org>.
ashb edited a comment on pull request #19361:
URL: https://github.com/apache/airflow/pull/19361#issuecomment-957613452


   Yeah, this is definitely a hard no from me now as proposed.
   
   The way this is achieved right now is to use a BranchOperator or similar to skip a task when the condition is false. That coupled with the recently added "edge labels" on the graphs makes that quite a nice solution. Is there something you don't like about doing it this way?


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: commits-unsubscribe@airflow.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



[GitHub] [airflow] potiuk edited a comment on pull request #19361: Implement task-level include/exclude time filter

Posted by GitBox <gi...@apache.org>.
potiuk edited a comment on pull request #19361:
URL: https://github.com/apache/airflow/pull/19361#issuecomment-957490564


   I will be able to take a closer look later this week. I have not yet looked at it, but it seems that this feature is something that we need a very good look at adn make sure we understand the consequences of it especially when we consider future ideas of being able to make Airflow more 'task-oriented" than "Dag-oriented" which we know is something we want to work on.
   
   Again - I have not looked at it so it is hard to comment yet and I do not know the scope and consequences of that, But I generally see three options we could treat this kind of change (since it was not discussed in an AIP/devlist before:
   
   a) if it will fit into the long term idea and is an incremental change without foreesen big impact on target architecture we "envision" (even if it is not specified yet) we could iterate on it and merge it as "tactical" solution for task scheduling
   
   b) if it will be risky - in terms of blocking some ideas and directins - we should discuss those and see if the ideas are viable/whether we can turn it into a).  If we cannot and the risk is too big (and we foresee to work on more "task-based" scheduling soon then this PR might be good one to discuss stuff and eventull go to solution c)
   
   c) we could start discussing more "fully featured" solution for task-based scheduling and start discussin in the devlist and start drafting AIP - in which case this PR might serve as a good "starting point" for discussion - either as an example of things we want to avoid and implement differently or showing where we still need to get missing puzzles to fit in order to know that the direction we take is good.
   
   I have no idea whatsoever which of the options above it can fit in - it can be either, but I think it's good to agree and let @malthe know soon from the committers who would be interested, which direction is most likely here (I am at my hometown today and coming back tomorrow - so i will take a look some time later this week) - but also I'd love to hear what others who thought about the "task-based airflow" to state their opinions here.


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: commits-unsubscribe@airflow.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



[GitHub] [airflow] potiuk commented on pull request #19361: Implement task-level include/exclude time filter

Posted by GitBox <gi...@apache.org>.
potiuk commented on pull request #19361:
URL: https://github.com/apache/airflow/pull/19361#issuecomment-971871586


   Also you might take a look at https://github.com/apache/airflow/pull/17421 @malthe - this one adds greater flexibility to the ShortCircuit operator, without complicating the "true/false" output. 


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: commits-unsubscribe@airflow.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



[GitHub] [airflow] malthe commented on pull request #19361: Implement task-level include/exclude time filter

Posted by GitBox <gi...@apache.org>.
malthe commented on pull request #19361:
URL: https://github.com/apache/airflow/pull/19361#issuecomment-982035727


   @dstandish we can close this; I think from my part what resulted from the discussions here is that a short-circuit operator that's based on cron is nice in that it can gradually move from having to actually execute in a worker to being considered for special treatment in the scheduler (given that we do not execute user code, but merely evaluated cron expressions).
   
   Also, I think there are some nice thoughts here in terms of composable time filters which I believe can be useful to extend our ability to due time-based scheduling. Lots of use-cases involve scheduling that can't be expressed using cron.


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: commits-unsubscribe@airflow.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



[GitHub] [airflow] ashb commented on pull request #19361: Implement task-level include/exclude time filter

Posted by GitBox <gi...@apache.org>.
ashb commented on pull request #19361:
URL: https://github.com/apache/airflow/pull/19361#issuecomment-957613452


   Yeah, this is definitely a hard now from me now as proposed.
   
   The way this is achieved right now is to use a BranchOperator or similar to skip a task when the condition is false. That coupled with the recently added "edge labels" on the graphs makes that quite a nice solution. Is there something you don't like about doing it this way?


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: commits-unsubscribe@airflow.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



[GitHub] [airflow] eladkal commented on pull request #19361: Implement task-level include/exclude time filter

Posted by GitBox <gi...@apache.org>.
eladkal commented on pull request #19361:
URL: https://github.com/apache/airflow/pull/19361#issuecomment-958992438


   > If the goal is richer task scheduling, I wonder if we should do something similar to timetables and just allow the user to define custom trigger rules.
   
   Assuming current scope of trigger rules which just checks the status of a task I'm not sure if custom trigger rules has real value. I started a discussion about it in https://github.com/apache/airflow/issues/17010
   


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: commits-unsubscribe@airflow.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



[GitHub] [airflow] malthe commented on pull request #19361: Implement task-level include/exclude time filter

Posted by GitBox <gi...@apache.org>.
malthe commented on pull request #19361:
URL: https://github.com/apache/airflow/pull/19361#issuecomment-957748951


   @ashb well interestingly, along the lines of the optimization around `DummyOperator` in which the scheduler is able in the common case to immediately mark the task as successful (rather than actually execute the operator which does nothing), to me task-level scheduling (i.e., the proposed `include_date` and `exclude_date` parameters) is also something that belongs as the scheduling level:
   
   - It avoids running an operator merely to immediately check the execution date and mark the task as skipped.
   - The ergonomics are much better. Want to skip Sundays? We could have `exclude_date=SUNDAY` (where `SUNDAY` might simply be a ready-to-use `CronTimeFilter` instance).
   
   I think branching operators are best used to make decisions based on the result of a task run.


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: commits-unsubscribe@airflow.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



[GitHub] [airflow] potiuk commented on pull request #19361: Implement task-level include/exclude time filter

Posted by GitBox <gi...@apache.org>.
potiuk commented on pull request #19361:
URL: https://github.com/apache/airflow/pull/19361#issuecomment-957490564


   I will be able to take a closer look later this week. I have not yet looked at it, but it seems that this feature is something that we need a very good look at adn make sure we understand the consequences of it especially when we consider future ideas of being able to make Airflow more 'task-oriented" than "Dag-oriented" which we know is something we want to work on.
   
   Again - I have not looked at it so it is hard to comment yet and I do not know the scope and consequences of that, But I generally see three options we could treat this kind of change (since it was not discussed in an AIP/devlist before:
   
   a) if it will fit into the long term idea and is an incremental change without foreesen big impact on target architecture we "envision" (even if it is not specified yet) we could iterate on it and merge it as "tactical" solution for task scheduling
   
   b) if it will be risky - in terms of blocking some ideas and directins - we should discuss those and see if the ideas are viable/whether we can turn it into a)  if we cannot and the risk is too big (and we foresee to work on more "task-based" scheduling soon then this PR might be good one to discuss stuff and eventull go to solution c)
   
   c) we could start discussing more "fully featured" solution for task-based scheduling and start discussin in the devlist and start drafting AIP - in which case this PR might serve as a good "starting point" for discussion - either as an example of things we want to avoid and implement differently or showing where we still need to get missing puzzles to fit in order to know that the direction we take is good.
   
   I have no idea whatsoever which of the options above it can fit in - it can be either, but I think it's good to agree and let @malthe know soon from the committers who would be interested, which direction is most likely here (I am at my hometown today and coming back tomorrow - so i will take a look some time later this week) - but also I'd love to hear what others who thought about the "task-based airflow" to state their opinions here.


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: commits-unsubscribe@airflow.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



[GitHub] [airflow] potiuk commented on pull request #19361: Implement task-level include/exclude time filter

Posted by GitBox <gi...@apache.org>.
potiuk commented on pull request #19361:
URL: https://github.com/apache/airflow/pull/19361#issuecomment-984018010


   > I would say that this is something for SLA breach callback which is dedicated feature that eventually we will need to do. The current SLA feature in Airflow works really weird and very limited.
   
   Totally agree on that one :) . 


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: commits-unsubscribe@airflow.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



[GitHub] [airflow] ashb commented on pull request #19361: Implement task-level include/exclude time filter

Posted by GitBox <gi...@apache.org>.
ashb commented on pull request #19361:
URL: https://github.com/apache/airflow/pull/19361#issuecomment-981939463


   If the goal is to skip a Task without ever needing a worker, then   `execute` or `pre_execute` doesn't make much difference -- both would run user code and so not be able to be run in the scheduler securely.


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: commits-unsubscribe@airflow.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



[GitHub] [airflow] eladkal commented on pull request #19361: Implement task-level include/exclude time filter

Posted by GitBox <gi...@apache.org>.
eladkal commented on pull request #19361:
URL: https://github.com/apache/airflow/pull/19361#issuecomment-958077223


   >    Something like `trigger_rule={"state": "all_success", "include_date": ...}` – ?
   
   I can understand the concept of trigger rule to support more than just check against state.
   I mentioned it also in:
   https://github.com/apache/airflow/pull/17576#issuecomment-916429784
   
   That said, I'm not sure I follow on the concept of include/exclude. It feels dangerous to me. For example if the task has external sensor in another waiting for it to be executed?


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: commits-unsubscribe@airflow.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



[GitHub] [airflow] ashb commented on pull request #19361: Implement task-level include/exclude time filter

Posted by GitBox <gi...@apache.org>.
ashb commented on pull request #19361:
URL: https://github.com/apache/airflow/pull/19361#issuecomment-957613452


   Yeah, this is definitely a hard now from me now as proposed.
   
   The way this is achieved right now is to use a BranchOperator or similar to skip a task when the condition is false. That coupled with the recently added "edge labels" on the graphs makes that quite a nice solution. Is there something you don't like about doing it this way?


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: commits-unsubscribe@airflow.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



[GitHub] [airflow] potiuk commented on pull request #19361: Implement task-level include/exclude time filter

Posted by GitBox <gi...@apache.org>.
potiuk commented on pull request #19361:
URL: https://github.com/apache/airflow/pull/19361#issuecomment-957490564






-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: commits-unsubscribe@airflow.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



[GitHub] [airflow] ashb commented on pull request #19361: Implement task-level include/exclude time filter

Posted by GitBox <gi...@apache.org>.
ashb commented on pull request #19361:
URL: https://github.com/apache/airflow/pull/19361#issuecomment-957613452


   Yeah, this is definitely a hard now from me now as proposed.
   
   The way this is achieved right now is to use a BranchOperator or similar to skip a task when the condition is false. That coupled with the recently added "edge labels" on the graphs makes that quite a nice solution. Is there something you don't like about doing it this way?


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: commits-unsubscribe@airflow.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



[GitHub] [airflow] malthe commented on pull request #19361: Implement task-level include/exclude time filter

Posted by GitBox <gi...@apache.org>.
malthe commented on pull request #19361:
URL: https://github.com/apache/airflow/pull/19361#issuecomment-958723293


   @potiuk with regards to how this might fit in then I do think it lives up to the "tactical" solution described in (a):
   
   - The implementation is simple and follows an existing pattern in the codebase.
   
   - If the functionality is supplanted by a new and better scheduling system, then I would assume this new system supports the same functionality. In this case, the migration path should be clear.


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: commits-unsubscribe@airflow.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



[GitHub] [airflow] potiuk commented on pull request #19361: Implement task-level include/exclude time filter

Posted by GitBox <gi...@apache.org>.
potiuk commented on pull request #19361:
URL: https://github.com/apache/airflow/pull/19361#issuecomment-971859483


   > Also [compare the ergonomics](https://stackoverflow.com/questions/67666968/can-we-configure-different-schedule-interval-for-different-tasks-within-a-dag) between the short-circuit operator and [BranchDayOfWeekOperator](https://airflow.apache.org/docs/apache-airflow/stable/howto/operator/weekday.html), the latter having in my opinion a rather awkward interface.
   
   Quite agree. Declartativ ShortCircuitCron (kind of) seems even better candidate for this "optimized" scheduling behaviour - could be evaluated entirely in scheduler as well and is much simpler to implement and reason about.


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: commits-unsubscribe@airflow.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



[GitHub] [airflow] malthe commented on pull request #19361: Implement task-level include/exclude time filter

Posted by GitBox <gi...@apache.org>.
malthe commented on pull request #19361:
URL: https://github.com/apache/airflow/pull/19361#issuecomment-971944639


   @potiuk compared to `ShortCircuitOperator` which needs a callable and thus real execution time, the benefit of our cron-based skip operator is that it can send the task itself into a skipped state. Without having thought further about it, this should simplify things in terms of downstream operators.


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: commits-unsubscribe@airflow.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



[GitHub] [airflow] dstandish commented on pull request #19361: Implement task-level include/exclude time filter

Posted by GitBox <gi...@apache.org>.
dstandish commented on pull request #19361:
URL: https://github.com/apache/airflow/pull/19361#issuecomment-981926878


   One thing that is not mentioned here is that it's not necessary to use branch operator.  In any operator you can raise AirflowSkipException and it will skip.  And therefore you can subclass apply your skipping logic prior to calling `super().execute`.
   
   Given that you can implement this skipping logic easily in a subclass, we can think of this PR as just about implementing time-based skipping logic without subclassing operators.
   
   Along these lines, one thing that might be more palatable is to add to BaseOperator the ability to inject (using params) a callable that would be called in `pre_execute`.  E.g. call it `pre_execute_hook: Optional[Callable]=None`.  Then users could call arbitrary logic in pre_execute without having to subclass.  And this would open up some interesting possibilities where  we could inject pre_execute and  post_execute callables with global task policy, which I think could be very useful.
   
   


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: commits-unsubscribe@airflow.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



[GitHub] [airflow] dstandish commented on pull request #19361: Implement task-level include/exclude time filter

Posted by GitBox <gi...@apache.org>.
dstandish commented on pull request #19361:
URL: https://github.com/apache/airflow/pull/19361#issuecomment-981990177


   > Well, @dstandish. We've added it in #17576 (after discussion with @malthe) and then there was a follow-up discussion as results of which @ashb marked it as experimental (see Ash's comment there #18140)
   I wonder what's your opinion :P
   
   Hah... hard to stay on top of everything :)   And yeah to me it seems like a good / useful thing to have in there.


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: commits-unsubscribe@airflow.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



[GitHub] [airflow] malthe commented on pull request #19361: Implement task-level include/exclude time filter

Posted by GitBox <gi...@apache.org>.
malthe commented on pull request #19361:
URL: https://github.com/apache/airflow/pull/19361#issuecomment-958723293


   @potiuk with regards to how this might fit in then I do think it lives up to the "tactical" solution described in (a):
   
   - The implementation is simple and follows an existing pattern in the codebase.
   
   - If the functionality is supplanted by a new and better scheduling system, then I would assume this new system supports the same functionality. In this case, the migration path should be clear.


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: commits-unsubscribe@airflow.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



[GitHub] [airflow] malthe commented on pull request #19361: Implement task-level include/exclude time filter

Posted by GitBox <gi...@apache.org>.
malthe commented on pull request #19361:
URL: https://github.com/apache/airflow/pull/19361#issuecomment-957545051






-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: commits-unsubscribe@airflow.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



[GitHub] [airflow] potiuk edited a comment on pull request #19361: Implement task-level include/exclude time filter

Posted by GitBox <gi...@apache.org>.
potiuk edited a comment on pull request #19361:
URL: https://github.com/apache/airflow/pull/19361#issuecomment-957490564


   I will be able to take a closer look later this week. I have not yet looked at it, but it seems that this feature is something that we need a very good look at adn make sure we understand the consequences of it especially when we consider future ideas of being able to make Airflow more 'task-oriented" than "Dag-oriented" which we know is something we want to work on.
   
   Again - I have not looked at it so it is hard to comment yet and I do not know the scope and consequences of that, But I generally see three options we could treat this kind of change (since it was not discussed in an AIP/devlist before:
   
   a) if it will fit into the long term idea and is an incremental change without foreesen big impact on target architecture we "envision" (even if it is not specified yet) we could iterate on it and merge it as "tactical" solution for task scheduling
   
   b) if it will be risky - in terms of blocking some ideas and directins - we should discuss those and see if the ideas are viable/whether we can turn it into a).  If we cannot and the risk is too big (and we foresee to work on more "task-based" scheduling soon then this PR might be good one to discuss stuff and eventually go to solution c)
   
   c) we could start discussing more "fully featured" solution for task-based scheduling and start discussin in the devlist and start drafting AIP - in which case this PR might serve as a good "starting point" for discussion - either as an example of things we want to avoid and implement differently or showing where we still need to get missing puzzles to fit in order to know that the direction we take is good.
   
   I have no idea whatsoever which of the options above it can fit in - it can be either, but I think it's good to agree and let @malthe know soon from the committers who would be interested, which direction is most likely here (I am at my hometown today and coming back tomorrow - so i will take a look some time later this week) - but also I'd love to hear what others who thought about the "task-based airflow" to state their opinions here.


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: commits-unsubscribe@airflow.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



[GitHub] [airflow] ashb edited a comment on pull request #19361: Implement task-level include/exclude time filter

Posted by GitBox <gi...@apache.org>.
ashb edited a comment on pull request #19361:
URL: https://github.com/apache/airflow/pull/19361#issuecomment-957613452


   Yeah, this is definitely a hard no from me now as proposed.
   
   The way this is achieved right now is to use a BranchOperator or similar to skip a task when the condition is false. That coupled with the recently added "edge labels" on the graphs makes that quite a nice solution. Is there something you don't like about doing it this way?


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: commits-unsubscribe@airflow.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



[GitHub] [airflow] eladkal commented on pull request #19361: Implement task-level include/exclude time filter

Posted by GitBox <gi...@apache.org>.
eladkal commented on pull request #19361:
URL: https://github.com/apache/airflow/pull/19361#issuecomment-958077223


   >    Something like `trigger_rule={"state": "all_success", "include_date": ...}` – ?
   
   I can understand the concept of trigger rule to support more than just check against state.
   I mentioned it also in:
   https://github.com/apache/airflow/pull/17576#issuecomment-916429784
   
   That said, I'm not sure I follow on the concept of include/exclude. It feels dangerous to me. For example if the task has external sensor in another waiting for it to be executed?


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: commits-unsubscribe@airflow.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



[GitHub] [airflow] potiuk commented on pull request #19361: Implement task-level include/exclude time filter

Posted by GitBox <gi...@apache.org>.
potiuk commented on pull request #19361:
URL: https://github.com/apache/airflow/pull/19361#issuecomment-968392532


   I thought a bit more about it and also looked at  #17010 and I side with @ashb on that one. I really think adding time-based task scheduling bring very little value and we should not not do it this way.
   
   I think there are two things here:
   
   1) being able to choose (based on a crontab) to determine the "path" the DAG should follow for the "current date" (we need to define what the current date is here BTW)
   2)  We want to make the scheduling decision in this case much faster and with less overhead that could be done by - say PythonBranch operator. Such decisions could be taken by scheduler if the task-based scheduling is implemented and it shoud be done in "declarative" rather than "imperative" way.
   
   Now - I think 2) is really an optimisation, and I would not not really like to change whole Task definition and add complexity to scheduler.  And I think we can do it much simpler - following the implementation we'v done for DummyOperator.
   
   Why don't we make specialized `CronBranchOperator`?  This operator could make Branching decisions similarly as PythonBranchOperator, but we could optimize it in the scheduler and rather than execute the task, the scheduler could evaluate the declarative specification of the operator during scheduling and set the state of the tasks apropriately witout the overhead of running the task.
   
   I am not 100% sure - but I think this woudl be a relatively simple change in Scheduler and it would not necessary require any serious modification in Airflow's behaviour. We could even add a set of other similar "FastScheduled, declarative" tasks for other similar cases, where the scheduler could "evaluate" such task and perform all the subsequent scheduling decision immediatley after. And from Dag writer point of view it is even better, because it feels much more natural IMHO to choose a "path" in the DAG based on such Task whch clearly specifies "I decide on the path based on time", rather than decide based on "task attribute" as would be in case we introduce task-level include/exclude. It would also decouple the "time-dag-processing" from all the tasks and has the opportunity of better showing all time-based decisions in the UI. IMHO tasks should only do one thing and we seem to ask them to do more than one thing - decide whether to run and run.
   
   Correct me if I am wrong @ashb but I believe the way how Scheduler works now is ideally suited for this kind of optimisation - it runs in "mini-batches" and such state update for declarative evaluation of "fixed" task types is something that scheduler might cope with very well. I think tihs is eventually even more "extensible" than embedding task-level scheduling, because we could have more "specialized fast evaluated" tasks if we find the pattern works well.
   


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: commits-unsubscribe@airflow.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



[GitHub] [airflow] eladkal edited a comment on pull request #19361: Implement task-level include/exclude time filter

Posted by GitBox <gi...@apache.org>.
eladkal edited a comment on pull request #19361:
URL: https://github.com/apache/airflow/pull/19361#issuecomment-983964535


   > BTW. An interesting use-case for pre-execute:
   > 
   > https://apache-airflow.slack.com/archives/CCQ7EGB1P/p1638380797413000?thread_ts=1638342676.385300&cid=CCQ7EGB1P
   > 
   > Someone wants to get notified mid-flight when the task is running too long. For me sounds like a perfect case for pre-execute calllable (and possibly we could also add a decorator to do the same) .
   
   I would say that this is something for SLA breach callback which is dedicated feature that eventually we will need to do. The current SLA feature in Airflow works really weird and very limited.


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: commits-unsubscribe@airflow.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



[GitHub] [airflow] eladkal commented on pull request #19361: Implement task-level include/exclude time filter

Posted by GitBox <gi...@apache.org>.
eladkal commented on pull request #19361:
URL: https://github.com/apache/airflow/pull/19361#issuecomment-983964535


   > BTW. An interesting use-case for pre-execute:
   > 
   > https://apache-airflow.slack.com/archives/CCQ7EGB1P/p1638380797413000?thread_ts=1638342676.385300&cid=CCQ7EGB1P
   > 
   > Someone wants to get notified mid-flight when the task is running too long. For me sounds like a perfect case for pre-execute calllable (and possibly we could also add a decorator to do the same) .
   
   I would say that this is something for SLA breach callback which is dedicated feature that eventually we will need to do. The SLA feature in Airflow works really weird.


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: commits-unsubscribe@airflow.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



[GitHub] [airflow] potiuk commented on pull request #19361: Implement task-level include/exclude time filter

Posted by GitBox <gi...@apache.org>.
potiuk commented on pull request #19361:
URL: https://github.com/apache/airflow/pull/19361#issuecomment-981937009


   > Along these lines, one thing that might be more palatable is to add to BaseOperator the ability to inject (using params) a callable that would be called in `pre_execute`. E.g. call it `pre_execute_hook: Optional[Callable]=None`. Then users could call arbitrary logic in pre_execute without having to subclass. And this would open up some interesting possibilities where we could inject pre_execute and post_execute callables with global task policy, which I think could be very useful.
   
   Well, @dstandish. We've added it in https://github.com/apache/airflow/pull/17576 (after discussion with @malthe) and then there was a follow-up discussion as results of which @ashb marked it as experimental (see Ash's comment there https://github.com/apache/airflow/pull/18140) 
   
   I wonder what's your opinion :P 
   


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: commits-unsubscribe@airflow.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



[GitHub] [airflow] dstandish closed pull request #19361: Implement task-level include/exclude time filter

Posted by GitBox <gi...@apache.org>.
dstandish closed pull request #19361:
URL: https://github.com/apache/airflow/pull/19361


   


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: commits-unsubscribe@airflow.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



[GitHub] [airflow] dstandish commented on pull request #19361: Implement task-level include/exclude time filter

Posted by GitBox <gi...@apache.org>.
dstandish commented on pull request #19361:
URL: https://github.com/apache/airflow/pull/19361#issuecomment-981928148


   separately, ok to close this?


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: commits-unsubscribe@airflow.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



[GitHub] [airflow] malthe commented on pull request #19361: Implement task-level include/exclude time filter

Posted by GitBox <gi...@apache.org>.
malthe commented on pull request #19361:
URL: https://github.com/apache/airflow/pull/19361#issuecomment-958723293


   @potiuk with regards to how this might fit in then I do think it lives up to the "tactical" solution described in (a):
   
   - The implementation is simple and follows an existing pattern in the codebase.
   
   - If the functionality is supplanted by a new and better scheduling system, then I would assume this new system supports the same functionality. In this case, the migration path should be clear.


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: commits-unsubscribe@airflow.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



[GitHub] [airflow] uranusjr commented on pull request #19361: Implement task-level include/exclude time filter

Posted by GitBox <gi...@apache.org>.
uranusjr commented on pull request #19361:
URL: https://github.com/apache/airflow/pull/19361#issuecomment-958947470


   If the goal is richer task scheduling, I wonder if we should do something similar to timetables and just allow the user to define custom trigger rules.


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: commits-unsubscribe@airflow.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



[GitHub] [airflow] potiuk edited a comment on pull request #19361: Implement task-level include/exclude time filter

Posted by GitBox <gi...@apache.org>.
potiuk edited a comment on pull request #19361:
URL: https://github.com/apache/airflow/pull/19361#issuecomment-957490564






-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: commits-unsubscribe@airflow.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



[GitHub] [airflow] malthe commented on pull request #19361: Implement task-level include/exclude time filter

Posted by GitBox <gi...@apache.org>.
malthe commented on pull request #19361:
URL: https://github.com/apache/airflow/pull/19361#issuecomment-957545051


   @kaxil while I think the functionality could be baked into trigger rules, I can see two problems with that:
   
   1. How would the API look? Today you have something like 
   
       ```python
       my_task = PythonOperator(
           task_id='my_task',
           trigger_rule='all_success'
       )
       ```
       Something like `trigger_rule={"state": "all_success", "include_date": ...}` – ?
   
       I think the interface suggested in this pull request fits better with the existing `start_date` and `end_date` parameters.
   
   2. The `include_date` and `exclude_date` parameters match on the DAG run (specifically, the execution date) and are not tied to the task dependencies (unlike trigger rules).


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: commits-unsubscribe@airflow.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



[GitHub] [airflow] eladkal commented on pull request #19361: Implement task-level include/exclude time filter

Posted by GitBox <gi...@apache.org>.
eladkal commented on pull request #19361:
URL: https://github.com/apache/airflow/pull/19361#issuecomment-958992438


   > If the goal is richer task scheduling, I wonder if we should do something similar to timetables and just allow the user to define custom trigger rules.
   
   Assuming current scope of trigger rules which just checks the status of a task I'm not sure if custom trigger rules has real value. I started a discussion about it in https://github.com/apache/airflow/issues/17010
   


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: commits-unsubscribe@airflow.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



[GitHub] [airflow] malthe commented on pull request #19361: Implement task-level include/exclude time filter

Posted by GitBox <gi...@apache.org>.
malthe commented on pull request #19361:
URL: https://github.com/apache/airflow/pull/19361#issuecomment-957545051






-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: commits-unsubscribe@airflow.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



[GitHub] [airflow] potiuk edited a comment on pull request #19361: Implement task-level include/exclude time filter

Posted by GitBox <gi...@apache.org>.
potiuk edited a comment on pull request #19361:
URL: https://github.com/apache/airflow/pull/19361#issuecomment-968392532


   I thought a bit more about it and also looked at  #17010 and I side with @ashb on that one. I really think adding time-based task scheduling based on include/exclude filters brings very little value and we should not not do it this way.
   
   I think there are two goals here:
   
   1) being able to choose (based on a crontab) to determine the "path" the DAG should follow for the "current date" (we need to define what the current date is here BTW)
   2)  We want to make the scheduling decision in this case much faster and with less overhead that could be done by - say PythonBranch operator. Such decisions could be taken by scheduler if the task-based scheduling is implemented and it shoud be done in "declarative" rather than "imperative" way.
   
   Now - I think 2) is really an optimisation, and I would not not really like to change whole Task definition and add complexity to scheduler.  And I think we can do it much simpler - following the implementation we'v done for DummyOperator.
   
   Why don't we make specialized `CronBranchOperator`?  This operator could make Branching decisions similarly as PythonBranchOperator, but we could optimize it in the scheduler and rather than execute the task, the scheduler could evaluate the declarative specification of the operator during scheduling and set the state of the tasks apropriately witout the overhead of running the task.
   
   I am not 100% sure - but I think this woudl be a relatively simple change in Scheduler and it would not necessary require any serious modification in Airflow's behaviour. We could even add a set of other similar "FastScheduled, declarative" tasks for other similar cases, where the scheduler could "evaluate" such task and perform all the subsequent scheduling decision immediatley after. And from Dag writer point of view it is even better, because it feels much more natural IMHO to choose a "path" in the DAG based on such Task whch clearly specifies "I decide on the path based on time", rather than decide based on "task attribute" as would be in case we introduce task-level include/exclude. It would also decouple the "time-dag-processing" from all the tasks and has the opportunity of better showing all time-based decisions in the UI. IMHO tasks should only do one thing and we seem to ask them to do more than one thing - decide whether to run and run.
   
   Correct me if I am wrong @ashb but I believe the way how Scheduler works now is ideally suited for this kind of optimisation - it runs in "mini-batches" and such state update for declarative evaluation of "fixed" task types is something that scheduler might cope with very well. I think tihs is eventually even more "extensible" than embedding task-level scheduling, because we could have more "specialized fast evaluated" tasks if we find the pattern works well.
   


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: commits-unsubscribe@airflow.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



[GitHub] [airflow] malthe commented on pull request #19361: Implement task-level include/exclude time filter

Posted by GitBox <gi...@apache.org>.
malthe commented on pull request #19361:
URL: https://github.com/apache/airflow/pull/19361#issuecomment-969094410


   @potiuk thanks for the detailed thoughts and comments.
   
   I think the behavior of the `ShortCircuitOperator` is better than the branching operator in that it works with statically defined dependencies, e.g. "if not Sunday, skip".
   
   Also [compare the ergonomics](https://stackoverflow.com/questions/67666968/can-we-configure-different-schedule-interval-for-different-tasks-within-a-dag) between the short-circuit operator and [BranchDayOfWeekOperator](https://airflow.apache.org/docs/apache-airflow/stable/howto/operator/weekday.html), the latter having in my opinion a rather awkward interface.
   
   I think you make some good points and a benefit is that the _skip reason_ is very obvious (when an operator is used to check the execution time).


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: commits-unsubscribe@airflow.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



[GitHub] [airflow] potiuk commented on pull request #19361: Implement task-level include/exclude time filter

Posted by GitBox <gi...@apache.org>.
potiuk commented on pull request #19361:
URL: https://github.com/apache/airflow/pull/19361#issuecomment-971946117


   Yep


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: commits-unsubscribe@airflow.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



[GitHub] [airflow] ashb edited a comment on pull request #19361: Implement task-level include/exclude time filter

Posted by GitBox <gi...@apache.org>.
ashb edited a comment on pull request #19361:
URL: https://github.com/apache/airflow/pull/19361#issuecomment-957613452


   Yeah, this is definitely a hard no from me now as proposed.
   
   The way this is achieved right now is to use a BranchOperator or similar to skip a task when the condition is false. That coupled with the recently added "edge labels" on the graphs makes that quite a nice solution. Is there something you don't like about doing it this way?


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: commits-unsubscribe@airflow.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



[GitHub] [airflow] malthe closed pull request #19361: Implement task-level include/exclude time filter

Posted by GitBox <gi...@apache.org>.
malthe closed pull request #19361:
URL: https://github.com/apache/airflow/pull/19361


   


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: commits-unsubscribe@airflow.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



[GitHub] [airflow] malthe commented on pull request #19361: Implement task-level include/exclude time filter

Posted by GitBox <gi...@apache.org>.
malthe commented on pull request #19361:
URL: https://github.com/apache/airflow/pull/19361#issuecomment-958723293


   @potiuk with regards to how this might fit in then I do think it lives up to the "tactical" solution described in (a):
   
   - The implementation is simple and follows an existing pattern in the codebase.
   
   - If the functionality is supplanted by a new and better scheduling system, then I would assume this new system supports the same functionality. In this case, the migration path should be clear.


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: commits-unsubscribe@airflow.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



[GitHub] [airflow] potiuk commented on pull request #19361: Implement task-level include/exclude time filter

Posted by GitBox <gi...@apache.org>.
potiuk commented on pull request #19361:
URL: https://github.com/apache/airflow/pull/19361#issuecomment-983889125


   BTW. An interesting use-case for pre-execute:
   
   https://apache-airflow.slack.com/archives/CCQ7EGB1P/p1638380797413000?thread_ts=1638342676.385300&cid=CCQ7EGB1P
   
   Someone wants to get notified mid-flight when the task is running too long. For me sounds like a perfect case for pre-execute calllable (and possibly we could also add a decorator to do the same) .


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: commits-unsubscribe@airflow.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



[GitHub] [airflow] uranusjr commented on pull request #19361: Implement task-level include/exclude time filter

Posted by GitBox <gi...@apache.org>.
uranusjr commented on pull request #19361:
URL: https://github.com/apache/airflow/pull/19361#issuecomment-958947470






-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: commits-unsubscribe@airflow.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



[GitHub] [airflow] potiuk commented on pull request #19361: Implement task-level include/exclude time filter

Posted by GitBox <gi...@apache.org>.
potiuk commented on pull request #19361:
URL: https://github.com/apache/airflow/pull/19361#issuecomment-957490564






-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: commits-unsubscribe@airflow.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



[GitHub] [airflow] malthe commented on pull request #19361: Implement task-level include/exclude time filter

Posted by GitBox <gi...@apache.org>.
malthe commented on pull request #19361:
URL: https://github.com/apache/airflow/pull/19361#issuecomment-958257892


   @eladkal so in my implementation it's using the same logic as when a `DummyOperator` is assigned a _success_ state, but instead of success it is assigned a state of _skipped_.
   
   You could use something like `failed_states=['failed', 'skipped']` in your `ExternalTaskSensor` – but of course you can shoot yourself in the foot with an external task sensor. There are many ways to do that.


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: commits-unsubscribe@airflow.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



[GitHub] [airflow] potiuk edited a comment on pull request #19361: Implement task-level include/exclude time filter

Posted by GitBox <gi...@apache.org>.
potiuk edited a comment on pull request #19361:
URL: https://github.com/apache/airflow/pull/19361#issuecomment-957490564






-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: commits-unsubscribe@airflow.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



[GitHub] [airflow] potiuk commented on pull request #19361: Implement task-level include/exclude time filter

Posted by GitBox <gi...@apache.org>.
potiuk commented on pull request #19361:
URL: https://github.com/apache/airflow/pull/19361#issuecomment-957499576


   @malthe  - also FYI - in the past we implemented a few of those solutions as "experimental" even if we did not know how it fit into the bigger picture. An example of that is the "poke_reschedule" mode which was realy "experimental" and is replaced with "Deferrable Tasks" which got more complete and implements much bigger range of async tasks in a much more comprehensive way. I have a feeling (but this is an intuition now) that at most it qualifies to that "tactical", "experimental" solution) - but again, It's more of a "feeling" than anything else. I just wanted you also to realise how it works and set the expectations on what can happen with it.


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: commits-unsubscribe@airflow.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



[GitHub] [airflow] uranusjr commented on pull request #19361: Implement task-level include/exclude time filter

Posted by GitBox <gi...@apache.org>.
uranusjr commented on pull request #19361:
URL: https://github.com/apache/airflow/pull/19361#issuecomment-958947470


   If the goal is richer task scheduling, I wonder if we should do something similar to timetables and just allow the user to define custom trigger rules.


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: commits-unsubscribe@airflow.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org