Posted to commits@airflow.apache.org by "alejandrofm (via GitHub)" <gi...@apache.org> on 2023/02/14 02:53:31 UTC

[GitHub] [airflow] alejandrofm opened a new issue, #29524: Possible deadlock when max_active_runs maxed + depends_on_past = True

alejandrofm opened a new issue, #29524:
URL: https://github.com/apache/airflow/issues/29524

   ### Apache Airflow version
   
   Other Airflow 2 version (please specify below)
   
   ### What happened
   
   Currently on 2.4.2.
   When some tasks of a run fail and the next run starts, we can end up in this situation:
   - First run: all tasks green except 1 red, run status = Failed.
   - The next run starts and executes everything except the task that failed in the previous run; that task stays in status None while the run stays in status Running.
   - I clear the failed tasks of the first run, but that run doesn't execute because max_active_runs is already maxed out, for example 1/1.
   
   Result:
   The DAG is deadlocked. To get out, I have to mark the first run as success so the second can finish, and then clear the first run again so that its cleared tasks actually run.
   
   ### What you think should happen instead
   
   When the number of active runs equals max_active_runs, depends_on_past is True, and a previous run has tasks with status None (cleared), the current run should stop and give up its resources to the run that is blocking it.
   
   ### How to reproduce
   
   Set the DAG config to:
   max_active_runs = 1
   depends_on_past = True
   Start two runs and mark some tasks in the first run as failed.
   This can also happen with a higher value of max_active_runs, but 1 is the easiest to reproduce.
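
The settings above could look like this in a DAG file. This is a minimal sketch, assuming Airflow 2.4 or later (where `DAG` accepts `schedule`). The `dag_id`, the task names, and the 10-minute interval are made up for illustration and do not come from the report. `depends_on_past` is a task-level argument, so it is passed through `default_args` here.

```python
from datetime import datetime, timedelta

from airflow import DAG
from airflow.operators.empty import EmptyOperator

with DAG(
    dag_id="depends_on_past_repro",      # hypothetical name
    start_date=datetime(2023, 1, 1),
    schedule=timedelta(minutes=10),      # assumed interval, any schedule works
    catchup=False,
    max_active_runs=1,                   # only one DagRun may be active at a time
    default_args={"depends_on_past": True},  # applied to every task in the DAG
) as dag:
    first = EmptyOperator(task_id="first")
    second = EmptyOperator(task_id="second")

    first >> second
```

To follow the report, let two runs get created, then mark `second` as failed in the first run from the UI while the next run is active.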
   
   ### Operating System
   
   Linux image
   
   ### Versions of Apache Airflow Providers
   
   _No response_
   
   ### Deployment
   
   Official Apache Airflow Helm Chart
   
   ### Deployment details
   
   _No response_
   
   ### Anything else
   
   _No response_
   
   ### Are you willing to submit PR?
   
   - [ ] Yes I am willing to submit a PR!
   
   ### Code of Conduct
   
   - [X] I agree to follow this project's [Code of Conduct](https://github.com/apache/airflow/blob/main/CODE_OF_CONDUCT.md)
   


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: commits-unsubscribe@airflow.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


[GitHub] [airflow] Taragolis commented on issue #29524: Possible deadlock when max_active_runs maxed + depends_on_past = True

Posted by "Taragolis (via GitHub)" <gi...@apache.org>.
Taragolis commented on issue #29524:
URL: https://github.com/apache/airflow/issues/29524#issuecomment-1430351066

   I can't say that this is a bug. Everything works as expected.
   With `depends_on_past = True` the scheduler can't start a task if its state for the previous interval is `failed`.
   When you restart the tasks, a new active DagRun is created, but with `max_active_runs` set to `1` it is queued.
   
   The same situation can happen even if you set `max_active_runs` greater than `1`. A simple example: you have a schedule interval of 10 minutes and `max_active_runs` set to `16`; at midnight one task fails, and after 3 hours you have 16 DagRuns in the `running` state waiting on each other, and 20 in `queued`.
   
   But again, this is expected behaviour when `depends_on_past` is set to `True`.
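
The interaction described in this comment can be sketched as a toy model. This is not Airflow's scheduler code: the `Run` class, the function names, and the two simplified rules (a run only executes tasks while it holds one of the `max_active_runs` slots, and a task only starts if the same task succeeded in the previous run) are assumptions made for illustration.

```python
from dataclasses import dataclass
from typing import Dict, List, Optional, Tuple


@dataclass
class Run:
    name: str
    state: str  # "running" or "queued" (toy states only)
    tasks: Dict[str, Optional[str]]  # task_id -> "success", "failed", or None (cleared / not started)


def promote_queued(runs: List[Run], max_active_runs: int) -> None:
    """Move queued runs to running, oldest first, while slots are free."""
    running = sum(1 for r in runs if r.state == "running")
    for run in runs:
        if run.state == "queued" and running < max_active_runs:
            run.state = "running"
            running += 1


def runnable_tasks(runs: List[Run]) -> List[Tuple[str, str]]:
    """Return (run, task) pairs that the toy rules allow to start right now."""
    out = []
    for i, run in enumerate(runs):
        if run.state != "running":
            continue  # a run without an active slot executes nothing
        for task_id, task_state in run.tasks.items():
            if task_state is not None:
                continue  # already finished
            # depends_on_past: the same task must have succeeded in the previous run
            if i > 0 and runs[i - 1].tasks.get(task_id) != "success":
                continue
            out.append((run.name, task_id))
    return out


def scenario(max_active_runs: int) -> List[Tuple[str, str]]:
    """The situation from the issue, right after clearing task "b" in run_1."""
    runs = [
        Run("run_1", "queued", {"a": "success", "b": None}),   # cleared, waiting for a slot
        Run("run_2", "running", {"a": "success", "b": None}),  # holds the slot, "b" blocked
    ]
    promote_queued(runs, max_active_runs)
    return runnable_tasks(runs)


print(scenario(1))  # [] : run_2 holds the only slot and waits on run_1
print(scenario(2))  # [('run_1', 'b')] : run_1 gets a slot and can unblock run_2
```

With one slot nothing is runnable, which is the stall from the report. A second slot only helps as long as it is not already taken by another blocked run, which is the 16-run case above.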
   
   




[GitHub] [airflow] alejandrofm commented on issue #29524: Possible deadlock when max_active_runs maxed + depends_on_past = True

Posted by "alejandrofm (via GitHub)" <gi...@apache.org>.
alejandrofm commented on issue #29524:
URL: https://github.com/apache/airflow/issues/29524#issuecomment-1430354446

   It is expected and logical, but the way to get out of it is somewhat strange.
   You have to set the current run as failed/succeeded, clear the previous run, and then... clear the run you just marked as succeeded/failed.




[GitHub] [airflow] Taragolis commented on issue #29524: Possible deadlock when max_active_runs maxed + depends_on_past = True

Posted by "Taragolis (via GitHub)" <gi...@apache.org>.
Taragolis commented on issue #29524:
URL: https://github.com/apache/airflow/issues/29524#issuecomment-1430383187

   A potential solution is to allow a reserved DagRun for this purpose, the same way PostgreSQL has reserved connections for superusers and the ext4 filesystem has reserved (5% 🤦) space for the root user.
   
   But this is only an idea. Behind the scenes, quite a few changes would be required to allow this, plus one general question: how to decide which DagRun should use the reserved `max_active_runs` slot and which should use a regular one. Right now there is no difference between a DagRun created manually, one created by the scheduler, and one where some task was cleared in a previous DagRun.
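
As a rough illustration of the reserved-slot idea only: nothing below exists in Airflow. The function name, the `reserved_runs` parameter, and the `is_cleared_rerun` flag are all hypothetical, and the flag stands for exactly the distinction that, per the comment above, Airflow does not make today. The sketch also assumes the reserve is added on top of `max_active_runs` rather than carved out of it.

```python
def can_start_run(
    is_cleared_rerun: bool,
    active_runs: int,
    max_active_runs: int,
    reserved_runs: int = 1,
) -> bool:
    """Hypothetical admission check for a DagRun that wants to become active.

    Regular runs are limited by max_active_runs; a rerun of cleared tasks
    may additionally use the reserved slots.
    """
    limit = max_active_runs
    if is_cleared_rerun:
        limit += reserved_runs  # assumption: the reserve is extra capacity
    return active_runs < limit


# One regular run already holds the single slot:
print(can_start_run(False, active_runs=1, max_active_runs=1))  # False
print(can_start_run(True, active_runs=1, max_active_runs=1))   # True
```

The open design question from the comment remains: something still has to decide which queued run gets to claim the reserve.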




[GitHub] [airflow] potiuk commented on issue #29524: Possible deadlock when max_active_runs maxed + depends_on_past = True

Posted by "potiuk (via GitHub)" <gi...@apache.org>.
potiuk commented on issue #29524:
URL: https://github.com/apache/airflow/issues/29524#issuecomment-1436621796

   Turning that into a discussion then.




[GitHub] [airflow] potiuk closed issue #29524: Possible deadlock when max_active_runs maxed + depends_on_past = True

Posted by "potiuk (via GitHub)" <gi...@apache.org>.
potiuk closed issue #29524: Possible deadlock when max_active_runs maxed + depends_on_past = True
URL: https://github.com/apache/airflow/issues/29524




[GitHub] [airflow] alejandrofm commented on issue #29524: Possible deadlock when max_active_runs maxed + depends_on_past = True

Posted by "alejandrofm (via GitHub)" <gi...@apache.org>.
alejandrofm commented on issue #29524:
URL: https://github.com/apache/airflow/issues/29524#issuecomment-1430389712

   You are right, it's just that "depends_on_past" is VERY useful for... the cases it was created for, and then it generates these locks, which could span 1 run or 10, and the backfilling is messy.
   The reserved DagRun has to be "smart" too and go to the run that is blocking the current run... which can be 5 runs behind.
   Thanks!

