Posted to commits@airflow.apache.org by "ASF GitHub Bot (Jira)" <ji...@apache.org> on 2020/02/22 01:14:00 UTC

[jira] [Commented] (AIRFLOW-6874) There are risks that subprocesses not killed when a task failed

    [ https://issues.apache.org/jira/browse/AIRFLOW-6874?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17042318#comment-17042318 ] 

ASF GitHub Bot commented on AIRFLOW-6874:
-----------------------------------------

YingboWang commented on pull request #7498: [AIRFLOW-6874] Reap cgroup procs when terminate in cgroup taskrunner
URL: https://github.com/apache/airflow/pull/7498
 
 
   ---
   Issue link: WILL BE INSERTED BY [boring-cyborg](https://github.com/kaxil/boring-cyborg)
   
   Make sure to mark the boxes below before creating PR: [x]
   
   Many Airflow tasks create subprocesses, and those subprocesses may create further subprocesses. In our experience, even when a task fails and Airflow tries to reap its process group, leftover processes can keep running, which wastes resources and compromises correctness.
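The report does not say how the leftover processes escaped. One common mechanism, assumed here for illustration only, is a descendant that calls `setsid()`. That call moves it into a new session and process group, so a process-group kill such as `os.killpg` on the task's group never reaches it. The sketch below reproduces this using only the Python standard library. `spawn_task_with_escaping_grandchild` and `is_alive` are hypothetical helpers and are not Airflow code.

```python
import os
import signal
import subprocess
import sys

# Source for a hypothetical "task" process: it starts one grandchild that
# calls setsid() (start_new_session=True), reports the grandchild's PID on
# stdout, then idles as a long-running task would.
TASK_CODE = (
    "import subprocess, sys, time\n"
    "gc = subprocess.Popen([sys.executable, '-c', 'import time; time.sleep(60)'],\n"
    "                      start_new_session=True)\n"
    "print(gc.pid, flush=True)\n"
    "time.sleep(60)\n"
)


def spawn_task_with_escaping_grandchild():
    """Start the task in its own process group; return (task, grandchild_pid)."""
    task = subprocess.Popen(
        [sys.executable, "-c", TASK_CODE],
        stdout=subprocess.PIPE,
        text=True,
        start_new_session=True,  # the task's process group ID equals task.pid
    )
    grandchild_pid = int(task.stdout.readline())
    return task, grandchild_pid


def is_alive(pid):
    """True if a process with this PID exists (signal 0 only performs the check)."""
    try:
        os.kill(pid, 0)
    except ProcessLookupError:
        return False
    except PermissionError:
        return True  # exists, but owned by another user
    return True


if __name__ == "__main__":
    task, grandchild_pid = spawn_task_with_escaping_grandchild()
    os.killpg(task.pid, signal.SIGKILL)  # "reap" the task's process group
    task.wait()
    print("grandchild still running:", is_alive(grandchild_pid))
    os.kill(grandchild_pid, signal.SIGKILL)  # clean up the leftover process
    task.stdout.close()
```

The task itself dies, but the grandchild keeps running in its own process group. This is the kind of leftover process described above.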
   
   We propose improving the cgroup task runner so that it reaps all processes in the current cgroup node when that node is terminated.
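Below is a minimal sketch of the proposed idea. It assumes a cgroup directory that exposes a `cgroup.procs` file listing one PID per line, as both cgroup v1 and v2 do. `reap_cgroup_procs`, its `timeout` parameter, and the other helpers are hypothetical and are not the code from PR #7498. The approach works because child processes inherit their parent's cgroup and `setsid()` does not change cgroup membership. As a result, listing the cgroup's PIDs also catches descendants that a process-group kill misses.

```python
import os
import signal
import time


def read_cgroup_pids(cgroup_dir):
    """Return the PIDs listed in <cgroup_dir>/cgroup.procs."""
    with open(os.path.join(cgroup_dir, "cgroup.procs")) as f:
        return [int(line) for line in f if line.strip()]


def pid_alive(pid):
    """True if the PID exists (an unreaped zombie also counts as existing)."""
    try:
        os.kill(pid, 0)
    except ProcessLookupError:
        return False
    except PermissionError:
        return True
    return True


def reap_cgroup_procs(cgroup_dir, timeout=5.0):
    """SIGTERM every process in the cgroup, then SIGKILL any PID still present
    after `timeout` seconds. Returns the PIDs that were sent SIGKILL."""
    pids = read_cgroup_pids(cgroup_dir)
    for pid in pids:
        try:
            os.kill(pid, signal.SIGTERM)
        except ProcessLookupError:
            pass  # already exited
    deadline = time.monotonic() + timeout
    remaining = pids
    while remaining and time.monotonic() < deadline:
        time.sleep(0.1)
        remaining = [pid for pid in remaining if pid_alive(pid)]
    for pid in remaining:
        try:
            os.kill(pid, signal.SIGKILL)
        except ProcessLookupError:
            pass
    return remaining
```

This sketch reads `cgroup.procs` only once, so it can miss processes forked between that read and the signals being sent. A sturdier version would re-read the file until no live PIDs remain. On cgroup v2 with Linux 5.14 or later, writing `1` to the cgroup's `cgroup.kill` file kills every process in the cgroup in a single step.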
   - [x] Description above provides context of the change
   - [x] Commit message/PR title starts with `[AIRFLOW-NNNN]`. AIRFLOW-NNNN = JIRA ID<sup>*</sup>
   - [x] Unit tests coverage for changes (not needed for documentation changes)
   - [x] Commits follow "[How to write a good git commit message](http://chris.beams.io/posts/git-commit/)"
   - [x] Relevant documentation is updated including usage instructions.
   - [ ] I will engage committers as explained in [Contribution Workflow Example](https://github.com/apache/airflow/blob/master/CONTRIBUTING.rst#contribution-workflow-example).
   
   <sup>*</sup> For document-only changes commit message can start with `[AIRFLOW-XXXX]`.
   
   ---
   In case of fundamental code change, Airflow Improvement Proposal ([AIP](https://cwiki.apache.org/confluence/display/AIRFLOW/Airflow+Improvements+Proposals)) is needed.
   In case of a new dependency, check compliance with the [ASF 3rd Party License Policy](https://www.apache.org/legal/resolved.html#category-x).
   In case of backwards incompatible changes please leave a note in [UPDATING.md](https://github.com/apache/airflow/blob/master/UPDATING.md).
   Read the [Pull Request Guidelines](https://github.com/apache/airflow/blob/master/CONTRIBUTING.rst#pull-request-guidelines) for more information.
   
 


> There are risks that subprocesses not killed when a task failed
> ---------------------------------------------------------------
>
>                 Key: AIRFLOW-6874
>                 URL: https://issues.apache.org/jira/browse/AIRFLOW-6874
>             Project: Apache Airflow
>          Issue Type: Improvement
>          Components: worker
>    Affects Versions: 1.10.4
>            Reporter: Yingbo Wang
>            Assignee: Yingbo Wang
>            Priority: Major


