Posted to issues@ignite.apache.org by "Anton Vinogradov (Jira)" <ji...@apache.org> on 2022/05/05 09:43:00 UTC

[jira] [Updated] (IGNITE-16916) Make nodes more resilient in case of a job cancellation

     [ https://issues.apache.org/jira/browse/IGNITE-16916?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Anton Vinogradov updated IGNITE-16916:
--------------------------------------
    Attachment: screenshot-1.png

> Make nodes more resilient in case of a job cancellation
> -------------------------------------------------------
>
>                 Key: IGNITE-16916
>                 URL: https://issues.apache.org/jira/browse/IGNITE-16916
>             Project: Ignite
>          Issue Type: Task
>          Components: compute
>            Reporter: Kirill Tkalenko
>            Assignee: Kirill Tkalenko
>            Priority: Major
>             Fix For: 2.14
>
>         Attachments: screenshot-1.png
>
>          Time Spent: 20m
>  Remaining Estimate: 0h
>
> When a job is cancelled, our current approach is questionable:
> we set the interruption flag before we give the user a chance to stop the job gracefully.
> Proposal for the implementation:
> * Add a distributed property in the metastore that sets a timeout for interrupting a *GridJobWorker* that did not complete gracefully after *GridJobWorker#cancel* was called;
> * On a call to *GridJobWorker#cancel*, do not *Thread#interrupt* the thread, but add a *GridTimeoutObject* instead.
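
The proposal above could be sketched roughly as follows. This is a minimal illustration in plain Java, not Ignite code: CancellableWorker is a hypothetical stand-in for GridJobWorker, a ScheduledExecutorService plays the role that a GridTimeoutObject would play, and interruptTimeoutMs is assumed to be the value read from the distributed property.

```java
import java.util.concurrent.CountDownLatch;
import java.util.concurrent.Executors;
import java.util.concurrent.ScheduledExecutorService;
import java.util.concurrent.TimeUnit;

/** Hypothetical stand-in for a job worker; this is NOT Ignite's GridJobWorker. */
class CancellableWorker {
    private final ScheduledExecutorService timer; // assumption: substitutes for a timeout object
    private final long interruptTimeoutMs;        // assumption: value of the distributed property
    private final Thread thread;
    private final CountDownLatch done = new CountDownLatch(1);
    private volatile boolean cancelled;
    private volatile boolean interrupted;

    CancellableWorker(ScheduledExecutorService timer, long interruptTimeoutMs, boolean cooperative) {
        this.timer = timer;
        this.interruptTimeoutMs = interruptTimeoutMs;
        this.thread = new Thread(() -> {
            try {
                // A cooperative job polls the cancel flag; an uncooperative one ignores it.
                while (!(cooperative && cancelled))
                    Thread.sleep(10);
            } catch (InterruptedException e) {
                interrupted = true; // the job was stopped by the delayed interrupt
            } finally {
                done.countDown();
            }
        });
    }

    void start() {
        thread.start();
    }

    /** Requests cancellation without interrupting; the interrupt is only a delayed fallback. */
    void cancel() {
        cancelled = true;
        timer.schedule(() -> {
            if (done.getCount() > 0) // still running after the grace period
                thread.interrupt();
        }, interruptTimeoutMs, TimeUnit.MILLISECONDS);
    }

    boolean awaitDone(long ms) throws InterruptedException {
        return done.await(ms, TimeUnit.MILLISECONDS);
    }

    boolean wasInterrupted() {
        return interrupted;
    }

    public static void main(String[] args) throws Exception {
        ScheduledExecutorService timer = Executors.newSingleThreadScheduledExecutor();
        CancellableWorker worker = new CancellableWorker(timer, 500, true);
        worker.start();
        worker.cancel();
        worker.awaitDone(2000);
        System.out.println("interrupted: " + worker.wasInterrupted());
        timer.shutdownNow();
    }
}
```

In this sketch a job that reacts to the cancel flag finishes within the grace period and is never interrupted, while a job that ignores the flag is interrupted only after the timeout expires.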



--
This message was sent by Atlassian Jira
(v8.20.7#820007)