Posted to dev@gobblin.apache.org by "Hanghang Liu (Jira)" <ji...@apache.org> on 2022/10/13 17:58:00 UTC

[jira] [Updated] (GOBBLIN-1721) Give option to cancel helix workflow through Delete API to avoid job hanging

     [ https://issues.apache.org/jira/browse/GOBBLIN-1721?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Hanghang Liu updated GOBBLIN-1721:
----------------------------------
    Summary: Give option to cancel helix workflow through Delete API to avoid job hanging  (was: Replace STOP API with DELETE for Helix workflow to avoid job hanging)

> Give option to cancel helix workflow through Delete API to avoid job hanging
> ----------------------------------------------------------------------------
>
>                 Key: GOBBLIN-1721
>                 URL: https://issues.apache.org/jira/browse/GOBBLIN-1721
>             Project: Apache Gobblin
>          Issue Type: Bug
>          Components: gobblin-cluster
>            Reporter: Hanghang Liu
>            Assignee: Hung Tran
>            Priority: Major
>
> Currently, when we receive a job restart (handleUpdateJobConfigArrival), GobblinHelixJobLauncher first calls helixTaskDriver.waitToStop to stop the workflow and then launches the new one. We have observed Helix taking exceptionally long to stop the workflow, leaving the job stuck in STOPPING status. As a result, waitToStop times out and throws an exception every time, so the new flow can never launch.
> Because our job is stateless from Helix's point of view, we can use the Delete API in this case to avoid the job hanging.
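
A minimal sketch of the proposed option, under stated assumptions: WorkflowDriver is a hypothetical stand-in for the small part of the Helix task driver discussed above (it is not the real Helix API), and the cancelViaDelete flag, WorkflowCanceller, and StuckDriver names are invented for illustration. It only shows the control flow: with the option on, the workflow is deleted outright instead of waiting on a stop that may never complete.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.TimeoutException;

/** Hypothetical stand-in for the driver calls discussed above; not the real Helix API. */
interface WorkflowDriver {
    /** Requests a stop and waits; throws if the workflow has not stopped in time. */
    void waitToStop(String workflowName, long timeoutMs) throws TimeoutException;

    /** Removes the workflow outright, without waiting for a graceful stop. */
    void delete(String workflowName);
}

/** Cancels the old workflow before a restart, by stop or by delete. */
final class WorkflowCanceller {
    private final WorkflowDriver driver;
    private final boolean cancelViaDelete; // hypothetical config option
    private final long stopTimeoutMs;

    WorkflowCanceller(WorkflowDriver driver, boolean cancelViaDelete, long stopTimeoutMs) {
        this.driver = driver;
        this.cancelViaDelete = cancelViaDelete;
        this.stopTimeoutMs = stopTimeoutMs;
    }

    void cancel(String workflowName) throws TimeoutException {
        if (cancelViaDelete) {
            // The job keeps no state in Helix, so deleting loses nothing.
            driver.delete(workflowName);
        } else {
            // Existing behavior: may time out if the workflow stays in STOPPING.
            driver.waitToStop(workflowName, stopTimeoutMs);
        }
    }
}

/** Fake driver simulating a workflow that never leaves STOPPING. */
final class StuckDriver implements WorkflowDriver {
    final List<String> calls = new ArrayList<>();

    @Override
    public void waitToStop(String workflowName, long timeoutMs) throws TimeoutException {
        calls.add("waitToStop:" + workflowName);
        throw new TimeoutException(workflowName + " still STOPPING after " + timeoutMs + " ms");
    }

    @Override
    public void delete(String workflowName) {
        calls.add("delete:" + workflowName);
    }
}

final class CancelDemo {
    public static void main(String[] args) {
        StuckDriver driver = new StuckDriver();
        try {
            new WorkflowCanceller(driver, false, 1000L).cancel("job_1");
        } catch (TimeoutException e) {
            System.out.println("stop path failed: " + e.getMessage());
        }
        try {
            new WorkflowCanceller(driver, true, 1000L).cancel("job_1");
        } catch (TimeoutException e) {
            throw new IllegalStateException("delete path should not time out", e);
        }
        System.out.println("driver calls: " + driver.calls);
    }
}
```

Keeping the choice behind a flag matches the updated summary (an option rather than a replacement), so deployments that still want a graceful stop keep the existing path.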



--
This message was sent by Atlassian Jira
(v8.20.10#820010)