Posted to dev@oozie.apache.org by "Purshotam Shah (JIRA)" <ji...@apache.org> on 2014/07/16 21:35:04 UTC

[jira] [Updated] (OOZIE-1938) Fork-join job does not execute join node sometimes during HA failover

     [ https://issues.apache.org/jira/browse/OOZIE-1938?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Purshotam Shah updated OOZIE-1938:
----------------------------------

    Description: 
Reported by [~mchiang].

Scenario: (2 Oozie HA servers)
21:38:56 submit job at oozie client
21:41:42 shut down server1
21:46:52 shut down server2
21:47:30 start server1
22:15:05 start server2

The last fork path ended at 21:52:53.
22:36:48 the job is still RUNNING and has not moved to the join node.

Digging into the logs, the locking part seems to work fine: forked action processing is distributed between the two servers whether both are running or one of them is down. The open question is why even RecoveryService fails to pick up the job after all the forks have completed.
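The timeline can be read more precisely by computing the gaps between events. The sketch below does that arithmetic in Python; the event labels are ad hoc names chosen for illustration, and only the reported timestamps are used. It shows that both servers were down together for just 38 seconds, that the last fork path finished while server1 alone was running, and that the job then sat without reaching the join for 2635 seconds (43m55s), 1303 seconds of which had both servers back up.

```python
from datetime import datetime

# Timestamps copied from the scenario; all events fall on the same day,
# so only HH:MM:SS is parsed (the date defaults to 1900-01-01 for all of them).
FMT = "%H:%M:%S"
events = {
    "submit": "21:38:56",
    "server1_down": "21:41:42",
    "server2_down": "21:46:52",
    "server1_up": "21:47:30",
    "last_fork_path_end": "21:52:53",
    "server2_up": "22:15:05",
    "still_running_check": "22:36:48",
}
t = {name: datetime.strptime(ts, FMT) for name, ts in events.items()}


def gap(start: str, end: str) -> int:
    """Whole seconds elapsed between two named events."""
    return int((t[end] - t[start]).total_seconds())


# Window during which neither Oozie server was running.
no_server_window = gap("server2_down", "server1_up")
# Time the job stayed RUNNING after its last fork path ended.
stuck_after_forks = gap("last_fork_path_end", "still_running_check")
# Portion of that time during which both servers were up again.
both_up_before_check = gap("server2_up", "still_running_check")
# Did the last fork path end while only server1 was running?
only_server1_up = t["server1_up"] <= t["last_fork_path_end"] < t["server2_up"]

print(f"both servers down: {no_server_window}s")
print(f"stuck after last fork path: {stuck_after_forks}s")
print(f"stuck with both servers up: {both_up_before_check}s")
print(f"last fork path ended with only server1 up: {only_server1_up}")
```

The last check points at the interesting window: the final fork path completed on a single surviving server, yet neither that server nor the one that rejoined at 22:15:05 advanced the job to the join.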

  was:
Reported by Michelle Chiang (Yahoo Oozie QE)

Scenario: (2 Oozie HA servers)
21:38:56 submit job at oozie client
21:41:42 shut down server1
21:46:52 shut down server2
21:47:30 start server1
22:15:05 start server2

The last fork path ended at 21:52:53.
22:36:48 the job is still RUNNING and has not moved to the join node.

Digging into the logs, the locking part seems to work fine: forked action processing is distributed between the two servers whether both are running or one of them is down. The open question is why even RecoveryService fails to pick up the job after all the forks have completed.


> Fork-join job does not execute join node sometimes during HA failover
> ---------------------------------------------------------------------
>
>                 Key: OOZIE-1938
>                 URL: https://issues.apache.org/jira/browse/OOZIE-1938
>             Project: Oozie
>          Issue Type: Bug
>          Components: HA
>    Affects Versions: trunk
>            Reporter: Mona Chitnis
>             Fix For: trunk
>
>



--
This message was sent by Atlassian JIRA
(v6.2#6252)