Posted to dev@oozie.apache.org by Purshotam Shah <pu...@yahoo-inc.com> on 2014/03/25 00:34:12 UTC

Re: Review Request 18995: OOZIE-1735 Support resuming of failed coordinator job and rerun of a failed coordinator action

-----------------------------------------------------------
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/18995/
-----------------------------------------------------------

(Updated March 24, 2014, 11:34 p.m.)


Review request for oozie.


Changes
-------

Addressing review comments.


Summary (updated)
-----------------

OOZIE-1735 Support resuming of failed coordinator job and rerun of a failed coordinator action


Bugs: OOZIE-1735
    https://issues.apache.org/jira/browse/OOZIE-1735


Repository: oozie-git


Description
-------


We should support rerunning a failed coordinator job. A job is set to FAILED when a runtime error occurs (for example, an SQL timeout).
Currently there is no way to recover other than running SQL manually.
Rerun should set the coord status to RUNNING, set pending to 1, reset doneMaterialization, and set last modified to the current time, so that materialization continues.

We should also provide an option for resuming a failed action. The behavior will be the same as the killed option.
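
The state changes described above can be sketched roughly as follows. This is only an illustration, not the code in the patch: CoordResumeSketch, CoordJob, and resume() are hypothetical names, the fields are simplified stand-ins for the real job record, and two points are assumptions: that "reset doneMaterialization" means clearing the flag, and that KILLED jobs are accepted as well as FAILED ones (as the test transcript below suggests).

```java
import java.util.Date;

// Hypothetical, simplified model; these are not Oozie's real classes.
class CoordResumeSketch {

    enum Status { RUNNING, FAILED, KILLED }

    // Stand-in for the coordinator job record; field names follow the description.
    static final class CoordJob {
        Status status;
        int pending;                  // 1 = the job still has work to be picked up
        boolean doneMaterialization;  // true = no further actions will be materialized
        Date lastModified;

        CoordJob(Status status, Date lastModified) {
            this.status = status;
            this.pending = 0;
            this.doneMaterialization = true;
            this.lastModified = lastModified;
        }
    }

    // Applies the four changes listed in the description so that
    // materialization can continue. Accepting KILLED is an assumption.
    static void resume(CoordJob job, Date now) {
        if (job.status != Status.FAILED && job.status != Status.KILLED) {
            throw new IllegalStateException("Cannot resume a job in status " + job.status);
        }
        job.status = Status.RUNNING;
        job.pending = 1;
        job.doneMaterialization = false;  // assumed meaning of "reset"
        job.lastModified = now;
    }

    public static void main(String[] args) {
        CoordJob job = new CoordJob(Status.FAILED, new Date(0L));
        resume(job, new Date());
        System.out.println(job.status + " pending=" + job.pending
                + " doneMaterialization=" + job.doneMaterialization);
    }
}
```

Rejecting jobs that are not in a terminal error state keeps the sketch from silently touching a job that is already running; whether the actual command enforces the same check is not stated here.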


Diffs (updated)
-----

  client/src/main/java/org/apache/oozie/cli/OozieCLI.java 87e2f27 
  client/src/main/java/org/apache/oozie/client/OozieClient.java b0a85fd 
  core/src/main/java/org/apache/oozie/command/coord/CoordChangeXCommand.java 4957330 
  core/src/main/java/org/apache/oozie/command/coord/CoordRerunXCommand.java 301737b 
  core/src/test/java/org/apache/oozie/command/coord/TestCoordChangeXCommand.java b9bbf16 
  core/src/test/java/org/apache/oozie/command/coord/TestCoordRerunXCommand.java 3cee71a 
  docs/src/site/twiki/DG_CommandLineTool.twiki 0748ff8 

Diff: https://reviews.apache.org/r/18995/diff/


Testing
-------


purushah$ ./oozie job -info 0000000-140324095133518-oozie-puru-C  -oozie http://localhost:11000/oozie
Job ID : 0000000-140324095133518-oozie-puru-C
------------------------------------------------------------------------------------------------------------------------------------
Job Name    : aggregator-coord
App Path    : hdfs://localhost:9000/user/purushah/examples/apps/aggregator/coordinator.xml
Status      : RUNNING
Start Time  : 2010-01-01 01:00 GMT
End Time    : 2010-01-01 03:00 GMT
Pause Time  : -
Concurrency : 1
------------------------------------------------------------------------------------------------------------------------------------
ID                                         Status    Ext ID                               Err Code  Created              Nominal Time         
0000000-140324095133518-oozie-puru-C@1     KILLED    0000001-140324095133518-oozie-puru-W -         2014-03-24 16:52 GMT 2010-01-01 01:00 GMT 
------------------------------------------------------------------------------------------------------------------------------------
0000000-140324095133518-oozie-puru-C@2     KILLED    0000002-140324095133518-oozie-puru-W -         2014-03-24 16:56 GMT 2010-01-01 02:00 GMT 
------------------------------------------------------------------------------------------------------------------------------------
purushah$ 
purushah$ ./oozie job -kill 0000000-140324095133518-oozie-puru-C  -oozie http://localhost:11000/oozie
purushah$ ./oozie job -info 0000000-140324095133518-oozie-puru-C  -oozie http://localhost:11000/oozie
Job ID : 0000000-140324095133518-oozie-puru-C
------------------------------------------------------------------------------------------------------------------------------------
Job Name    : aggregator-coord
App Path    : hdfs://localhost:9000/user/purushah/examples/apps/aggregator/coordinator.xml
Status      : KILLED
Start Time  : 2010-01-01 01:00 GMT
End Time    : 2010-01-01 03:00 GMT
Pause Time  : -
Concurrency : 1
------------------------------------------------------------------------------------------------------------------------------------
ID                                         Status    Ext ID                               Err Code  Created              Nominal Time         
0000000-140324095133518-oozie-puru-C@1     KILLED    0000001-140324095133518-oozie-puru-W -         2014-03-24 16:52 GMT 2010-01-01 01:00 GMT 
------------------------------------------------------------------------------------------------------------------------------------
0000000-140324095133518-oozie-puru-C@2     KILLED    0000002-140324095133518-oozie-puru-W -         2014-03-24 16:56 GMT 2010-01-01 02:00 GMT 
------------------------------------------------------------------------------------------------------------------------------------
purushah$ ./oozie job -change 0000000-140324095133518-oozie-puru-C -value status=RUNNING -oozie http://localhost:11000/oozie
purushah$ ./oozie job -info 0000000-140324095133518-oozie-puru-C  -oozie http://localhost:11000/oozie
Job ID : 0000000-140324095133518-oozie-puru-C
------------------------------------------------------------------------------------------------------------------------------------
Job Name    : aggregator-coord
App Path    : hdfs://localhost:9000/user/purushah/examples/apps/aggregator/coordinator.xml
Status      : RUNNING
Start Time  : 2010-01-01 01:00 GMT
End Time    : 2010-01-01 03:00 GMT
Pause Time  : -
Concurrency : 1
------------------------------------------------------------------------------------------------------------------------------------
ID                                         Status    Ext ID                               Err Code  Created              Nominal Time         
0000000-140324095133518-oozie-puru-C@1     KILLED    0000001-140324095133518-oozie-puru-W -         2014-03-24 16:52 GMT 2010-01-01 01:00 GMT 
------------------------------------------------------------------------------------------------------------------------------------
0000000-140324095133518-oozie-puru-C@2     KILLED    0000002-140324095133518-oozie-puru-W -         2014-03-24 16:56 GMT 2010-01-01 02:00 GMT 
------------------------------------------------------------------------------------------------------------------------------------
purushah$ 


Thanks,

Purshotam Shah


Re: Review Request 18995: OOZIE-1735 Support resuming of failed coordinator job and rerun of a failed coordinator action

Posted by Rohini Palaniswamy <ro...@gmail.com>.
-----------------------------------------------------------
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/18995/#review38392
-----------------------------------------------------------

Ship it!



- Rohini Palaniswamy

