Posted to dev@oozie.apache.org by Purshotam Shah <pu...@yahoo-inc.com> on 2014/03/25 00:34:12 UTC
Re: Review Request 18995: OOZIE-1735 Support resuming of failed coordinator
job and rerun of a failed coordinator action
-----------------------------------------------------------
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/18995/
-----------------------------------------------------------
(Updated March 24, 2014, 11:34 p.m.)
Review request for oozie.
Changes
-------
Addressing review comments.
Summary (updated)
-----------------
OOZIE-1735 Support resuming of failed coordinator job and rerun of a failed coordinator action
Bugs: OOZIE-1735
https://issues.apache.org/jira/browse/OOZIE-1735
Repository: oozie-git
Description
-------
We should support rerunning a failed job. Jobs are set to FAILED when a runtime error occurs (such as an SQL timeout).
Currently there is no way to recover other than running SQL manually.
Rerun should set the coord status to RUNNING, set pending to 1, reset doneMaterialization, and set the last modified time to the current time, so that materialization continues.
We should also provide an option to resume a failed action. The behavior will be the same as for the killed option.
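For readers who want the reset spelled out, here is a rough sketch of the state change described above. It is not the code in CoordChangeXCommand. CoordJobSketch, its fields, and resume() are hypothetical stand-ins invented for illustration, and the precondition (only FAILED or KILLED jobs may be resumed) is an assumption based on the testing transcript in this request.

```java
import java.util.Date;

/** Hypothetical, simplified stand-in for a coordinator job record (not an Oozie class). */
final class CoordJobSketch {
    enum Status { RUNNING, FAILED, KILLED }

    Status status;
    int pending;                  // 1 means the job still has work outstanding
    boolean doneMaterialization;  // true once no further actions will be materialized
    Date lastModified;

    CoordJobSketch(Status status, int pending, boolean doneMaterialization, Date lastModified) {
        this.status = status;
        this.pending = pending;
        this.doneMaterialization = doneMaterialization;
        this.lastModified = lastModified;
    }

    /**
     * Applies the reset from the description: status back to RUNNING, pending set to 1,
     * doneMaterialization cleared, last modified time set to "now".
     * Assumption: only FAILED or KILLED jobs are eligible.
     */
    static void resume(CoordJobSketch job, Date now) {
        if (job.status != Status.FAILED && job.status != Status.KILLED) {
            throw new IllegalStateException("Cannot resume a job in status " + job.status);
        }
        job.status = Status.RUNNING;
        job.pending = 1;
        job.doneMaterialization = false;
        job.lastModified = now;
    }

    public static void main(String[] args) {
        CoordJobSketch job = new CoordJobSketch(Status.FAILED, 0, true, new Date(0L));
        resume(job, new Date());
        System.out.println(job.status + " pending=" + job.pending
                + " doneMaterialization=" + job.doneMaterialization);
    }
}
```

All four fields are reset together because, per the description, changing the status alone would not be enough for materialization to continue.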
Diffs (updated)
-----
client/src/main/java/org/apache/oozie/cli/OozieCLI.java 87e2f27
client/src/main/java/org/apache/oozie/client/OozieClient.java b0a85fd
core/src/main/java/org/apache/oozie/command/coord/CoordChangeXCommand.java 4957330
core/src/main/java/org/apache/oozie/command/coord/CoordRerunXCommand.java 301737b
core/src/test/java/org/apache/oozie/command/coord/TestCoordChangeXCommand.java b9bbf16
core/src/test/java/org/apache/oozie/command/coord/TestCoordRerunXCommand.java 3cee71a
docs/src/site/twiki/DG_CommandLineTool.twiki 0748ff8
Diff: https://reviews.apache.org/r/18995/diff/
Testing
-------
purushah$ ./oozie job -info 0000000-140324095133518-oozie-puru-C -oozie http://localhost:11000/oozie
Job ID : 0000000-140324095133518-oozie-puru-C
------------------------------------------------------------------------------------------------------------------------------------
Job Name : aggregator-coord
App Path : hdfs://localhost:9000/user/purushah/examples/apps/aggregator/coordinator.xml
Status : RUNNING
Start Time : 2010-01-01 01:00 GMT
End Time : 2010-01-01 03:00 GMT
Pause Time : -
Concurrency : 1
------------------------------------------------------------------------------------------------------------------------------------
ID Status Ext ID Err Code Created Nominal Time
0000000-140324095133518-oozie-puru-C@1 KILLED 0000001-140324095133518-oozie-puru-W - 2014-03-24 16:52 GMT 2010-01-01 01:00 GMT
------------------------------------------------------------------------------------------------------------------------------------
0000000-140324095133518-oozie-puru-C@2 KILLED 0000002-140324095133518-oozie-puru-W - 2014-03-24 16:56 GMT 2010-01-01 02:00 GMT
------------------------------------------------------------------------------------------------------------------------------------
purushah$
purushah$ ./oozie job -kill 0000000-140324095133518-oozie-puru-C -oozie http://localhost:11000/oozie
purushah$ ./oozie job -info 0000000-140324095133518-oozie-puru-C -oozie http://localhost:11000/oozie
Job ID : 0000000-140324095133518-oozie-puru-C
------------------------------------------------------------------------------------------------------------------------------------
Job Name : aggregator-coord
App Path : hdfs://localhost:9000/user/purushah/examples/apps/aggregator/coordinator.xml
Status : KILLED
Start Time : 2010-01-01 01:00 GMT
End Time : 2010-01-01 03:00 GMT
Pause Time : -
Concurrency : 1
------------------------------------------------------------------------------------------------------------------------------------
ID Status Ext ID Err Code Created Nominal Time
0000000-140324095133518-oozie-puru-C@1 KILLED 0000001-140324095133518-oozie-puru-W - 2014-03-24 16:52 GMT 2010-01-01 01:00 GMT
------------------------------------------------------------------------------------------------------------------------------------
0000000-140324095133518-oozie-puru-C@2 KILLED 0000002-140324095133518-oozie-puru-W - 2014-03-24 16:56 GMT 2010-01-01 02:00 GMT
------------------------------------------------------------------------------------------------------------------------------------
purushah$ ./oozie job -change 0000000-140324095133518-oozie-puru-C -value status=RUNNING -oozie http://localhost:11000/oozie
purushah$ ./oozie job -info 0000000-140324095133518-oozie-puru-C -oozie http://localhost:11000/oozie
Job ID : 0000000-140324095133518-oozie-puru-C
------------------------------------------------------------------------------------------------------------------------------------
Job Name : aggregator-coord
App Path : hdfs://localhost:9000/user/purushah/examples/apps/aggregator/coordinator.xml
Status : RUNNING
Start Time : 2010-01-01 01:00 GMT
End Time : 2010-01-01 03:00 GMT
Pause Time : -
Concurrency : 1
------------------------------------------------------------------------------------------------------------------------------------
ID Status Ext ID Err Code Created Nominal Time
0000000-140324095133518-oozie-puru-C@1 KILLED 0000001-140324095133518-oozie-puru-W - 2014-03-24 16:52 GMT 2010-01-01 01:00 GMT
------------------------------------------------------------------------------------------------------------------------------------
0000000-140324095133518-oozie-puru-C@2 KILLED 0000002-140324095133518-oozie-puru-W - 2014-03-24 16:56 GMT 2010-01-01 02:00 GMT
------------------------------------------------------------------------------------------------------------------------------------
purushah$
Thanks,
Purshotam Shah
Re: Review Request 18995: OOZIE-1735 Support resuming of failed coordinator
job and rerun of a failed coordinator action
Posted by Rohini Palaniswamy <ro...@gmail.com>.
-----------------------------------------------------------
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/18995/#review38392
-----------------------------------------------------------
Ship it!
- Rohini Palaniswamy