You are viewing a plain text version of this content. The canonical link for it is here.
Posted to dev@oozie.apache.org by Mona Chitnis <mo...@yahoo.in> on 2014/09/04 22:55:09 UTC

Re: Review Request 24948: OOZIE-1940 StatusTransitService has race condition

-----------------------------------------------------------
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/24948/#review52346
-----------------------------------------------------------



core/src/main/java/org/apache/oozie/command/bundle/BundleStatusTransitXCommand.java
<https://reviews.apache.org/r/24948/#comment91109>

    do we need to execute this synchronously? None of the counts here are affected by the outcome of this command. Can queue it to release lock faster



core/src/main/java/org/apache/oozie/command/bundle/BundleStatusTransitXCommand.java
<https://reviews.apache.org/r/24948/#comment91115>

    related question, is this situation possible? - job status is PAUSED || PWE, and bundle action status is RWE?



core/src/main/java/org/apache/oozie/command/bundle/BundleStatusTransitXCommand.java
<https://reviews.apache.org/r/24948/#comment91116>

    is this a change from current behavior? a mix of suspended, failed, killed, DWE, SPE = SPE? I'm not sure. Sounds reasonable to me but need to check if this could potentially be confusing to anyone



core/src/main/java/org/apache/oozie/command/bundle/BundleStatusTransitXCommand.java
<https://reviews.apache.org/r/24948/#comment91118>

    I dont understand this. Are the other bundle actions in Running? Then why is status Prep and not Running?



core/src/main/java/org/apache/oozie/command/bundle/BundleStatusTransitXCommand.java
<https://reviews.apache.org/r/24948/#comment91117>

    why is getPrepStatus calling getRunningStatus?



core/src/main/java/org/apache/oozie/command/coord/CoordStatusTransitXCommand.java
<https://reviews.apache.org/r/24948/#comment91119>

    same comment as BundleStatusTransitX, just make sure whether SuspendedWithError is the right status here



core/src/main/java/org/apache/oozie/command/coord/CoordStatusTransitXCommand.java
<https://reviews.apache.org/r/24948/#comment91120>

    why dont we filter out the SKIPPED actions altogether for status transit processing?



core/src/main/java/org/apache/oozie/service/StatusTransitService.java
<https://reviews.apache.org/r/24948/#comment91122>

    some form of batching done right away would be good, so we can execute bundle/coord update queries on multiple bundles/coords in a batched execute query. However then we wont be able to utilize the locking per job to have strong consistency. I guess ok to defer this until we have a better idea


- Mona Chitnis


On Aug. 26, 2014, 12:30 a.m., Purshotam Shah wrote:
> 
> -----------------------------------------------------------
> This is an automatically generated e-mail. To reply, visit:
> https://reviews.apache.org/r/24948/
> -----------------------------------------------------------
> 
> (Updated Aug. 26, 2014, 12:30 a.m.)
> 
> 
> Review request for oozie.
> 
> 
> Bugs: OOZIE-1940
>     https://issues.apache.org/jira/browse/OOZIE-1940
> 
> 
> Repository: oozie-git
> 
> 
> Description
> -------
> 
> StatusTransitService has race condition
> 
> 
> Diffs
> -----
> 
>   core/src/main/java/org/apache/oozie/BundleActionBean.java 5d85a4d 
>   core/src/main/java/org/apache/oozie/BundleJobBean.java 0f1670a 
>   core/src/main/java/org/apache/oozie/CoordinatorActionBean.java 795bf63 
>   core/src/main/java/org/apache/oozie/CoordinatorJobBean.java 8fd53f1 
>   core/src/main/java/org/apache/oozie/ErrorCode.java 88a2c67 
>   core/src/main/java/org/apache/oozie/command/StatusTransitXCommand.java e69de29 
>   core/src/main/java/org/apache/oozie/command/bundle/BundleStatusTransitXCommand.java e69de29 
>   core/src/main/java/org/apache/oozie/command/coord/CoordStatusTransitXCommand.java e69de29 
>   core/src/main/java/org/apache/oozie/executor/jpa/BundleJobQueryExecutor.java 36cd968 
>   core/src/main/java/org/apache/oozie/executor/jpa/CoordActionQueryExecutor.java 3008393 
>   core/src/main/java/org/apache/oozie/executor/jpa/CoordJobQueryExecutor.java 04e6e29 
>   core/src/main/java/org/apache/oozie/service/StatusTransitService.java 21ac25f 
>   core/src/test/java/org/apache/oozie/service/TestStatusTransitService.java bb99138 
> 
> Diff: https://reviews.apache.org/r/24948/diff/
> 
> 
> Testing
> -------
> 
> UTC
> 
> 
> Thanks,
> 
> Purshotam Shah
> 
>


Re: Review Request 24948: OOZIE-1940 StatusTransitService has race condition

Posted by Purshotam Shah <pu...@yahoo-inc.com>.

> On Sept. 4, 2014, 8:55 p.m., Mona Chitnis wrote:
> > core/src/main/java/org/apache/oozie/command/bundle/BundleStatusTransitXCommand.java, line 81
> > <https://reviews.apache.org/r/24948/diff/1/?file=668667#file668667line81>
> >
> >     do we need to execute this synchronously? None of the counts here are affected by the outcome of this command. Can queue it to release lock faster

We need to kill bundle and coord asap. There may be chance the coord action have started and might be some processing. Since it's a faulty bundle, we don't want to start any processing.
This will be fixed in OOZIE-1863.


> On Sept. 4, 2014, 8:55 p.m., Mona Chitnis wrote:
> > core/src/main/java/org/apache/oozie/command/bundle/BundleStatusTransitXCommand.java, line 177
> > <https://reviews.apache.org/r/24948/diff/1/?file=668667#file668667line177>
> >
> >     related question, is this situation possible? - job status is PAUSED || PWE, and bundle action status is RWE?

Yes, BundlePauseXCommand only set bundle status to pause. Bundle status can still be in running state.


> On Sept. 4, 2014, 8:55 p.m., Mona Chitnis wrote:
> > core/src/main/java/org/apache/oozie/command/coord/CoordStatusTransitXCommand.java, line 132
> > <https://reviews.apache.org/r/24948/diff/1/?file=668668#file668668line132>
> >
> >     same comment as BundleStatusTransitX, just make sure whether SuspendedWithError is the right status here

Yes, we first check the terminal case. SUSPENDED  check will come after terminal check.


> On Sept. 4, 2014, 8:55 p.m., Mona Chitnis wrote:
> > core/src/main/java/org/apache/oozie/command/coord/CoordStatusTransitXCommand.java, line 168
> > <https://reviews.apache.org/r/24948/diff/1/?file=668668#file668668line168>
> >
> >     why dont we filter out the SKIPPED actions altogether for status transit processing?

We check if all action is in terminal state then, we mark job as terminated. SKIPPED is also a terminal state. Previous StatusTransitService does same way.


> On Sept. 4, 2014, 8:55 p.m., Mona Chitnis wrote:
> > core/src/main/java/org/apache/oozie/service/StatusTransitService.java, line 160
> > <https://reviews.apache.org/r/24948/diff/1/?file=668672#file668672line160>
> >
> >     some form of batching done right away would be good, so we can execute bundle/coord update queries on multiple bundles/coords in a batched execute query. However then we wont be able to utilize the locking per job to have strong consistency. I guess ok to defer this until we have a better idea

Yes, previous StatusTransitService does the same.

Lets create a different JIRA for batching.


> On Sept. 4, 2014, 8:55 p.m., Mona Chitnis wrote:
> > core/src/main/java/org/apache/oozie/command/bundle/BundleStatusTransitXCommand.java, line 248
> > <https://reviews.apache.org/r/24948/diff/1/?file=668667#file668667line248>
> >
> >     is this a change from current behavior? a mix of suspended, failed, killed, DWE, SPE = SPE? I'm not sure. Sounds reasonable to me but need to check if this could potentially be confusing to anyone

There is no change in behaviour. Previous StatusTransitService has same condition.


> On Sept. 4, 2014, 8:55 p.m., Mona Chitnis wrote:
> > core/src/main/java/org/apache/oozie/command/bundle/BundleStatusTransitXCommand.java, line 267
> > <https://reviews.apache.org/r/24948/diff/1/?file=668667#file668667line267>
> >
> >     I dont understand this. Are the other bundle actions in Running? Then why is status Prep and not Running?

It should be PrepRunning state.


- Purshotam


-----------------------------------------------------------
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/24948/#review52346
-----------------------------------------------------------


On Aug. 26, 2014, 12:30 a.m., Purshotam Shah wrote:
> 
> -----------------------------------------------------------
> This is an automatically generated e-mail. To reply, visit:
> https://reviews.apache.org/r/24948/
> -----------------------------------------------------------
> 
> (Updated Aug. 26, 2014, 12:30 a.m.)
> 
> 
> Review request for oozie.
> 
> 
> Bugs: OOZIE-1940
>     https://issues.apache.org/jira/browse/OOZIE-1940
> 
> 
> Repository: oozie-git
> 
> 
> Description
> -------
> 
> StatusTransitService has race condition
> 
> 
> Diffs
> -----
> 
>   core/src/main/java/org/apache/oozie/BundleActionBean.java 5d85a4d 
>   core/src/main/java/org/apache/oozie/BundleJobBean.java 0f1670a 
>   core/src/main/java/org/apache/oozie/CoordinatorActionBean.java 795bf63 
>   core/src/main/java/org/apache/oozie/CoordinatorJobBean.java 8fd53f1 
>   core/src/main/java/org/apache/oozie/ErrorCode.java 88a2c67 
>   core/src/main/java/org/apache/oozie/command/StatusTransitXCommand.java e69de29 
>   core/src/main/java/org/apache/oozie/command/bundle/BundleStatusTransitXCommand.java e69de29 
>   core/src/main/java/org/apache/oozie/command/coord/CoordStatusTransitXCommand.java e69de29 
>   core/src/main/java/org/apache/oozie/executor/jpa/BundleJobQueryExecutor.java 36cd968 
>   core/src/main/java/org/apache/oozie/executor/jpa/CoordActionQueryExecutor.java 3008393 
>   core/src/main/java/org/apache/oozie/executor/jpa/CoordJobQueryExecutor.java 04e6e29 
>   core/src/main/java/org/apache/oozie/service/StatusTransitService.java 21ac25f 
>   core/src/test/java/org/apache/oozie/service/TestStatusTransitService.java bb99138 
> 
> Diff: https://reviews.apache.org/r/24948/diff/
> 
> 
> Testing
> -------
> 
> UTC
> 
> 
> Thanks,
> 
> Purshotam Shah
> 
>


Re: Review Request 24948: OOZIE-1940 StatusTransitService has race condition

Posted by Mona Chitnis <mo...@yahoo.in>.

> On Sept. 4, 2014, 8:55 p.m., Mona Chitnis wrote:
> > core/src/main/java/org/apache/oozie/command/bundle/BundleStatusTransitXCommand.java, line 177
> > <https://reviews.apache.org/r/24948/diff/1/?file=668667#file668667line177>
> >
> >     related question, is this situation possible? - job status is PAUSED || PWE, and bundle action status is RWE?
> 
> Purshotam Shah wrote:
>     Yes, BundlePauseXCommand only set bundle status to pause. Bundle status can still be in running state.

Ok. I just checked that BundlePauseXCommand and CoordPauseXCommand have empty implementations of pauseChildren()
@Override
    public void pauseChildren() throws CommandException {
        // TODO - need revisit when revisiting coord job status redesign;

    }


- Mona


-----------------------------------------------------------
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/24948/#review52346
-----------------------------------------------------------


On Sept. 8, 2014, 9:15 p.m., Purshotam Shah wrote:
> 
> -----------------------------------------------------------
> This is an automatically generated e-mail. To reply, visit:
> https://reviews.apache.org/r/24948/
> -----------------------------------------------------------
> 
> (Updated Sept. 8, 2014, 9:15 p.m.)
> 
> 
> Review request for oozie.
> 
> 
> Bugs: OOZIE-1940
>     https://issues.apache.org/jira/browse/OOZIE-1940
> 
> 
> Repository: oozie-git
> 
> 
> Description
> -------
> 
> StatusTransitService has race condition
> 
> 
> Diffs
> -----
> 
>   core/src/main/java/org/apache/oozie/BundleActionBean.java 5d85a4d 
>   core/src/main/java/org/apache/oozie/BundleJobBean.java 0f1670a 
>   core/src/main/java/org/apache/oozie/CoordinatorActionBean.java 795bf63 
>   core/src/main/java/org/apache/oozie/CoordinatorJobBean.java 8fd53f1 
>   core/src/main/java/org/apache/oozie/ErrorCode.java 88a2c67 
>   core/src/main/java/org/apache/oozie/command/StatusTransitXCommand.java e69de29 
>   core/src/main/java/org/apache/oozie/command/bundle/BundleStatusTransitXCommand.java e69de29 
>   core/src/main/java/org/apache/oozie/command/coord/CoordStatusTransitXCommand.java e69de29 
>   core/src/main/java/org/apache/oozie/executor/jpa/BundleJobQueryExecutor.java 36cd968 
>   core/src/main/java/org/apache/oozie/executor/jpa/CoordActionQueryExecutor.java 3008393 
>   core/src/main/java/org/apache/oozie/executor/jpa/CoordJobQueryExecutor.java 04e6e29 
>   core/src/main/java/org/apache/oozie/service/StatusTransitService.java 21ac25f 
>   core/src/test/java/org/apache/oozie/service/TestStatusTransitService.java bb99138 
> 
> Diff: https://reviews.apache.org/r/24948/diff/
> 
> 
> Testing
> -------
> 
> UTC
> 
> 
> Thanks,
> 
> Purshotam Shah
> 
>