Posted to dev@oozie.apache.org by "Virag Kothari (JIRA)" <ji...@apache.org> on 2012/06/23 01:49:42 UTC

[jira] [Created] (OOZIE-885) A race condition can cause the workflow/coordinator to run even after the bundle job is killed

Virag Kothari created OOZIE-885:
-----------------------------------

             Summary: A race condition can cause the workflow/coordinator to run even after the bundle job is killed
                 Key: OOZIE-885
                 URL: https://issues.apache.org/jira/browse/OOZIE-885
             Project: Oozie
          Issue Type: Bug
            Reporter: Virag Kothari
            Assignee: Virag Kothari
             Fix For: trunk, 3.2.1


Steps to reproduce:

1) Start a bundle job with many coordinators
2) Kill it immediately

Observation:
Some coordinators keep running.

Reason:
The bundle cannot kill a coordinator unless a coord-id has been associated with it.
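A toy model helps make the race concrete. It replays the interleaving from the report deterministically. Everything in it (BundleRaceSketch, Bundle, startCoordinator, kill, the guardStart flag) is hypothetical and is not Oozie's code. It assumes, as the Reason line states, that kill can only reach coordinators that already have a coord-id. The guarded variant re-checks the bundle status before starting a coordinator. That is only one possible way to close the window and is not necessarily what the OOZIE-885 patch does.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical model of a bundle and its coordinators; not Oozie's API.
class BundleRaceSketch {

    enum Status { PREP, RUNNING, KILLED }

    static final class CoordAction {
        final String name;
        String coordId;              // null until the coordinator has been created
        Status status = Status.PREP;

        CoordAction(String name) { this.name = name; }
    }

    static final class Bundle {
        final List<CoordAction> actions = new ArrayList<>();
        final boolean guardStart;    // if true, re-check the bundle status before starting
        Status status = Status.RUNNING;
        int nextId = 0;

        Bundle(boolean guardStart, String... names) {
            this.guardStart = guardStart;
            for (String name : names) {
                actions.add(new CoordAction(name));
            }
        }

        // Kill can only reach coordinators that already have a coord-id.
        synchronized void kill() {
            status = Status.KILLED;
            for (CoordAction a : actions) {
                if (a.coordId != null) {
                    a.status = Status.KILLED;
                }
            }
        }

        // Runs some time after the bundle starts: assigns a coord-id and starts the coordinator.
        // synchronized so the status check and the start cannot interleave with kill().
        synchronized void startCoordinator(CoordAction a) {
            if (guardStart && status == Status.KILLED) {
                a.status = Status.KILLED;  // never start a coordinator of a killed bundle
                return;
            }
            a.coordId = "coord-" + nextId++;
            a.status = Status.RUNNING;
        }

        synchronized long runningCount() {
            return actions.stream().filter(a -> a.status == Status.RUNNING).count();
        }
    }

    // Replays the reported interleaving: one coordinator starts, the bundle is killed,
    // and only then are the remaining coordinators started.
    static long replay(boolean guardStart) {
        Bundle bundle = new Bundle(guardStart, "c1", "c2", "c3");
        bundle.startCoordinator(bundle.actions.get(0));
        bundle.kill();
        bundle.startCoordinator(bundle.actions.get(1));
        bundle.startCoordinator(bundle.actions.get(2));
        return bundle.runningCount();
    }

    public static void main(String[] args) {
        System.out.println("without guard: " + replay(false) + " running");
        System.out.println("with guard:    " + replay(true) + " running");
    }
}
```

Without the guard, the two coordinators whose ids were assigned after the kill are left running, which matches the observation above. With the guard, none are left running.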

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators: https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa
For more information on JIRA, see: http://www.atlassian.com/software/jira

        

[jira] [Commented] (OOZIE-885) A race condition can cause the workflow/coordinator to run even after the bundle job is killed

Posted by "Mohammad Kamrul Islam (JIRA)" <ji...@apache.org>.
    [ https://issues.apache.org/jira/browse/OOZIE-885?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13406181#comment-13406181 ] 

Mohammad Kamrul Islam commented on OOZIE-885:
---------------------------------------------

Please revisit the new try/catch block added in the RecoveryService class.
                

[jira] [Commented] (OOZIE-885) A race condition can cause the workflow/coordinator to run even after the bundle job is killed

Posted by "Virag Kothari (JIRA)" <ji...@apache.org>.
    [ https://issues.apache.org/jira/browse/OOZIE-885?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13421058#comment-13421058 ] 

Virag Kothari commented on OOZIE-885:
-------------------------------------

It's already fixed in https://issues.apache.org/jira/browse/OOZIE-904. The test cases in trunk shouldn't fail.
                

[jira] [Updated] (OOZIE-885) A race condition can cause the workflow/coordinator to run even after the bundle job is killed

Posted by "Virag Kothari (JIRA)" <ji...@apache.org>.
     [ https://issues.apache.org/jira/browse/OOZIE-885?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Virag Kothari updated OOZIE-885:
--------------------------------

    Attachment: OOZIE-885-v2.patch

Changed the try/catch structure in the workflow, coordinator and bundle recovery.
Corrected a comment in bundlestatusupdate.
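The review comment and this note both concern where the try/catch sits in the recovery code. A common arrangement for a periodic recovery pass is to catch inside the loop. One failing job is then logged with its id and skipped instead of aborting recovery for every job after it. This is only a hypothetical sketch: RecoveryLoopSketch, Recoverable and recoverAll are invented names, and the code does not reproduce Oozie's RecoveryService.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.logging.Level;
import java.util.logging.Logger;

// Hypothetical sketch of a recovery pass; not Oozie's RecoveryService.
class RecoveryLoopSketch {
    private static final Logger LOG = Logger.getLogger(RecoveryLoopSketch.class.getName());

    interface Recoverable {
        String id();
        void recover() throws Exception;
    }

    // try/catch inside the loop: a failure is logged with the job id and the pass continues.
    static List<String> recoverAll(List<Recoverable> jobs) {
        List<String> recovered = new ArrayList<>();
        for (Recoverable job : jobs) {
            try {
                job.recover();
                recovered.add(job.id());
            } catch (Exception ex) {
                // Log the cause together with the id of the job that failed.
                LOG.log(Level.WARNING, "Exception while recovering job " + job.id(), ex);
            }
        }
        return recovered;
    }

    // Test helper: builds a job that either recovers cleanly or throws.
    static Recoverable job(String id, boolean fails) {
        return new Recoverable() {
            public String id() { return id; }
            public void recover() throws Exception {
                if (fails) {
                    throw new IllegalStateException("simulated failure for " + id);
                }
            }
        };
    }

    public static void main(String[] args) {
        List<Recoverable> jobs = List.of(job("wf-1", false), job("coord-2", true), job("bundle-3", false));
        System.out.println("recovered: " + recoverAll(jobs));
    }
}
```

With one failing job in the middle, recoverAll returns the ids of the other two jobs and logs a single warning. The trade-off is that failures only surface in the log, so the log message must identify the job.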
                

[jira] [Commented] (OOZIE-885) A race condition can cause the workflow/coordinator to run even after the bundle job is killed

Posted by "Mohammad Kamrul Islam (JIRA)" <ji...@apache.org>.
    [ https://issues.apache.org/jira/browse/OOZIE-885?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13406721#comment-13406721 ] 

Mohammad Kamrul Islam commented on OOZIE-885:
---------------------------------------------

+1
                

[jira] [Resolved] (OOZIE-885) A race condition can cause the workflow/coordinator to run even after the bundle job is killed

Posted by "Robert Kanter (JIRA)" <ji...@apache.org>.
     [ https://issues.apache.org/jira/browse/OOZIE-885?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Robert Kanter resolved OOZIE-885.
---------------------------------

    Resolution: Fixed

Sorry about that; git hadn't pulled all of the latest changes for some reason.  
                

[jira] [Reopened] (OOZIE-885) A race condition can cause the workflow/coordinator to run even after the bundle job is killed

Posted by "Robert Kanter (JIRA)" <ji...@apache.org>.
     [ https://issues.apache.org/jira/browse/OOZIE-885?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Robert Kanter reopened OOZIE-885:
---------------------------------


It looks like this patch broke a test in TestBundleStartXCommand:
{code}

-------------------------------------------------------------------------------
Test set: org.apache.oozie.command.bundle.TestBundleStartXCommand
-------------------------------------------------------------------------------
Tests run: 5, Failures: 1, Errors: 0, Skipped: 0, Time elapsed: 153.229 sec <<< FAILURE!
testBundleStartNegative2(org.apache.oozie.command.bundle.TestBundleStartXCommand)  Time elapsed: 0.004 sec  <<< FAILURE!
junit.framework.AssertionFailedError: expected:<FAILED> but was:<RUNNING>
	at junit.framework.Assert.fail(Assert.java:50)
	at junit.framework.Assert.failNotEquals(Assert.java:287)
	at junit.framework.Assert.assertEquals(Assert.java:67)
	at junit.framework.Assert.assertEquals(Assert.java:74)
	at org.apache.oozie.command.bundle.TestBundleStartXCommand.testBundleStartNegative2(TestBundleStartXCommand.java:220)
	at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
	at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:39)
	at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:25)
	at java.lang.reflect.Method.invoke(Method.java:597)
	at junit.framework.TestCase.runTest(TestCase.java:168)
	at junit.framework.TestCase.runBare(TestCase.java:134)
	at junit.framework.TestResult$1.protect(TestResult.java:110)
	at junit.framework.TestResult.runProtected(TestResult.java:128)
	at junit.framework.TestResult.run(TestResult.java:113)
	at junit.framework.TestCase.run(TestCase.java:124)
	at junit.framework.TestSuite.runTest(TestSuite.java:243)
	at junit.framework.TestSuite.run(TestSuite.java:238)
	at org.junit.internal.runners.JUnit38ClassRunner.run(JUnit38ClassRunner.java:83)
	at org.apache.maven.surefire.junitcore.ClassDemarcatingRunner.run(ClassDemarcatingRunner.java:58)
	at org.junit.runners.Suite.runChild(Suite.java:128)
	at org.junit.runners.Suite.runChild(Suite.java:24)
	at org.junit.runners.ParentRunner$3.run(ParentRunner.java:231)
	at java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:441)
	at java.util.concurrent.FutureTask$Sync.innerRun(FutureTask.java:303)
	at java.util.concurrent.FutureTask.run(FutureTask.java:138)
	at java.util.concurrent.ThreadPoolExecutor$Worker.runTask(ThreadPoolExecutor.java:886)
	at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:908)
	at java.lang.Thread.run(Thread.java:680)
{code}

It runs successfully on the commit before this one in trunk.  
                

[jira] [Updated] (OOZIE-885) A race condition can cause the workflow/coordinator to run even after the bundle job is killed

Posted by "Virag Kothari (JIRA)" <ji...@apache.org>.
     [ https://issues.apache.org/jira/browse/OOZIE-885?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Virag Kothari updated OOZIE-885:
--------------------------------

    Attachment: OOZIE-885.patch

Modified BundleStatusCommand to handle the case where coord-id is null.
Modified the recovery service to log the error message correctly.

Testing:
Patch verified by Y! QE.
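The note above says the bundle status command was changed to account for a null coord-id. The sketch below illustrates that idea only. It counts bundle actions by status. When an action has no coord-id yet, it falls back to the status stored on the bundle action instead of looking up a coordinator that does not exist. The names (BundleStatusSketch, BundleAction, countStatuses), the made-up coordinator id and the fallback rule are all assumptions, not the actual patch.

```java
import java.util.EnumMap;
import java.util.List;
import java.util.Map;

// Hypothetical model of bundle status aggregation; not Oozie's BundleStatusCommand.
class BundleStatusSketch {
    enum Status { PREP, RUNNING, SUCCEEDED, KILLED }

    static final class BundleAction {
        final String coordName;
        final String coordId;   // null until the coordinator job has been created
        final Status status;    // status recorded on the bundle action itself

        BundleAction(String coordName, String coordId, Status status) {
            this.coordName = coordName;
            this.coordId = coordId;
            this.status = status;
        }
    }

    // Status used for aggregation: the coordinator's status when it exists,
    // otherwise the status stored on the bundle action.
    static Status effectiveStatus(BundleAction action, Map<String, Status> coordStatusById) {
        if (action.coordId == null) {
            // No coordinator yet: do not look one up by a null id.
            return action.status;
        }
        return coordStatusById.getOrDefault(action.coordId, action.status);
    }

    static Map<Status, Integer> countStatuses(List<BundleAction> actions, Map<String, Status> coordStatusById) {
        Map<Status, Integer> counts = new EnumMap<>(Status.class);
        for (BundleAction action : actions) {
            counts.merge(effectiveStatus(action, coordStatusById), 1, Integer::sum);
        }
        return counts;
    }

    public static void main(String[] args) {
        List<BundleAction> actions = List.of(
            new BundleAction("coord-a", null, Status.PREP),           // coordinator not created yet
            new BundleAction("coord-b", "0000001-C", Status.PREP));   // made-up coordinator id
        Map<String, Status> coordStatusById = Map.of("0000001-C", Status.RUNNING);
        System.out.println(countStatuses(actions, coordStatusById));
    }
}
```

Running main prints {PREP=1, RUNNING=1}. The action without a coord-id is counted from its own recorded status rather than causing a failed lookup.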
                