You are viewing a plain text version of this content. The canonical link for it is here.
Posted to dev@oozie.apache.org by "Peter Bacsko (JIRA)" <ji...@apache.org> on 2016/06/23 16:59:16 UTC

[jira] [Comment Edited] (OOZIE-2459) TestCoordActionInputCheckXCommand.testNone is flakey

    [ https://issues.apache.org/jira/browse/OOZIE-2459?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15346755#comment-15346755 ] 

Peter Bacsko edited comment on OOZIE-2459 at 6/23/16 4:59 PM:
--------------------------------------------------------------

Hm, I have to correct myself. Ferenc was right, it's StatusTransitService, I was able to reproduce it with some changes in the test (here I changed log4j settings + renamed the threads in the scheduler as well):

{code}
12:22:10,183 [scheduler-7]  INFO StatusTransitService$StatusTransitRunnable:520 - USER[-] GROUP[-] TOKEN[-] APP[-] JOB[-] ACTION[-] Acquired lock for [org.apache.oozie.service.StatusTransitService]
12:22:10,183 [scheduler-7]  INFO StatusTransitService$StatusTransitRunnable:520 - USER[-] GROUP[-] TOKEN[-] APP[-] JOB[-] ACTION[-] Running coordinator status service from last instance time =  2016-06-23T10:22Z
...
12:22:10,191 [scheduler-7] DEBUG CoordStatusTransitXCommand:526 - USER[test] GROUP[testg] TOKEN[] APP[COORD-TEST] JOB[0000000-160623122108134-oozie-pbac-C] ACTION[-] Execute command [coord_status_transit] key [0000000-160623122108134-oo
zie-pbac-C]
12:22:10,191 [scheduler-7]  INFO CoordStatusTransitXCommand:520 - USER[test] GROUP[testg] TOKEN[] APP[COORD-TEST] JOB[0000000-160623122108134-oozie-pbac-C] ACTION[-] Set coordinator job [0000000-160623122108134-oozie-pbac-C] status to 'SUCCEEDED' from 'RUNNING'
...
12:22:10,194 [scheduler-7] DEBUG SQL:73 - <t 842827155, conn 626961003> executing prepstmnt 391142767 UPDATE COORD_JOBS t0 SET t0.status = ?, t0.last_modified_time = ?, t0.pending = ? WHERE (t0.id = ?) [params=(String) SUCCEEDED, (Timestamp) 2016-06-23 12:22:10.191, (int) 0, (String) 0000000-160623122108134-oozie-pbac-C]
{code}

I discussed this with Ferenc and we concluded that it's not the only reason why the test can fail. The output from CoordStatusTransitXCommand should be visible, but it's missing from this run:

https://builds.apache.org/job/oozie-trunk-precommit-build/2988/testReport/junit/org.apache.oozie.command.coord/TestCoordActionInputCheckXCommand/testNone/

I'll spend some more effort reproducing the problem, but if we can't figure out the root cause, I agree with Ferenc's idea about splitting up the tests. Maybe there's some tricky race condition somewhere that we're unable to spot.


was (Author: pbacsko):
Hm, I have to correct myself. Ferenc was right, it's StatusTransitService, I was able to reproduce it with some tricks (here I changed log4j settings + renamed the threads in the scheduler as well):

{code}
12:22:10,183 [scheduler-7]  INFO StatusTransitService$StatusTransitRunnable:520 - USER[-] GROUP[-] TOKEN[-] APP[-] JOB[-] ACTION[-] Acquired lock for [org.apache.oozie.service.StatusTransitService]
12:22:10,183 [scheduler-7]  INFO StatusTransitService$StatusTransitRunnable:520 - USER[-] GROUP[-] TOKEN[-] APP[-] JOB[-] ACTION[-] Running coordinator status service from last instance time =  2016-06-23T10:22Z
...
12:22:10,191 [scheduler-7] DEBUG CoordStatusTransitXCommand:526 - USER[test] GROUP[testg] TOKEN[] APP[COORD-TEST] JOB[0000000-160623122108134-oozie-pbac-C] ACTION[-] Execute command [coord_status_transit] key [0000000-160623122108134-oo
zie-pbac-C]
12:22:10,191 [scheduler-7]  INFO CoordStatusTransitXCommand:520 - USER[test] GROUP[testg] TOKEN[] APP[COORD-TEST] JOB[0000000-160623122108134-oozie-pbac-C] ACTION[-] Set coordinator job [0000000-160623122108134-oozie-pbac-C] status to 'SUCCEEDED' from 'RUNNING'
...
12:22:10,194 [scheduler-7] DEBUG SQL:73 - <t 842827155, conn 626961003> executing prepstmnt 391142767 UPDATE COORD_JOBS t0 SET t0.status = ?, t0.last_modified_time = ?, t0.pending = ? WHERE (t0.id = ?) [params=(String) SUCCEEDED, (Timestamp) 2016-06-23 12:22:10.191, (int) 0, (String) 0000000-160623122108134-oozie-pbac-C]
{code}

I discussed this with Ferenc and we concluded that it's not the only reason why the test can fail. The output from CoordStatusTransitXCommand should be visible, but it's missing from this run:

https://builds.apache.org/job/oozie-trunk-precommit-build/2988/testReport/junit/org.apache.oozie.command.coord/TestCoordActionInputCheckXCommand/testNone/

I'll spend some more effort reproducing the problem, but if we can't figure out the root cause, I agree with Ferenc's idea about splitting up the tests. Maybe there's some tricky race condition somewhere that we're unable to spot.

> TestCoordActionInputCheckXCommand.testNone is flakey
> ----------------------------------------------------
>
>                 Key: OOZIE-2459
>                 URL: https://issues.apache.org/jira/browse/OOZIE-2459
>             Project: Oozie
>          Issue Type: Bug
>          Components: action, tests
>    Affects Versions: 4.1.0
>            Reporter: Ferenc Denes
>            Assignee: Peter Bacsko
>            Priority: Minor
>             Fix For: trunk
>
>         Attachments: OOZIE-2459-1.patch
>
>
> TestCoordActionInputCheckXCommand.testNone is flakey.
> Also the TestCoordActionInputCheckXCommandNonUTC.testNone which is very similar.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)