You are viewing a plain text version of this content. The canonical link for it is here.
Posted to dev@oozie.apache.org by "Robert Kanter (JIRA)" <ji...@apache.org> on 2013/01/09 02:17:11 UTC

[jira] [Updated] (OOZIE-1051) Repeating Errors for workflows that were allreday Killed

     [ https://issues.apache.org/jira/browse/OOZIE-1051?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Robert Kanter updated OOZIE-1051:
---------------------------------

    Description: 

 am experiencing a repeated errors for workflows that were already killed.

{code}
012-11-05 21:01:27,975 DEBUG KillXCommand:545 - USER[isotopeuser] GROUP[-] TOKEN[] APP[MonitoringTestWorkflow1__0] JOB[0000713-121031144215857-oozie-isot-W] ACTION[-] Acquired lock for [0000712-121031144215857-oozie-isot-W] in [kill]
2012-11-05 21:01:27,975 DEBUG KillXCommand:545 - USER[isotopeuser] GROUP[-] TOKEN[] APP[MonitoringTestWorkflow1__0] JOB[0000713-121031144215857-oozie-isot-W] ACTION[-] Load state for [0000712-121031144215857-oozie-isot-W]
2012-11-05 21:01:28,475 DEBUG KillXCommand:545 - USER[isotopeuser] GROUP[-] TOKEN[] APP[MonitoringTestWorkflow1__0] JOB[0000712-121031144215857-oozie-isot-W] ACTION[-] Precondition check for command [kill] key [0000712-121031144215857-oozie-isot-W]
2012-11-05 21:01:28,475 DEBUG KillXCommand:545 - USER[isotopeuser] GROUP[-] TOKEN[] APP[MonitoringTestWorkflow1__0] JOB[0000712-121031144215857-oozie-isot-W] ACTION[-] Released lock for [0000712-121031144215857-oozie-isot-W] in [kill]
2012-11-05 21:01:28,475  WARN KillXCommand:542 - USER[isotopeuser] GROUP[-] TOKEN[] APP[MonitoringTestWorkflow1__0] JOB[0000712-121031144215857-oozie-isot-W] ACTION[-] E0725: Workflow instance can not be killed, 0000712-121031144215857-oozie-isot-W, Error Code: E0725
2012-11-05 21:01:28,475 DEBUG KillXCommand:545 - USER[isotopeuser] GROUP[-] TOKEN[] APP[MonitoringTestWorkflow1__0] JOB[0000712-121031144215857-oozie-isot-W] ACTION[-] Command [kill] key [0000712-121031144215857-oozie-isot-W]  already executed for [org.apache.oozie.command.wf.KillXCommand@6903ada0]
2012-11-05 21:01:28,475 ERROR KillXCommand:536 - USER[isotopeuser] GROUP[-] TOKEN[] APP[MonitoringTestWorkflow1__0] JOB[0000712-121031144215857-oozie-isot-W] ACTION[-] Exception, 
java.lang.IllegalMonitorStateException
	at java.util.concurrent.locks.ReentrantReadWriteLock$Sync.tryRelease(ReentrantReadWriteLock.java:374)
	at java.util.concurrent.locks.AbstractQueuedSynchronizer.release(AbstractQueuedSynchronizer.java:1260)
	at java.util.concurrent.locks.ReentrantReadWriteLock$WriteLock.unlock(ReentrantReadWriteLock.java:1131)
	at org.apache.oozie.util.MemoryLocks$LockToken.release(MemoryLocks.java:59)
	at org.apache.oozie.command.XCommand.releaseLock(XCommand.java:222)
	at org.apache.oozie.command.XCommand.call(XCommand.java:295)
	at org.apache.oozie.service.CallableQueueService$CompositeCallable.call(CallableQueueService.java:326)
	at org.apache.oozie.service.CallableQueueService$CompositeCallable.call(CallableQueueService.java:255)
	at org.apache.oozie.service.CallableQueueService$CallableWrapper.run(CallableQueueService.java:175)
	at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1110)
	at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:603)
	at java.lang.Thread.run(Thread.java:722)
2012-11-05 21:01:28,475  WARN CallableQueueService$CompositeCallable:542 - USER[-] GROUP[-] TOKEN[-] APP[-] JOB[-] ACTION[-] exception callable [kill], E0607: Other error in operation [java.lang.IllegalMonitorStateException], {1}
org.apache.oozie.command.CommandException: E0607: Other error in operation [java.lang.IllegalMonitorStateException], {1}
	at org.apache.oozie.command.XCommand.call(XCommand.java:317)
	at org.apache.oozie.service.CallableQueueService$CompositeCallable.call(CallableQueueService.java:326)
	at org.apache.oozie.service.CallableQueueService$CompositeCallable.call(CallableQueueService.java:255)
	at org.apache.oozie.service.CallableQueueService$CallableWrapper.run(CallableQueueService.java:175)
	at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1110)
	at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:603)
	at java.lang.Thread.run(Thread.java:722)
Caused by: java.lang.IllegalMonitorStateException
	at java.util.concurrent.locks.ReentrantReadWriteLock$Sync.tryRelease(ReentrantReadWriteLock.java:374)
	at java.util.concurrent.locks.AbstractQueuedSynchronizer.release(AbstractQueuedSynchronizer.java:1260)
	at java.util.concurrent.locks.ReentrantReadWriteLock$WriteLock.unlock(ReentrantReadWriteLock.java:1131)
	at org.apache.oozie.util.MemoryLocks$LockToken.release(MemoryLocks.java:59)
	at org.apache.oozie.command.XCommand.releaseLock(XCommand.java:222)
	at org.apache.oozie.command.XCommand.call(XCommand.java:295)
	... 6 more
{code}

What I did, is some time ago I killed a coordinator app whic had a lot of workflows in WAITING state, since then I see this error in the log file.

The log/exception is shown periodically, and it happen for the same job id (0000712-121031144215857-oozie-isot-W) and also several other job ids.


According to Mohammad Islam:
This is definitely a bug in XCommand class.
Most possibly, the bug was introduced when interrupt type command was added.

Would you please create a JIRA for this adding the contents of this email?



Probable reason:
Trying to release a lock w/o locking it first. It could happen on special case.
The bug is here in XCommand code:
{code}
finally{
    if (isLockRequired()) {
        releaseLock();
    }
}
{code}
The "if" condition should be changed to something like this (used before locking):
if (isLockRequired() && !this.inInterruptMode())  


  was:
 am experiencing a repeated errors for workflows that were already killed.

012-11-05 21:01:27,975 DEBUG KillXCommand:545 - USER[isotopeuser] GROUP[-] TOKEN[] APP[MonitoringTestWorkflow1__0] JOB[0000713-121031144215857-oozie-isot-W] ACTION[-] Acquired lock for [0000712-121031144215857-oozie-isot-W] in [kill]

2012-11-05 21:01:27,975 DEBUG KillXCommand:545 - USER[isotopeuser] GROUP[-] TOKEN[] APP[MonitoringTestWorkflow1__0] JOB[0000713-121031144215857-oozie-isot-W] ACTION[-] Load state for [0000712-121031144215857-oozie-isot-W]

2012-11-05 21:01:28,475 DEBUG KillXCommand:545 - USER[isotopeuser] GROUP[-] TOKEN[] APP[MonitoringTestWorkflow1__0] JOB[0000712-121031144215857-oozie-isot-W] ACTION[-] Precondition check for command [kill] key [0000712-121031144215857-oozie-isot-W]

2012-11-05 21:01:28,475 DEBUG KillXCommand:545 - USER[isotopeuser] GROUP[-] TOKEN[] APP[MonitoringTestWorkflow1__0] JOB[0000712-121031144215857-oozie-isot-W] ACTION[-] Released lock for [0000712-121031144215857-oozie-isot-W] in [kill]

2012-11-05 21:01:28,475  WARN KillXCommand:542 - USER[isotopeuser] GROUP[-] TOKEN[] APP[MonitoringTestWorkflow1__0] JOB[0000712-121031144215857-oozie-isot-W] ACTION[-] E0725: Workflow instance can not be killed, 0000712-121031144215857-oozie-isot-W, Error Code: E0725

2012-11-05 21:01:28,475 DEBUG KillXCommand:545 - USER[isotopeuser] GROUP[-] TOKEN[] APP[MonitoringTestWorkflow1__0] JOB[0000712-121031144215857-oozie-isot-W] ACTION[-] Command [kill] key [0000712-121031144215857-oozie-isot-W]  already executed for [org.apache.oozie.command.wf.KillXCommand@6903ada0]

2012-11-05 21:01:28,475 ERROR KillXCommand:536 - USER[isotopeuser] GROUP[-] TOKEN[] APP[MonitoringTestWorkflow1__0] JOB[0000712-121031144215857-oozie-isot-W] ACTION[-] Exception, 

java.lang.IllegalMonitorStateException

	at java.util.concurrent.locks.ReentrantReadWriteLock$Sync.tryRelease(ReentrantReadWriteLock.java:374)

	at java.util.concurrent.locks.AbstractQueuedSynchronizer.release(AbstractQueuedSynchronizer.java:1260)

	at java.util.concurrent.locks.ReentrantReadWriteLock$WriteLock.unlock(ReentrantReadWriteLock.java:1131)

	at org.apache.oozie.util.MemoryLocks$LockToken.release(MemoryLocks.java:59)

	at org.apache.oozie.command.XCommand.releaseLock(XCommand.java:222)

	at org.apache.oozie.command.XCommand.call(XCommand.java:295)

	at org.apache.oozie.service.CallableQueueService$CompositeCallable.call(CallableQueueService.java:326)

	at org.apache.oozie.service.CallableQueueService$CompositeCallable.call(CallableQueueService.java:255)

	at org.apache.oozie.service.CallableQueueService$CallableWrapper.run(CallableQueueService.java:175)

	at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1110)

	at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:603)

	at java.lang.Thread.run(Thread.java:722)

2012-11-05 21:01:28,475  WARN CallableQueueService$CompositeCallable:542 - USER[-] GROUP[-] TOKEN[-] APP[-] JOB[-] ACTION[-] exception callable [kill], E0607: Other error in operation [java.lang.IllegalMonitorStateException], {1}

org.apache.oozie.command.CommandException: E0607: Other error in operation [java.lang.IllegalMonitorStateException], {1}

	at org.apache.oozie.command.XCommand.call(XCommand.java:317)

	at org.apache.oozie.service.CallableQueueService$CompositeCallable.call(CallableQueueService.java:326)

	at org.apache.oozie.service.CallableQueueService$CompositeCallable.call(CallableQueueService.java:255)

	at org.apache.oozie.service.CallableQueueService$CallableWrapper.run(CallableQueueService.java:175)

	at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1110)

	at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:603)

	at java.lang.Thread.run(Thread.java:722)

Caused by: java.lang.IllegalMonitorStateException

	at java.util.concurrent.locks.ReentrantReadWriteLock$Sync.tryRelease(ReentrantReadWriteLock.java:374)

	at java.util.concurrent.locks.AbstractQueuedSynchronizer.release(AbstractQueuedSynchronizer.java:1260)

	at java.util.concurrent.locks.ReentrantReadWriteLock$WriteLock.unlock(ReentrantReadWriteLock.java:1131)

	at org.apache.oozie.util.MemoryLocks$LockToken.release(MemoryLocks.java:59)

	at org.apache.oozie.command.XCommand.releaseLock(XCommand.java:222)

	at org.apache.oozie.command.XCommand.call(XCommand.java:295)

	... 6 more

What I did, is some time ago I killed a coordinator app whic had a lot of workflows in WAITING state, since then I see this error in the log file.

The log/exception is shown periodically, and it happen for the same job id (0000712-121031144215857-oozie-isot-W) and also several other job ids.


According to Mohammad Islam:
This is definitely a bug in XCommand class.
Most possibly, the bug was introduced when interrupt type command was added.

Would you please create a JIRA for this adding the contents of this email?



Probable reason:
Trying to release a lock w/o locking it first. It could happen on special case.
The bug is here in XCommand code:
finally{
                if (isLockRequired()) {
                    releaseLock();
                }
            }

The "if" condition should be changed to something like this (used before locking):
if (isLockRequired() && !this.inInterruptMode())  



    
> Repeating Errors for workflows that were allreday Killed
> --------------------------------------------------------
>
>                 Key: OOZIE-1051
>                 URL: https://issues.apache.org/jira/browse/OOZIE-1051
>             Project: Oozie
>          Issue Type: Bug
>          Components: workflow
>    Affects Versions: 3.2.0
>            Reporter: Boris Shulman
>            Assignee: Robert Kanter
>
>  am experiencing a repeated errors for workflows that were already killed.
> {code}
> 012-11-05 21:01:27,975 DEBUG KillXCommand:545 - USER[isotopeuser] GROUP[-] TOKEN[] APP[MonitoringTestWorkflow1__0] JOB[0000713-121031144215857-oozie-isot-W] ACTION[-] Acquired lock for [0000712-121031144215857-oozie-isot-W] in [kill]
> 2012-11-05 21:01:27,975 DEBUG KillXCommand:545 - USER[isotopeuser] GROUP[-] TOKEN[] APP[MonitoringTestWorkflow1__0] JOB[0000713-121031144215857-oozie-isot-W] ACTION[-] Load state for [0000712-121031144215857-oozie-isot-W]
> 2012-11-05 21:01:28,475 DEBUG KillXCommand:545 - USER[isotopeuser] GROUP[-] TOKEN[] APP[MonitoringTestWorkflow1__0] JOB[0000712-121031144215857-oozie-isot-W] ACTION[-] Precondition check for command [kill] key [0000712-121031144215857-oozie-isot-W]
> 2012-11-05 21:01:28,475 DEBUG KillXCommand:545 - USER[isotopeuser] GROUP[-] TOKEN[] APP[MonitoringTestWorkflow1__0] JOB[0000712-121031144215857-oozie-isot-W] ACTION[-] Released lock for [0000712-121031144215857-oozie-isot-W] in [kill]
> 2012-11-05 21:01:28,475  WARN KillXCommand:542 - USER[isotopeuser] GROUP[-] TOKEN[] APP[MonitoringTestWorkflow1__0] JOB[0000712-121031144215857-oozie-isot-W] ACTION[-] E0725: Workflow instance can not be killed, 0000712-121031144215857-oozie-isot-W, Error Code: E0725
> 2012-11-05 21:01:28,475 DEBUG KillXCommand:545 - USER[isotopeuser] GROUP[-] TOKEN[] APP[MonitoringTestWorkflow1__0] JOB[0000712-121031144215857-oozie-isot-W] ACTION[-] Command [kill] key [0000712-121031144215857-oozie-isot-W]  already executed for [org.apache.oozie.command.wf.KillXCommand@6903ada0]
> 2012-11-05 21:01:28,475 ERROR KillXCommand:536 - USER[isotopeuser] GROUP[-] TOKEN[] APP[MonitoringTestWorkflow1__0] JOB[0000712-121031144215857-oozie-isot-W] ACTION[-] Exception, 
> java.lang.IllegalMonitorStateException
> 	at java.util.concurrent.locks.ReentrantReadWriteLock$Sync.tryRelease(ReentrantReadWriteLock.java:374)
> 	at java.util.concurrent.locks.AbstractQueuedSynchronizer.release(AbstractQueuedSynchronizer.java:1260)
> 	at java.util.concurrent.locks.ReentrantReadWriteLock$WriteLock.unlock(ReentrantReadWriteLock.java:1131)
> 	at org.apache.oozie.util.MemoryLocks$LockToken.release(MemoryLocks.java:59)
> 	at org.apache.oozie.command.XCommand.releaseLock(XCommand.java:222)
> 	at org.apache.oozie.command.XCommand.call(XCommand.java:295)
> 	at org.apache.oozie.service.CallableQueueService$CompositeCallable.call(CallableQueueService.java:326)
> 	at org.apache.oozie.service.CallableQueueService$CompositeCallable.call(CallableQueueService.java:255)
> 	at org.apache.oozie.service.CallableQueueService$CallableWrapper.run(CallableQueueService.java:175)
> 	at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1110)
> 	at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:603)
> 	at java.lang.Thread.run(Thread.java:722)
> 2012-11-05 21:01:28,475  WARN CallableQueueService$CompositeCallable:542 - USER[-] GROUP[-] TOKEN[-] APP[-] JOB[-] ACTION[-] exception callable [kill], E0607: Other error in operation [java.lang.IllegalMonitorStateException], {1}
> org.apache.oozie.command.CommandException: E0607: Other error in operation [java.lang.IllegalMonitorStateException], {1}
> 	at org.apache.oozie.command.XCommand.call(XCommand.java:317)
> 	at org.apache.oozie.service.CallableQueueService$CompositeCallable.call(CallableQueueService.java:326)
> 	at org.apache.oozie.service.CallableQueueService$CompositeCallable.call(CallableQueueService.java:255)
> 	at org.apache.oozie.service.CallableQueueService$CallableWrapper.run(CallableQueueService.java:175)
> 	at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1110)
> 	at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:603)
> 	at java.lang.Thread.run(Thread.java:722)
> Caused by: java.lang.IllegalMonitorStateException
> 	at java.util.concurrent.locks.ReentrantReadWriteLock$Sync.tryRelease(ReentrantReadWriteLock.java:374)
> 	at java.util.concurrent.locks.AbstractQueuedSynchronizer.release(AbstractQueuedSynchronizer.java:1260)
> 	at java.util.concurrent.locks.ReentrantReadWriteLock$WriteLock.unlock(ReentrantReadWriteLock.java:1131)
> 	at org.apache.oozie.util.MemoryLocks$LockToken.release(MemoryLocks.java:59)
> 	at org.apache.oozie.command.XCommand.releaseLock(XCommand.java:222)
> 	at org.apache.oozie.command.XCommand.call(XCommand.java:295)
> 	... 6 more
> {code}
> What I did, is some time ago I killed a coordinator app whic had a lot of workflows in WAITING state, since then I see this error in the log file.
> The log/exception is shown periodically, and it happen for the same job id (0000712-121031144215857-oozie-isot-W) and also several other job ids.
> According to Mohammad Islam:
> This is definitely a bug in XCommand class.
> Most possibly, the bug was introduced when interrupt type command was added.
> Would you please create a JIRA for this adding the contents of this email?
> Probable reason:
> Trying to release a lock w/o locking it first. It could happen on special case.
> The bug is here in XCommand code:
> {code}
> finally{
>     if (isLockRequired()) {
>         releaseLock();
>     }
> }
> {code}
> The "if" condition should be changed to something like this (used before locking):
> if (isLockRequired() && !this.inInterruptMode())  

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators
For more information on JIRA, see: http://www.atlassian.com/software/jira