You are viewing a plain text version of this content. The canonical link for it is here.
Posted to dev@oozie.apache.org by "Eugene Shevchuk (JIRA)" <ji...@apache.org> on 2013/09/10 09:27:52 UTC
[jira] [Created] (OOZIE-1525) Oozie workflow does not update status
sometimes and is stuck in RUNNING state.
Eugene Shevchuk created OOZIE-1525:
--------------------------------------
Summary: Oozie workflow does not update status sometimes and is stuck in RUNNING state.
Key: OOZIE-1525
URL: https://issues.apache.org/jira/browse/OOZIE-1525
Project: Oozie
Issue Type: Bug
Affects Versions: 3.3.2
Reporter: Eugene Shevchuk
Sometimes workflow is stuck in RUNNING state with the following info:
{noformat}
Job ID : 0000002-130822205833322-oozie-hdp-W
------------------------------------------------------------------------------------------------------------------------------------
Workflow Name : wordcount-wf
App Path : ***
Status : RUNNING
Run : 0
User : ***
Group : -
Created : 2013-08-22 21:01 GMT
Started : 2013-08-22 21:01 GMT
Last Modified : 2013-08-22 21:02 GMT
Ended : -
CoordAction ID: -
Actions
------------------------------------------------------------------------------------------------------------------------------------
ID Status Ext ID Ext Status Err Code
------------------------------------------------------------------------------------------------------------------------------------
0000002-130822205833322-oozie-hdp-W@:start: OK - OK -
------------------------------------------------------------------------------------------------------------------------------------
0000002-130822205833322-oozie-hdp-W@wc OK job_201308221727_0287 SUCCEEDED -
------------------------------------------------------------------------------------------------------------------------------------
{noformat}
In log there are some problems with jpaservice:
{noformat}
2013-08-22 23:00:59,170 ERROR SignalXCommand:536 - USER[***] GROUP[-] TOKEN[] APP[wordcount-wf] JOB[0000002-130822205833322-oozie-hdp-W] ACTION[0000002-130822205833322-oozie-hdp-W@wc] Exception,
<openjpa-2.1.0-r422266:1071316 fatal store error> org.apache.openjpa.persistence.RollbackException: The transaction has been rolled back. See the nested exceptions for details on the errors that occurred.
FailedObject: org.apache.oozie.WorkflowActionBean@1dadb1b6
at org.apache.openjpa.persistence.EntityManagerImpl.commit(EntityManagerImpl.java:585)
at org.apache.oozie.service.JPAService.execute(JPAService.java:217)
at org.apache.oozie.command.wf.SignalXCommand.execute(SignalXCommand.java:310)
at org.apache.oozie.command.wf.SignalXCommand.execute(SignalXCommand.java:64)
at org.apache.oozie.command.XCommand.call(XCommand.java:277)
at org.apache.oozie.service.CallableQueueService$CompositeCallable.call(CallableQueueService.java:326)
at org.apache.oozie.service.CallableQueueService$CompositeCallable.call(CallableQueueService.java:255)
at org.apache.oozie.service.CallableQueueService$CallableWrapper.run(CallableQueueService.java:175)
at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1110)
at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:603)
at java.lang.Thread.run(Thread.java:722)
Caused by: <openjpa-2.1.0-r422266:1071316 fatal general error> org.apache.openjpa.persistence.PersistenceException: The transaction has been rolled back. See the nested exceptions for details on the errors that occurred.
FailedObject: org.apache.oozie.WorkflowActionBean@1dadb1b6
at org.apache.openjpa.kernel.BrokerImpl.newFlushException(BrokerImpl.java:2316)
at org.apache.openjpa.kernel.BrokerImpl.flush(BrokerImpl.java:2153)
at org.apache.openjpa.kernel.BrokerImpl.flushSafe(BrokerImpl.java:2051)
at org.apache.openjpa.kernel.BrokerImpl.beforeCompletion(BrokerImpl.java:1969)
at org.apache.openjpa.kernel.LocalManagedRuntime.commit(LocalManagedRuntime.java:81)
at org.apache.openjpa.kernel.BrokerImpl.commit(BrokerImpl.java:1493)
at org.apache.openjpa.kernel.DelegatingBroker.commit(DelegatingBroker.java:925)
at org.apache.openjpa.persistence.EntityManagerImpl.commit(EntityManagerImpl.java:561)
... 10 more
Caused by: <openjpa-2.1.0-r422266:1071316 fatal store error> org.apache.openjpa.persistence.EntityExistsException: Violation of PRIMARY KEY constraint 'PK__WF_ACTIO__3213E83FECE1688E'. Cannot insert duplicate key in object 'dbo.WF_ACTIONS'. The duplicate key value is (0000002-130822205833322-oozie-hdp-W@wc). {prepstmnt 1032423179 INSERT INTO WF_ACTIONS (id, conf, console_url, cred, data, error_code, error_message, external_child_ids, external_id, external_status, name, retries, stats, tracker_uri, transition, type, user_retry_count, user_retry_interval, user_retry_max, bean_type, end_time, execution_path, last_check_time, log_token, pending, pending_age, signal_value, sla_xml, start_time, status, wf_id) VALUES (?, ?, ?, ?, ?, ?, ?, ?, ?, ?, ?, ?, ?, ?, ?, ?, ?, ?, ?, ?, ?, ?, ?, ?, ?, ?, ?, ?, ?, ?, ?) [params=?, ?, ?, ?, ?, ?, ?, ?, ?, ?, ?, ?, ?, ?, ?, ?, ?, ?, ?, ?, ?, ?, ?, ?, ?, ?, ?, ?, ?, ?, ?]} [code=2627, state=23000]
FailedObject: org.apache.oozie.WorkflowActionBean@1dadb1b6
at org.apache.openjpa.jdbc.sql.DBDictionary.narrow(DBDictionary.java:4854)
at org.apache.openjpa.jdbc.sql.DBDictionary.newStoreException(DBDictionary.java:4829)
at org.apache.openjpa.jdbc.sql.SQLExceptions.getStore(SQLExceptions.java:136)
at org.apache.openjpa.jdbc.sql.SQLExceptions.getStore(SQLExceptions.java:78)
at org.apache.openjpa.jdbc.kernel.PreparedStatementManagerImpl.flushAndUpdate(PreparedStatementManagerImpl.java:143)
at org.apache.openjpa.jdbc.kernel.BatchingPreparedStatementManagerImpl.batchOrExecuteRow(BatchingPreparedStatementManagerImpl.java:101)
at org.apache.openjpa.jdbc.kernel.BatchingPreparedStatementManagerImpl.flushAndUpdate(BatchingPreparedStatementManagerImpl.java:85)
at org.apache.openjpa.jdbc.kernel.PreparedStatementManagerImpl.flushInternal(PreparedStatementManagerImpl.java:99)
at org.apache.openjpa.jdbc.kernel.PreparedStatementManagerImpl.flush(PreparedStatementManagerImpl.java:87)
at org.apache.openjpa.jdbc.kernel.ConstraintUpdateManager.flush(ConstraintUpdateManager.java:550)
at org.apache.openjpa.jdbc.kernel.ConstraintUpdateManager.flush(ConstraintUpdateManager.java:107)
at org.apache.openjpa.jdbc.kernel.BatchingConstraintUpdateManager.flush(BatchingConstraintUpdateManager.java:59)
at org.apache.openjpa.jdbc.kernel.AbstractUpdateManager.flush(AbstractUpdateManager.java:103)
at org.apache.openjpa.jdbc.kernel.AbstractUpdateManager.flush(AbstractUpdateManager.java:76)
at org.apache.openjpa.jdbc.kernel.JDBCStoreManager.flush(JDBCStoreManager.java:742)
at org.apache.openjpa.kernel.DelegatingStoreManager.flush(DelegatingStoreManager.java:131)
... 17 more
Caused by: org.apache.openjpa.lib.jdbc.ReportingSQLException: Violation of PRIMARY KEY constraint 'PK__WF_ACTIO__3213E83FECE1688E'. Cannot insert duplicate key in object 'dbo.WF_ACTIONS'. The duplicate key value is (0000002-130822205833322-oozie-hdp-W@wc). {prepstmnt 1032423179 INSERT INTO WF_ACTIONS (id, conf, console_url, cred, data, error_code, error_message, external_child_ids, external_id, external_status, name, retries, stats, tracker_uri, transition, type, user_retry_count, user_retry_interval, user_retry_max, bean_type, end_time, execution_path, last_check_time, log_token, pending, pending_age, signal_value, sla_xml, start_time, status, wf_id) VALUES (?, ?, ?, ?, ?, ?, ?, ?, ?, ?, ?, ?, ?, ?, ?, ?, ?, ?, ?, ?, ?, ?, ?, ?, ?, ?, ?, ?, ?, ?, ?) [params=?, ?, ?, ?, ?, ?, ?, ?, ?, ?, ?, ?, ?, ?, ?, ?, ?, ?, ?, ?, ?, ?, ?, ?, ?, ?, ?, ?, ?, ?, ?]} [code=2627, state=23000]
at org.apache.openjpa.lib.jdbc.LoggingConnectionDecorator.wrap(LoggingConnectionDecorator.java:281)
at org.apache.openjpa.lib.jdbc.LoggingConnectionDecorator.wrap(LoggingConnectionDecorator.java:257)
at org.apache.openjpa.lib.jdbc.LoggingConnectionDecorator.access$1000(LoggingConnectionDecorator.java:72)
at org.apache.openjpa.lib.jdbc.LoggingConnectionDecorator$LoggingConnection$LoggingPreparedStatement.executeUpdate(LoggingConnectionDecorator.java:1199)
at org.apache.openjpa.lib.jdbc.DelegatingPreparedStatement.executeUpdate(DelegatingPreparedStatement.java:291)
at org.apache.openjpa.jdbc.kernel.JDBCStoreManager$CancelPreparedStatement.executeUpdate(JDBCStoreManager.java:1776)
at org.apache.openjpa.jdbc.kernel.PreparedStatementManagerImpl.executeUpdate(PreparedStatementManagerImpl.java:267)
at org.apache.openjpa.jdbc.kernel.PreparedStatementManagerImpl.flushAndUpdate(PreparedStatementManagerImpl.java:118)
... 28 more
2013-08-22 23:00:59,170 WARN CallableQueueService$CompositeCallable:542 - USER[-] GROUP[-] TOKEN[-] APP[-] JOB[-] ACTION[-] exception callable [signal], E0607: Other error in operation [signal], The transaction has been rolled back. See the nested exceptions for details on the errors that occurred.
org.apache.oozie.command.CommandException: E0607: Other error in operation [signal], The transaction has been rolled back. See the nested exceptions for details on the errors that occurred.
at org.apache.oozie.command.XCommand.call(XCommand.java:317)
at org.apache.oozie.service.CallableQueueService$CompositeCallable.call(CallableQueueService.java:326)
at org.apache.oozie.service.CallableQueueService$CompositeCallable.call(CallableQueueService.java:255)
at org.apache.oozie.service.CallableQueueService$CallableWrapper.run(CallableQueueService.java:175)
at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1110)
at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:603)
at java.lang.Thread.run(Thread.java:722)
Caused by: <openjpa-2.1.0-r422266:1071316 fatal store error> org.apache.openjpa.persistence.RollbackException: The transaction has been rolled back. See the nested exceptions for details on the errors that occurred.
FailedObject: org.apache.oozie.WorkflowActionBean@1dadb1b6
at org.apache.openjpa.persistence.EntityManagerImpl.commit(EntityManagerImpl.java:585)
at org.apache.oozie.service.JPAService.execute(JPAService.java:217)
at org.apache.oozie.command.wf.SignalXCommand.execute(SignalXCommand.java:310)
at org.apache.oozie.command.wf.SignalXCommand.execute(SignalXCommand.java:64)
at org.apache.oozie.command.XCommand.call(XCommand.java:277)
... 6 more
Caused by: <openjpa-2.1.0-r422266:1071316 fatal general error> org.apache.openjpa.persistence.PersistenceException: The transaction has been rolled back. See the nested exceptions for details on the errors that occurred.
FailedObject: org.apache.oozie.WorkflowActionBean@1dadb1b6
at org.apache.openjpa.kernel.BrokerImpl.newFlushException(BrokerImpl.java:2316)
at org.apache.openjpa.kernel.BrokerImpl.flush(BrokerImpl.java:2153)
at org.apache.openjpa.kernel.BrokerImpl.flushSafe(BrokerImpl.java:2051)
at org.apache.openjpa.kernel.BrokerImpl.beforeCompletion(BrokerImpl.java:1969)
at org.apache.openjpa.kernel.LocalManagedRuntime.commit(LocalManagedRuntime.java:81)
at org.apache.openjpa.kernel.BrokerImpl.commit(BrokerImpl.java:1493)
at org.apache.openjpa.kernel.DelegatingBroker.commit(DelegatingBroker.java:925)
at org.apache.openjpa.persistence.EntityManagerImpl.commit(EntityManagerImpl.java:561)
... 10 more
Caused by: <openjpa-2.1.0-r422266:1071316 fatal store error> org.apache.openjpa.persistence.EntityExistsException: Violation of PRIMARY KEY constraint 'PK__WF_ACTIO__3213E83FECE1688E'. Cannot insert duplicate key in object 'dbo.WF_ACTIONS'. The duplicate key value is (0000002-130822205833322-oozie-hdp-W@wc). {prepstmnt 1032423179 INSERT INTO WF_ACTIONS (id, conf, console_url, cred, data, error_code, error_message, external_child_ids, external_id, external_status, name, retries, stats, tracker_uri, transition, type, user_retry_count, user_retry_interval, user_retry_max, bean_type, end_time, execution_path, last_check_time, log_token, pending, pending_age, signal_value, sla_xml, start_time, status, wf_id) VALUES (?, ?, ?, ?, ?, ?, ?, ?, ?, ?, ?, ?, ?, ?, ?, ?, ?, ?, ?, ?, ?, ?, ?, ?, ?, ?, ?, ?, ?, ?, ?) [params=?, ?, ?, ?, ?, ?, ?, ?, ?, ?, ?, ?, ?, ?, ?, ?, ?, ?, ?, ?, ?, ?, ?, ?, ?, ?, ?, ?, ?, ?, ?]} [code=2627, state=23000]
FailedObject: org.apache.oozie.WorkflowActionBean@1dadb1b6
at org.apache.openjpa.jdbc.sql.DBDictionary.narrow(DBDictionary.java:4854)
at org.apache.openjpa.jdbc.sql.DBDictionary.newStoreException(DBDictionary.java:4829)
at org.apache.openjpa.jdbc.sql.SQLExceptions.getStore(SQLExceptions.java:136)
at org.apache.openjpa.jdbc.sql.SQLExceptions.getStore(SQLExceptions.java:78)
at org.apache.openjpa.jdbc.kernel.PreparedStatementManagerImpl.flushAndUpdate(PreparedStatementManagerImpl.java:143)
at org.apache.openjpa.jdbc.kernel.BatchingPreparedStatementManagerImpl.batchOrExecuteRow(BatchingPreparedStatementManagerImpl.java:101)
at org.apache.openjpa.jdbc.kernel.BatchingPreparedStatementManagerImpl.flushAndUpdate(BatchingPreparedStatementManagerImpl.java:85)
at org.apache.openjpa.jdbc.kernel.PreparedStatementManagerImpl.flushInternal(PreparedStatementManagerImpl.java:99)
at org.apache.openjpa.jdbc.kernel.PreparedStatementManagerImpl.flush(PreparedStatementManagerImpl.java:87)
at org.apache.openjpa.jdbc.kernel.ConstraintUpdateManager.flush(ConstraintUpdateManager.java:550)
at org.apache.openjpa.jdbc.kernel.ConstraintUpdateManager.flush(ConstraintUpdateManager.java:107)
at org.apache.openjpa.jdbc.kernel.BatchingConstraintUpdateManager.flush(BatchingConstraintUpdateManager.java:59)
at org.apache.openjpa.jdbc.kernel.AbstractUpdateManager.flush(AbstractUpdateManager.java:103)
at org.apache.openjpa.jdbc.kernel.AbstractUpdateManager.flush(AbstractUpdateManager.java:76)
at org.apache.openjpa.jdbc.kernel.JDBCStoreManager.flush(JDBCStoreManager.java:742)
at org.apache.openjpa.kernel.DelegatingStoreManager.flush(DelegatingStoreManager.java:131)
... 17 more
Caused by: org.apache.openjpa.lib.jdbc.ReportingSQLException: Violation of PRIMARY KEY constraint 'PK__WF_ACTIO__3213E83FECE1688E'. Cannot insert duplicate key in object 'dbo.WF_ACTIONS'. The duplicate key value is (0000002-130822205833322-oozie-hdp-W@wc). {prepstmnt 1032423179 INSERT INTO WF_ACTIONS (id, conf, console_url, cred, data, error_code, error_message, external_child_ids, external_id, external_status, name, retries, stats, tracker_uri, transition, type, user_retry_count, user_retry_interval, user_retry_max, bean_type, end_time, execution_path, last_check_time, log_token, pending, pending_age, signal_value, sla_xml, start_time, status, wf_id) VALUES (?, ?, ?, ?, ?, ?, ?, ?, ?, ?, ?, ?, ?, ?, ?, ?, ?, ?, ?, ?, ?, ?, ?, ?, ?, ?, ?, ?, ?, ?, ?) [params=?, ?, ?, ?, ?, ?, ?, ?, ?, ?, ?, ?, ?, ?, ?, ?, ?, ?, ?, ?, ?, ?, ?, ?, ?, ?, ?, ?, ?, ?, ?]} [code=2627, state=23000]
at org.apache.openjpa.lib.jdbc.LoggingConnectionDecorator.wrap(LoggingConnectionDecorator.java:281)
at org.apache.openjpa.lib.jdbc.LoggingConnectionDecorator.wrap(LoggingConnectionDecorator.java:257)
at org.apache.openjpa.lib.jdbc.LoggingConnectionDecorator.access$1000(LoggingConnectionDecorator.java:72)
at org.apache.openjpa.lib.jdbc.LoggingConnectionDecorator$LoggingConnection$LoggingPreparedStatement.executeUpdate(LoggingConnectionDecorator.java:1199)
at org.apache.openjpa.lib.jdbc.DelegatingPreparedStatement.executeUpdate(DelegatingPreparedStatement.java:291)
at org.apache.openjpa.jdbc.kernel.JDBCStoreManager$CancelPreparedStatement.executeUpdate(JDBCStoreManager.java:1776)
at org.apache.openjpa.jdbc.kernel.PreparedStatementManagerImpl.executeUpdate(PreparedStatementManagerImpl.java:267)
at org.apache.openjpa.jdbc.kernel.PreparedStatementManagerImpl.flushAndUpdate(PreparedStatementManagerImpl.java:118)
... 28 more
{noformat}
Possible problem may be that when oozie is inserting a new object, the JPA entity manager needs to hear back from the connection to know the commit succeeded.
Sometimes when commit indeed succeeded but connection was lost, the JPA entityManager will keep retrying to persist the same object since it assumes the commit failed, therefore you see the "primary key constraint violation exception" in the stacktrace. The workflow status remains in running state until all of its child actions turn "SUCCEEDED". Since one of the child action fails to be created, the workflow will always remain in running state.
--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators
For more information on JIRA, see: http://www.atlassian.com/software/jira