You are viewing a plain text version of this content. The canonical link for it is here.
Posted to dev@oozie.apache.org by "Gopi Krishnan Nambiar (JIRA)" <ji...@apache.org> on 2017/02/10 00:55:42 UTC

[jira] [Commented] (OOZIE-1525) Oozie workflow does not update status sometimes and is stuck in RUNNING state.

    [ https://issues.apache.org/jira/browse/OOZIE-1525?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15860479#comment-15860479 ] 

Gopi Krishnan Nambiar commented on OOZIE-1525:
----------------------------------------------

I ran into this error recently and triaged to find that the cause for me was running multiple Oozie servers without HA support. I was running Oozie version 4.0.1 which did not support HA. Do you think this could have been the reason for your issues too ?

> Oozie workflow does not update status sometimes and is stuck in RUNNING state.
> ------------------------------------------------------------------------------
>
>                 Key: OOZIE-1525
>                 URL: https://issues.apache.org/jira/browse/OOZIE-1525
>             Project: Oozie
>          Issue Type: Bug
>    Affects Versions: 3.3.2
>            Reporter: Eugene Shevchuk
>         Attachments: OOZIE-90.patch
>
>
> Sometimes workflow is stuck in RUNNING state with the following info:
> {noformat}
> Job ID : 0000002-130822205833322-oozie-hdp-W
> ------------------------------------------------------------------------------------------------------------------------------------
> Workflow Name : wordcount-wf
> App Path      : ***
> Status        : RUNNING
> Run           : 0
> User          : ***
> Group         : -
> Created       : 2013-08-22 21:01 GMT
> Started       : 2013-08-22 21:01 GMT
> Last Modified : 2013-08-22 21:02 GMT
> Ended         : -
> CoordAction ID: -
> Actions
> ------------------------------------------------------------------------------------------------------------------------------------
> ID                                                                            Status    Ext ID                 Ext Status Err Code
> ------------------------------------------------------------------------------------------------------------------------------------
> 0000002-130822205833322-oozie-hdp-W@:start:                                   OK        -                      OK         -
> ------------------------------------------------------------------------------------------------------------------------------------
> 0000002-130822205833322-oozie-hdp-W@wc                                        OK        job_201308221727_0287  SUCCEEDED  -
> ------------------------------------------------------------------------------------------------------------------------------------
> {noformat}
> In log there are some problems with jpaservice:
> {noformat}
> 2013-08-22 23:00:59,170 ERROR SignalXCommand:536 - USER[***] GROUP[-] TOKEN[] APP[wordcount-wf] JOB[0000002-130822205833322-oozie-hdp-W] ACTION[0000002-130822205833322-oozie-hdp-W@wc] Exception, 
> <openjpa-2.1.0-r422266:1071316 fatal store error> org.apache.openjpa.persistence.RollbackException: The transaction has been rolled back.  See the nested exceptions for details on the errors that occurred.
> FailedObject: org.apache.oozie.WorkflowActionBean@1dadb1b6
> 	at org.apache.openjpa.persistence.EntityManagerImpl.commit(EntityManagerImpl.java:585)
> 	at org.apache.oozie.service.JPAService.execute(JPAService.java:217)
> 	at org.apache.oozie.command.wf.SignalXCommand.execute(SignalXCommand.java:310)
> 	at org.apache.oozie.command.wf.SignalXCommand.execute(SignalXCommand.java:64)
> 	at org.apache.oozie.command.XCommand.call(XCommand.java:277)
> 	at org.apache.oozie.service.CallableQueueService$CompositeCallable.call(CallableQueueService.java:326)
> 	at org.apache.oozie.service.CallableQueueService$CompositeCallable.call(CallableQueueService.java:255)
> 	at org.apache.oozie.service.CallableQueueService$CallableWrapper.run(CallableQueueService.java:175)
> 	at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1110)
> 	at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:603)
> 	at java.lang.Thread.run(Thread.java:722)
> Caused by: <openjpa-2.1.0-r422266:1071316 fatal general error> org.apache.openjpa.persistence.PersistenceException: The transaction has been rolled back.  See the nested exceptions for details on the errors that occurred.
> FailedObject: org.apache.oozie.WorkflowActionBean@1dadb1b6
> 	at org.apache.openjpa.kernel.BrokerImpl.newFlushException(BrokerImpl.java:2316)
> 	at org.apache.openjpa.kernel.BrokerImpl.flush(BrokerImpl.java:2153)
> 	at org.apache.openjpa.kernel.BrokerImpl.flushSafe(BrokerImpl.java:2051)
> 	at org.apache.openjpa.kernel.BrokerImpl.beforeCompletion(BrokerImpl.java:1969)
> 	at org.apache.openjpa.kernel.LocalManagedRuntime.commit(LocalManagedRuntime.java:81)
> 	at org.apache.openjpa.kernel.BrokerImpl.commit(BrokerImpl.java:1493)
> 	at org.apache.openjpa.kernel.DelegatingBroker.commit(DelegatingBroker.java:925)
> 	at org.apache.openjpa.persistence.EntityManagerImpl.commit(EntityManagerImpl.java:561)
> 	... 10 more
> Caused by: <openjpa-2.1.0-r422266:1071316 fatal store error> org.apache.openjpa.persistence.EntityExistsException: Violation of PRIMARY KEY constraint 'PK__WF_ACTIO__3213E83FECE1688E'. Cannot insert duplicate key in object 'dbo.WF_ACTIONS'. The duplicate key value is (0000002-130822205833322-oozie-hdp-W@wc). {prepstmnt 1032423179 INSERT INTO WF_ACTIONS (id, conf, console_url, cred, data, error_code, error_message, external_child_ids, external_id, external_status, name, retries, stats, tracker_uri, transition, type, user_retry_count, user_retry_interval, user_retry_max, bean_type, end_time, execution_path, last_check_time, log_token, pending, pending_age, signal_value, sla_xml, start_time, status, wf_id) VALUES (?, ?, ?, ?, ?, ?, ?, ?, ?, ?, ?, ?, ?, ?, ?, ?, ?, ?, ?, ?, ?, ?, ?, ?, ?, ?, ?, ?, ?, ?, ?) [params=?, ?, ?, ?, ?, ?, ?, ?, ?, ?, ?, ?, ?, ?, ?, ?, ?, ?, ?, ?, ?, ?, ?, ?, ?, ?, ?, ?, ?, ?, ?]} [code=2627, state=23000]
> FailedObject: org.apache.oozie.WorkflowActionBean@1dadb1b6
> 	at org.apache.openjpa.jdbc.sql.DBDictionary.narrow(DBDictionary.java:4854)
> 	at org.apache.openjpa.jdbc.sql.DBDictionary.newStoreException(DBDictionary.java:4829)
> 	at org.apache.openjpa.jdbc.sql.SQLExceptions.getStore(SQLExceptions.java:136)
> 	at org.apache.openjpa.jdbc.sql.SQLExceptions.getStore(SQLExceptions.java:78)
> 	at org.apache.openjpa.jdbc.kernel.PreparedStatementManagerImpl.flushAndUpdate(PreparedStatementManagerImpl.java:143)
> 	at org.apache.openjpa.jdbc.kernel.BatchingPreparedStatementManagerImpl.batchOrExecuteRow(BatchingPreparedStatementManagerImpl.java:101)
> 	at org.apache.openjpa.jdbc.kernel.BatchingPreparedStatementManagerImpl.flushAndUpdate(BatchingPreparedStatementManagerImpl.java:85)
> 	at org.apache.openjpa.jdbc.kernel.PreparedStatementManagerImpl.flushInternal(PreparedStatementManagerImpl.java:99)
> 	at org.apache.openjpa.jdbc.kernel.PreparedStatementManagerImpl.flush(PreparedStatementManagerImpl.java:87)
> 	at org.apache.openjpa.jdbc.kernel.ConstraintUpdateManager.flush(ConstraintUpdateManager.java:550)
> 	at org.apache.openjpa.jdbc.kernel.ConstraintUpdateManager.flush(ConstraintUpdateManager.java:107)
> 	at org.apache.openjpa.jdbc.kernel.BatchingConstraintUpdateManager.flush(BatchingConstraintUpdateManager.java:59)
> 	at org.apache.openjpa.jdbc.kernel.AbstractUpdateManager.flush(AbstractUpdateManager.java:103)
> 	at org.apache.openjpa.jdbc.kernel.AbstractUpdateManager.flush(AbstractUpdateManager.java:76)
> 	at org.apache.openjpa.jdbc.kernel.JDBCStoreManager.flush(JDBCStoreManager.java:742)
> 	at org.apache.openjpa.kernel.DelegatingStoreManager.flush(DelegatingStoreManager.java:131)
> 	... 17 more
> Caused by: org.apache.openjpa.lib.jdbc.ReportingSQLException: Violation of PRIMARY KEY constraint 'PK__WF_ACTIO__3213E83FECE1688E'. Cannot insert duplicate key in object 'dbo.WF_ACTIONS'. The duplicate key value is (0000002-130822205833322-oozie-hdp-W@wc). {prepstmnt 1032423179 INSERT INTO WF_ACTIONS (id, conf, console_url, cred, data, error_code, error_message, external_child_ids, external_id, external_status, name, retries, stats, tracker_uri, transition, type, user_retry_count, user_retry_interval, user_retry_max, bean_type, end_time, execution_path, last_check_time, log_token, pending, pending_age, signal_value, sla_xml, start_time, status, wf_id) VALUES (?, ?, ?, ?, ?, ?, ?, ?, ?, ?, ?, ?, ?, ?, ?, ?, ?, ?, ?, ?, ?, ?, ?, ?, ?, ?, ?, ?, ?, ?, ?) [params=?, ?, ?, ?, ?, ?, ?, ?, ?, ?, ?, ?, ?, ?, ?, ?, ?, ?, ?, ?, ?, ?, ?, ?, ?, ?, ?, ?, ?, ?, ?]} [code=2627, state=23000]
> 	at org.apache.openjpa.lib.jdbc.LoggingConnectionDecorator.wrap(LoggingConnectionDecorator.java:281)
> 	at org.apache.openjpa.lib.jdbc.LoggingConnectionDecorator.wrap(LoggingConnectionDecorator.java:257)
> 	at org.apache.openjpa.lib.jdbc.LoggingConnectionDecorator.access$1000(LoggingConnectionDecorator.java:72)
> 	at org.apache.openjpa.lib.jdbc.LoggingConnectionDecorator$LoggingConnection$LoggingPreparedStatement.executeUpdate(LoggingConnectionDecorator.java:1199)
> 	at org.apache.openjpa.lib.jdbc.DelegatingPreparedStatement.executeUpdate(DelegatingPreparedStatement.java:291)
> 	at org.apache.openjpa.jdbc.kernel.JDBCStoreManager$CancelPreparedStatement.executeUpdate(JDBCStoreManager.java:1776)
> 	at org.apache.openjpa.jdbc.kernel.PreparedStatementManagerImpl.executeUpdate(PreparedStatementManagerImpl.java:267)
> 	at org.apache.openjpa.jdbc.kernel.PreparedStatementManagerImpl.flushAndUpdate(PreparedStatementManagerImpl.java:118)
> 	... 28 more
> 2013-08-22 23:00:59,170  WARN CallableQueueService$CompositeCallable:542 - USER[-] GROUP[-] TOKEN[-] APP[-] JOB[-] ACTION[-] exception callable [signal], E0607: Other error in operation [signal], The transaction has been rolled back.  See the nested exceptions for details on the errors that occurred.
> org.apache.oozie.command.CommandException: E0607: Other error in operation [signal], The transaction has been rolled back.  See the nested exceptions for details on the errors that occurred.
> 	at org.apache.oozie.command.XCommand.call(XCommand.java:317)
> 	at org.apache.oozie.service.CallableQueueService$CompositeCallable.call(CallableQueueService.java:326)
> 	at org.apache.oozie.service.CallableQueueService$CompositeCallable.call(CallableQueueService.java:255)
> 	at org.apache.oozie.service.CallableQueueService$CallableWrapper.run(CallableQueueService.java:175)
> 	at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1110)
> 	at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:603)
> 	at java.lang.Thread.run(Thread.java:722)
> Caused by: <openjpa-2.1.0-r422266:1071316 fatal store error> org.apache.openjpa.persistence.RollbackException: The transaction has been rolled back.  See the nested exceptions for details on the errors that occurred.
> FailedObject: org.apache.oozie.WorkflowActionBean@1dadb1b6
> 	at org.apache.openjpa.persistence.EntityManagerImpl.commit(EntityManagerImpl.java:585)
> 	at org.apache.oozie.service.JPAService.execute(JPAService.java:217)
> 	at org.apache.oozie.command.wf.SignalXCommand.execute(SignalXCommand.java:310)
> 	at org.apache.oozie.command.wf.SignalXCommand.execute(SignalXCommand.java:64)
> 	at org.apache.oozie.command.XCommand.call(XCommand.java:277)
> 	... 6 more
> Caused by: <openjpa-2.1.0-r422266:1071316 fatal general error> org.apache.openjpa.persistence.PersistenceException: The transaction has been rolled back.  See the nested exceptions for details on the errors that occurred.
> FailedObject: org.apache.oozie.WorkflowActionBean@1dadb1b6
> 	at org.apache.openjpa.kernel.BrokerImpl.newFlushException(BrokerImpl.java:2316)
> 	at org.apache.openjpa.kernel.BrokerImpl.flush(BrokerImpl.java:2153)
> 	at org.apache.openjpa.kernel.BrokerImpl.flushSafe(BrokerImpl.java:2051)
> 	at org.apache.openjpa.kernel.BrokerImpl.beforeCompletion(BrokerImpl.java:1969)
> 	at org.apache.openjpa.kernel.LocalManagedRuntime.commit(LocalManagedRuntime.java:81)
> 	at org.apache.openjpa.kernel.BrokerImpl.commit(BrokerImpl.java:1493)
> 	at org.apache.openjpa.kernel.DelegatingBroker.commit(DelegatingBroker.java:925)
> 	at org.apache.openjpa.persistence.EntityManagerImpl.commit(EntityManagerImpl.java:561)
> 	... 10 more
> Caused by: <openjpa-2.1.0-r422266:1071316 fatal store error> org.apache.openjpa.persistence.EntityExistsException: Violation of PRIMARY KEY constraint 'PK__WF_ACTIO__3213E83FECE1688E'. Cannot insert duplicate key in object 'dbo.WF_ACTIONS'. The duplicate key value is (0000002-130822205833322-oozie-hdp-W@wc). {prepstmnt 1032423179 INSERT INTO WF_ACTIONS (id, conf, console_url, cred, data, error_code, error_message, external_child_ids, external_id, external_status, name, retries, stats, tracker_uri, transition, type, user_retry_count, user_retry_interval, user_retry_max, bean_type, end_time, execution_path, last_check_time, log_token, pending, pending_age, signal_value, sla_xml, start_time, status, wf_id) VALUES (?, ?, ?, ?, ?, ?, ?, ?, ?, ?, ?, ?, ?, ?, ?, ?, ?, ?, ?, ?, ?, ?, ?, ?, ?, ?, ?, ?, ?, ?, ?) [params=?, ?, ?, ?, ?, ?, ?, ?, ?, ?, ?, ?, ?, ?, ?, ?, ?, ?, ?, ?, ?, ?, ?, ?, ?, ?, ?, ?, ?, ?, ?]} [code=2627, state=23000]
> FailedObject: org.apache.oozie.WorkflowActionBean@1dadb1b6
> 	at org.apache.openjpa.jdbc.sql.DBDictionary.narrow(DBDictionary.java:4854)
> 	at org.apache.openjpa.jdbc.sql.DBDictionary.newStoreException(DBDictionary.java:4829)
> 	at org.apache.openjpa.jdbc.sql.SQLExceptions.getStore(SQLExceptions.java:136)
> 	at org.apache.openjpa.jdbc.sql.SQLExceptions.getStore(SQLExceptions.java:78)
> 	at org.apache.openjpa.jdbc.kernel.PreparedStatementManagerImpl.flushAndUpdate(PreparedStatementManagerImpl.java:143)
> 	at org.apache.openjpa.jdbc.kernel.BatchingPreparedStatementManagerImpl.batchOrExecuteRow(BatchingPreparedStatementManagerImpl.java:101)
> 	at org.apache.openjpa.jdbc.kernel.BatchingPreparedStatementManagerImpl.flushAndUpdate(BatchingPreparedStatementManagerImpl.java:85)
> 	at org.apache.openjpa.jdbc.kernel.PreparedStatementManagerImpl.flushInternal(PreparedStatementManagerImpl.java:99)
> 	at org.apache.openjpa.jdbc.kernel.PreparedStatementManagerImpl.flush(PreparedStatementManagerImpl.java:87)
> 	at org.apache.openjpa.jdbc.kernel.ConstraintUpdateManager.flush(ConstraintUpdateManager.java:550)
> 	at org.apache.openjpa.jdbc.kernel.ConstraintUpdateManager.flush(ConstraintUpdateManager.java:107)
> 	at org.apache.openjpa.jdbc.kernel.BatchingConstraintUpdateManager.flush(BatchingConstraintUpdateManager.java:59)
> 	at org.apache.openjpa.jdbc.kernel.AbstractUpdateManager.flush(AbstractUpdateManager.java:103)
> 	at org.apache.openjpa.jdbc.kernel.AbstractUpdateManager.flush(AbstractUpdateManager.java:76)
> 	at org.apache.openjpa.jdbc.kernel.JDBCStoreManager.flush(JDBCStoreManager.java:742)
> 	at org.apache.openjpa.kernel.DelegatingStoreManager.flush(DelegatingStoreManager.java:131)
> 	... 17 more
> Caused by: org.apache.openjpa.lib.jdbc.ReportingSQLException: Violation of PRIMARY KEY constraint 'PK__WF_ACTIO__3213E83FECE1688E'. Cannot insert duplicate key in object 'dbo.WF_ACTIONS'. The duplicate key value is (0000002-130822205833322-oozie-hdp-W@wc). {prepstmnt 1032423179 INSERT INTO WF_ACTIONS (id, conf, console_url, cred, data, error_code, error_message, external_child_ids, external_id, external_status, name, retries, stats, tracker_uri, transition, type, user_retry_count, user_retry_interval, user_retry_max, bean_type, end_time, execution_path, last_check_time, log_token, pending, pending_age, signal_value, sla_xml, start_time, status, wf_id) VALUES (?, ?, ?, ?, ?, ?, ?, ?, ?, ?, ?, ?, ?, ?, ?, ?, ?, ?, ?, ?, ?, ?, ?, ?, ?, ?, ?, ?, ?, ?, ?) [params=?, ?, ?, ?, ?, ?, ?, ?, ?, ?, ?, ?, ?, ?, ?, ?, ?, ?, ?, ?, ?, ?, ?, ?, ?, ?, ?, ?, ?, ?, ?]} [code=2627, state=23000]
> 	at org.apache.openjpa.lib.jdbc.LoggingConnectionDecorator.wrap(LoggingConnectionDecorator.java:281)
> 	at org.apache.openjpa.lib.jdbc.LoggingConnectionDecorator.wrap(LoggingConnectionDecorator.java:257)
> 	at org.apache.openjpa.lib.jdbc.LoggingConnectionDecorator.access$1000(LoggingConnectionDecorator.java:72)
> 	at org.apache.openjpa.lib.jdbc.LoggingConnectionDecorator$LoggingConnection$LoggingPreparedStatement.executeUpdate(LoggingConnectionDecorator.java:1199)
> 	at org.apache.openjpa.lib.jdbc.DelegatingPreparedStatement.executeUpdate(DelegatingPreparedStatement.java:291)
> 	at org.apache.openjpa.jdbc.kernel.JDBCStoreManager$CancelPreparedStatement.executeUpdate(JDBCStoreManager.java:1776)
> 	at org.apache.openjpa.jdbc.kernel.PreparedStatementManagerImpl.executeUpdate(PreparedStatementManagerImpl.java:267)
> 	at org.apache.openjpa.jdbc.kernel.PreparedStatementManagerImpl.flushAndUpdate(PreparedStatementManagerImpl.java:118)
> 	... 28 more
> {noformat}
> Possible problem may be that when oozie is inserting a new object, the JPA entity manager needs to hear back from the connection to know the commit succeeded.
> Sometimes when commit indeed succeeded but connection was lost, the JPA entityManager will keep retrying to persist the same object since it assumes the commit failed, therefore you see the "primary key constraint violation exception" in the stacktrace. The workflow status remains in running state until all of its child actions turn "SUCCEEDED". Since one of the child action fails to be created, the workflow will always remain in running state.



--
This message was sent by Atlassian JIRA
(v6.3.15#6346)