You are viewing a plain text version of this content. The canonical link for it is here.
Posted to issues@aurora.apache.org by "Bill Farner (JIRA)" <ji...@apache.org> on 2015/07/16 19:25:04 UTC

[jira] [Created] (AURORA-1401) DB deadlock caused stuck job update

Bill Farner created AURORA-1401:
-----------------------------------

             Summary: DB deadlock caused stuck job update
                 Key: AURORA-1401
                 URL: https://issues.apache.org/jira/browse/AURORA-1401
             Project: Aurora
          Issue Type: Bug
          Components: Scheduler
            Reporter: Bill Farner


We observed the symptom of a job update that was stuck in ROLLING_FORWARD, but was not active.  Looking in scheduler logs revealed a DB-level deadlock:
{noformat}
E0715 19:04:53.085 THREAD1471 org.apache.aurora.scheduler.updater.JobUpdateEventSubscriber.startUp: Failed to resume job updates: org.apache.aurora.scheduler.storage.Storage$StorageException: 
### Error querying database.  Cause: org.h2.jdbc.JdbcSQLException: Deadlock detected. The current transaction was rolled back. Details: "
Session #3 (user: ) on thread RowGarbageCollector RUNNING is waiting to lock PUBLIC.LOCKS (shared) while locking PUBLIC.JOB_KEYS (exclusive).
Session #2 (user: ) on thread JobUpdateEventSubscriber STARTING is waiting to lock PUBLIC.JOB_KEYS (shared) while locking PUBLIC.JOB_INSTANCE_UPDATE_EVENTS (exclusive), PUBLIC.JOB_UPDATE_EVENTS (exclusive), PUBLIC.LOCKS (exclusive), PUBLIC.JOB_UPDATE_LOCKS (exclusive)."; SQL statement:
SELECT
      lock_token
    FROM job_update_locks AS l
    INNER JOIN job_updates u ON l.update_row_id = u.id
     
    INNER JOIN job_keys AS j ON j.id = u.job_key_id
   
    WHERE  
    u.update_id = ?
    AND j.role = ?
    AND j.environment = ?
    AND j.name = ? [40001-187]
### The error may exist in org/apache/aurora/scheduler/storage/db/JobUpdateDetailsMapper.xml
### The error may involve defaultParameterMap
### The error occurred while setting parameters
### SQL: SELECT       lock_token     FROM job_update_locks AS l     INNER JOIN job_updates u ON l.update_row_id = u.id           INNER JOIN job_keys AS j ON j.id = u.job_key_id         WHERE       u.update_id = ?     AND j.role = ?     AND j.environment = ?     AND j.name = ?
### Cause: org.h2.jdbc.JdbcSQLException: Deadlock detected. The current transaction was rolled back. Details: "
Session #3 (user: ) on thread RowGarbageCollector RUNNING is waiting to lock PUBLIC.LOCKS (shared) while locking PUBLIC.JOB_KEYS (exclusive).
Session #2 (user: ) on thread JobUpdateEventSubscriber STARTING is waiting to lock PUBLIC.JOB_KEYS (shared) while locking PUBLIC.JOB_INSTANCE_UPDATE_EVENTS (exclusive), PUBLIC.JOB_UPDATE_EVENTS (exclusive), PUBLIC.LOCKS (exclusive), PUBLIC.JOB_UPDATE_LOCKS (exclusive)."; SQL statement:
SELECT
      lock_token
    FROM job_update_locks AS l
    INNER JOIN job_updates u ON l.update_row_id = u.id
     
    INNER JOIN job_keys AS j ON j.id = u.job_key_id
   
    WHERE  
    u.update_id = ?
    AND j.role = ?
    AND j.environment = ?
    AND j.name = ? [40001-187]
org.apache.aurora.scheduler.storage.Storage$StorageException: 
### Error querying database.  Cause: org.h2.jdbc.JdbcSQLException: Deadlock detected. The current transaction was rolled back. Details: "
Session #3 (user: ) on thread RowGarbageCollector RUNNING is waiting to lock PUBLIC.LOCKS (shared) while locking PUBLIC.JOB_KEYS (exclusive).
Session #2 (user: ) on thread JobUpdateEventSubscriber STARTING is waiting to lock PUBLIC.JOB_KEYS (shared) while locking PUBLIC.JOB_INSTANCE_UPDATE_EVENTS (exclusive), PUBLIC.JOB_UPDATE_EVENTS (exclusive), PUBLIC.LOCKS (exclusive), PUBLIC.JOB_UPDATE_LOCKS (exclusive)."; SQL statement:
SELECT
      lock_token
    FROM job_update_locks AS l
    INNER JOIN job_updates u ON l.update_row_id = u.id
     
    INNER JOIN job_keys AS j ON j.id = u.job_key_id
   
    WHERE  
    u.update_id = ?
    AND j.role = ?
    AND j.environment = ?
    AND j.name = ? [40001-187]
### The error may exist in org/apache/aurora/scheduler/storage/db/JobUpdateDetailsMapper.xml
### The error may involve defaultParameterMap
### The error occurred while setting parameters
### SQL: SELECT       lock_token     FROM job_update_locks AS l     INNER JOIN job_updates u ON l.update_row_id = u.id           INNER JOIN job_keys AS j ON j.id = u.job_key_id         WHERE       u.update_id = ?     AND j.role = ?     AND j.environment = ?     AND j.name = ?
### Cause: org.h2.jdbc.JdbcSQLException: Deadlock detected. The current transaction was rolled back. Details: "
Session #3 (user: ) on thread RowGarbageCollector RUNNING is waiting to lock PUBLIC.LOCKS (shared) while locking PUBLIC.JOB_KEYS (exclusive).
Session #2 (user: ) on thread JobUpdateEventSubscriber STARTING is waiting to lock PUBLIC.JOB_KEYS (shared) while locking PUBLIC.JOB_INSTANCE_UPDATE_EVENTS (exclusive), PUBLIC.JOB_UPDATE_EVENTS (exclusive), PUBLIC.LOCKS (exclusive), PUBLIC.JOB_UPDATE_LOCKS (exclusive)."; SQL statement:
SELECT
      lock_token
    FROM job_update_locks AS l
    INNER JOIN job_updates u ON l.update_row_id = u.id
     
    INNER JOIN job_keys AS j ON j.id = u.job_key_id
   
    WHERE  
    u.update_id = ?
    AND j.role = ?
    AND j.environment = ?
    AND j.name = ? [40001-187]
        at org.apache.aurora.scheduler.storage.db.DbStorage.write(DbStorage.java:145)
        at org.mybatis.guice.transactional.TransactionalMethodInterceptor.invoke(TransactionalMethodInterceptor.java:101)
        at com.twitter.common.inject.TimedInterceptor.invoke(TimedInterceptor.java:87)
        at org.apache.aurora.scheduler.storage.log.LogStorage.doInTransaction(LogStorage.java:602)
        at org.apache.aurora.scheduler.storage.log.LogStorage.write(LogStorage.java:635)
        at org.apache.aurora.scheduler.storage.CallOrderEnforcingStorage.write(CallOrderEnforcingStorage.java:130)
        at org.apache.aurora.scheduler.updater.JobUpdateControllerImpl.systemResume(JobUpdateControllerImpl.java:263)
        at org.apache.aurora.scheduler.updater.JobUpdateEventSubscriber.startUp(JobUpdateEventSubscriber.java:85)
        at com.google.common.util.concurrent.AbstractIdleService$2$1.run(AbstractIdleService.java:54)
        at com.google.common.util.concurrent.Callables$3.run(Callables.java:93)
        at java.lang.Thread.run(Thread.java:745)
Caused by: org.apache.ibatis.exceptions.PersistenceException: 
### Error querying database.  Cause: org.h2.jdbc.JdbcSQLException: Deadlock detected. The current transaction was rolled back. Details: "
Session #3 (user: ) on thread RowGarbageCollector RUNNING is waiting to lock PUBLIC.LOCKS (shared) while locking PUBLIC.JOB_KEYS (exclusive).
Session #2 (user: ) on thread JobUpdateEventSubscriber STARTING is waiting to lock PUBLIC.JOB_KEYS (shared) while locking PUBLIC.JOB_INSTANCE_UPDATE_EVENTS (exclusive), PUBLIC.JOB_UPDATE_EVENTS (exclusive), PUBLIC.LOCKS (exclusive), PUBLIC.JOB_UPDATE_LOCKS (exclusive)."; SQL statement:
SELECT
      lock_token
    FROM job_update_locks AS l
    INNER JOIN job_updates u ON l.update_row_id = u.id
     
    INNER JOIN job_keys AS j ON j.id = u.job_key_id
   
    WHERE  
    u.update_id = ?
    AND j.role = ?
    AND j.environment = ?
    AND j.name = ? [40001-187]
### The error may exist in org/apache/aurora/scheduler/storage/db/JobUpdateDetailsMapper.xml
### The error may involve defaultParameterMap
### The error occurred while setting parameters
### SQL: SELECT       lock_token     FROM job_update_locks AS l     INNER JOIN job_updates u ON l.update_row_id = u.id           INNER JOIN job_keys AS j ON j.id = u.job_key_id         WHERE       u.update_id = ?     AND j.role = ?     AND j.environment = ?     AND j.name = ?
### Cause: org.h2.jdbc.JdbcSQLException: Deadlock detected. The current transaction was rolled back. Details: "
Session #3 (user: ) on thread RowGarbageCollector RUNNING is waiting to lock PUBLIC.LOCKS (shared) while locking PUBLIC.JOB_KEYS (exclusive).
Session #2 (user: ) on thread JobUpdateEventSubscriber STARTING is waiting to lock PUBLIC.JOB_KEYS (shared) while locking PUBLIC.JOB_INSTANCE_UPDATE_EVENTS (exclusive), PUBLIC.JOB_UPDATE_EVENTS (exclusive), PUBLIC.LOCKS (exclusive), PUBLIC.JOB_UPDATE_LOCKS (exclusive)."; SQL statement:
SELECT
      lock_token
    FROM job_update_locks AS l
    INNER JOIN job_updates u ON l.update_row_id = u.id
     
    INNER JOIN job_keys AS j ON j.id = u.job_key_id
   
    WHERE  
    u.update_id = ?
    AND j.role = ?
    AND j.environment = ?
    AND j.name = ? [40001-187]
        at org.apache.ibatis.exceptions.ExceptionFactory.wrapException(ExceptionFactory.java:30)
        at org.apache.ibatis.session.defaults.DefaultSqlSession.selectList(DefaultSqlSession.java:122)
        at org.apache.ibatis.session.defaults.DefaultSqlSession.selectList(DefaultSqlSession.java:113)
        at org.apache.ibatis.session.defaults.DefaultSqlSession.selectOne(DefaultSqlSession.java:73)
        at sun.reflect.GeneratedMethodAccessor29.invoke(Unknown Source)
        at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
        at java.lang.reflect.Method.invoke(Method.java:497)
        at org.apache.ibatis.session.SqlSessionManager$SqlSessionInterceptor.invoke(SqlSessionManager.java:334)
        at com.sun.proxy.$Proxy57.selectOne(Unknown Source)
        at org.apache.ibatis.session.SqlSessionManager.selectOne(SqlSessionManager.java:165)
        at org.apache.ibatis.binding.MapperMethod.execute(MapperMethod.java:69)
        at org.apache.ibatis.binding.MapperProxy.invoke(MapperProxy.java:53)
        at org.apache.aurora.scheduler.storage.db.$Proxy64.selectLockToken(Unknown Source)
        at org.apache.aurora.scheduler.storage.db.DbJobUpdateStore.getLockToken(DbJobUpdateStore.java:257)
        at com.twitter.common.inject.TimedInterceptor.invoke(TimedInterceptor.java:87)
        at org.apache.aurora.scheduler.storage.ForwardingStore.getLockToken(ForwardingStore.java:178)
        at org.apache.aurora.scheduler.updater.JobUpdateControllerImpl.changeJobUpdateStatus(JobUpdateControllerImpl.java:454)
        at org.apache.aurora.scheduler.updater.JobUpdateControllerImpl.access$1000(JobUpdateControllerImpl.java:108)
        at org.apache.aurora.scheduler.updater.JobUpdateControllerImpl$4.execute(JobUpdateControllerImpl.java:283)
        at org.apache.aurora.scheduler.storage.Storage$MutateWork$NoResult.apply(Storage.java:131)
        at org.apache.aurora.scheduler.storage.Storage$MutateWork$NoResult$Quiet.apply(Storage.java:148)
        at org.apache.aurora.scheduler.storage.log.LogStorage$24.apply(LogStorage.java:605)
        at org.apache.aurora.scheduler.storage.log.LogStorage$24.apply(LogStorage.java:602)
        at org.apache.aurora.scheduler.storage.db.DbStorage.write(DbStorage.java:143)
        ... 10 more
Caused by: org.h2.jdbc.JdbcSQLException: Deadlock detected. The current transaction was rolled back. Details: "
Session #3 (user: ) on thread RowGarbageCollector RUNNING is waiting to lock PUBLIC.LOCKS (shared) while locking PUBLIC.JOB_KEYS (exclusive).
Session #2 (user: ) on thread JobUpdateEventSubscriber STARTING is waiting to lock PUBLIC.JOB_KEYS (shared) while locking PUBLIC.JOB_INSTANCE_UPDATE_EVENTS (exclusive), PUBLIC.JOB_UPDATE_EVENTS (exclusive), PUBLIC.LOCKS (exclusive), PUBLIC.JOB_UPDATE_LOCKS (exclusive)."; SQL statement:
SELECT
      lock_token
    FROM job_update_locks AS l
    INNER JOIN job_updates u ON l.update_row_id = u.id
     
    INNER JOIN job_keys AS j ON j.id = u.job_key_id
   
    WHERE  
    u.update_id = ?
    AND j.role = ?
    AND j.environment = ?
    AND j.name = ? [40001-187]
        at org.h2.message.DbException.getJdbcSQLException(DbException.java:345)
        at org.h2.message.DbException.get(DbException.java:179)
        at org.h2.message.DbException.get(DbException.java:155)
        at org.h2.table.RegularTable.doLock1(RegularTable.java:496)
        at org.h2.table.RegularTable.lock(RegularTable.java:472)
        at org.h2.table.TableFilter.lock(TableFilter.java:146)
        at org.h2.command.dml.Select.queryWithoutCache(Select.java:671)
        at org.h2.command.dml.Query.query(Query.java:322)
        at org.h2.command.dml.Query.query(Query.java:290)
        at org.h2.command.dml.Query.query(Query.java:36)
        at org.h2.command.CommandContainer.query(CommandContainer.java:90)
        at org.h2.command.Command.executeQuery(Command.java:197)
        at org.h2.jdbc.JdbcPreparedStatement.execute(JdbcPreparedStatement.java:192)
        at org.apache.ibatis.executor.statement.PreparedStatementHandler.query(PreparedStatementHandler.java:62)
        at org.apache.ibatis.executor.statement.RoutingStatementHandler.query(RoutingStatementHandler.java:78)
        at org.apache.ibatis.executor.SimpleExecutor.doQuery(SimpleExecutor.java:62)
        at org.apache.ibatis.executor.BaseExecutor.queryFromDatabase(BaseExecutor.java:303)
        at org.apache.ibatis.executor.BaseExecutor.query(BaseExecutor.java:154)
        at org.apache.ibatis.executor.CachingExecutor.query(CachingExecutor.java:102)
        at org.apache.ibatis.executor.CachingExecutor.query(CachingExecutor.java:82)
        at org.apache.ibatis.session.defaults.DefaultSqlSession.selectList(DefaultSqlSession.java:120)
        ... 32 more
{noformat}



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)