Posted to issues@aurora.apache.org by "Bill Farner (JIRA)" <ji...@apache.org> on 2016/02/01 18:55:39 UTC

[jira] [Commented] (AURORA-1603) Scheduler fails to start after rollback

    [ https://issues.apache.org/jira/browse/AURORA-1603?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15126653#comment-15126653 ] 

Bill Farner commented on AURORA-1603:
-------------------------------------

One option worth considering - allow duplicates in this routine, and add a cleanup operation that collapses them.  This sidesteps tricky schema evolution logic entirely and, AFAICT, has little negative impact.
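
To make the idea concrete, here is a rough sketch of a duplicate-tolerant lookup plus a cleanup pass. It is not Aurora code: ConfigRow, configHash, and all method names are hypothetical, and it uses only JDK collectors so it runs without Guava. Collectors.toMap stands in for Maps.uniqueIndex (it throws IllegalStateException on a duplicate key, whereas Maps.uniqueIndex throws IllegalArgumentException), and Collectors.groupingBy stands in for Multimaps.index.

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;
import java.util.function.Function;
import java.util.stream.Collectors;

// Hypothetical stand-in for a stored task config row; not Aurora's real schema.
final class ConfigRow {
  final long id;
  final int configHash;

  ConfigRow(long id, int configHash) {
    this.id = id;
    this.configHash = configHash;
  }
}

final class DuplicateTolerantIndex {
  // Strict lookup: throws IllegalStateException as soon as two rows share a hash,
  // which is analogous to the Maps.uniqueIndex failure in the stack trace.
  static Map<Integer, ConfigRow> strictIndex(List<ConfigRow> rows) {
    return rows.stream()
        .collect(Collectors.toMap(r -> r.configHash, Function.identity()));
  }

  // Tolerant lookup: rows sharing a hash are grouped instead of rejected.
  // LinkedHashMap keeps keys in first-seen order so results are deterministic.
  static Map<Integer, List<ConfigRow>> tolerantIndex(List<ConfigRow> rows) {
    return rows.stream()
        .collect(Collectors.groupingBy(
            r -> r.configHash, LinkedHashMap::new, Collectors.toList()));
  }

  // Cleanup pass: keep the first row seen for each hash and drop the rest.
  static List<ConfigRow> collapse(List<ConfigRow> rows) {
    List<ConfigRow> survivors = new ArrayList<>();
    for (List<ConfigRow> group : tolerantIndex(rows).values()) {
      survivors.add(group.get(0));
    }
    return survivors;
  }
}
```

A real cleanup would presumably also have to repoint anything that references the dropped rows; the sketch leaves that out.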

> Scheduler fails to start after rollback
> ---------------------------------------
>
>                 Key: AURORA-1603
>                 URL: https://issues.apache.org/jira/browse/AURORA-1603
>             Project: Aurora
>          Issue Type: Bug
>          Components: Scheduler
>            Reporter: Maxim Khutornenko
>            Assignee: Maxim Khutornenko
>            Priority: Critical
>
> We had to roll back the scheduler due to duplicate instances in the UI. When we tried to restart on the older version (8d3fb2413306387bc533b1b800bbc97149f96b26), we got the following error, which prevented the scheduler from loading the snapshot:
> {noformat}
> To index multiple values under a key, use Multimaps.index.
>         at com.google.common.collect.Maps.uniqueIndex(Maps.java:1215) ~[guava-19.0.jar:na]
>         at com.google.common.collect.Maps.uniqueIndex(Maps.java:1173) ~[guava-19.0.jar:na]
>         at org.apache.aurora.scheduler.storage.db.TaskConfigManager.getConfigRow(TaskConfigManager.java:46) ~[aurora-113.jar:na]
>         at org.apache.aurora.scheduler.storage.db.TaskConfigManager.insert(TaskConfigManager.java:57) ~[aurora-113.jar:na]
>         at org.apache.aurora.scheduler.storage.db.DbJobUpdateStore.saveJobUpdate(DbJobUpdateStore.java:125) ~[aurora-113.jar:na]
>         at org.apache.aurora.common.inject.TimedInterceptor.invoke(TimedInterceptor.java:83) ~[commons-113.jar:na]
>         at org.apache.aurora.scheduler.storage.log.SnapshotStoreImpl$7.restoreFromSnapshot(SnapshotStoreImpl.java:208) ~[aurora-113.jar:na]
>         at org.apache.aurora.scheduler.storage.log.SnapshotStoreImpl.lambda$applySnapshot$238(SnapshotStoreImpl.java:278) ~[aurora-113.jar:na]
>         at org.apache.aurora.scheduler.storage.Storage$MutateWork$NoResult.apply(Storage.java:137) ~[aurora-113.jar:na]
>         at org.apache.aurora.scheduler.storage.Storage$MutateWork$NoResult.apply(Storage.java:132) ~[aurora-113.jar:na]
>         at org.apache.aurora.scheduler.storage.db.DbStorage.transactionedWrite(DbStorage.java:146) ~[aurora-113.jar:na]
>         at org.mybatis.guice.transactional.TransactionalMethodInterceptor.invoke(TransactionalMethodInterceptor.java:101) ~[mybatis-guice-3.7.jar:3.7]
>         at org.apache.aurora.scheduler.storage.db.DbStorage.lambda$write$203(DbStorage.java:160) ~[aurora-113.jar:na]
>         at org.apache.aurora.scheduler.async.GatingDelayExecutor.closeDuring(GatingDelayExecutor.java:62) ~[aurora-113.jar:na]
>         at org.apache.aurora.scheduler.storage.db.DbStorage.write(DbStorage.java:158) ~[aurora-113.jar:na]
>         at org.apache.aurora.common.inject.TimedInterceptor.invoke(TimedInterceptor.java:83) ~[commons-113.jar:na]
>         at org.apache.aurora.scheduler.storage.log.SnapshotStoreImpl.applySnapshot(SnapshotStoreImpl.java:274) ~[aurora-113.jar:na]
>         at org.apache.aurora.common.inject.TimedInterceptor.invoke(TimedInterceptor.java:83) ~[commons-113.jar:na]
>         at org.apache.aurora.scheduler.storage.log.SnapshotStoreImpl.applySnapshot(SnapshotStoreImpl.java:63) ~[aurora-113.jar:na]
>         at org.apache.aurora.common.inject.TimedInterceptor.invoke(TimedInterceptor.java:83) ~[commons-113.jar:na]
> ...
> {noformat}
> We attributed this to fee5943a95c4f08e148dc5f1366486a8c23d5773 and reverted it in https://reviews.apache.org/r/42922/. I have not yet been able to reproduce it in unit tests, so further investigation is needed.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)