Posted to issues@aurora.apache.org by "Clément Michaud (JIRA)" <ji...@apache.org> on 2018/07/20 14:05:00 UTC

[jira] [Created] (AURORA-1993) Aurora crashes when handling an unknown custom resource

Clément Michaud created AURORA-1993:
---------------------------------------

             Summary: Aurora crashes when handling an unknown custom resource
                 Key: AURORA-1993
                 URL: https://issues.apache.org/jira/browse/AURORA-1993
             Project: Aurora
          Issue Type: Bug
    Affects Versions: 0.16.0
            Reporter: Clément Michaud


When we declared network bandwidth as a custom resource in Mesos, Aurora crashed with the following stack trace:
{code:java}
Jul 18, 2018 1:35:19 PM com.google.common.util.concurrent.ServiceManager$ServiceListener failed
SEVERE: Service SlotSizeCounterService [FAILED] has failed in the RUNNING state.
java.lang.NullPointerException: Unknown Mesos resource: name: "network_bandwidth"
type: SCALAR
scalar {
value: 2000.0
}
role: "*"
11: "\n\adefault"
at java.util.Objects.requireNonNull(Objects.java:228)
at org.apache.aurora.scheduler.resources.ResourceType.fromResource(ResourceType.java:355)
at org.apache.aurora.scheduler.resources.ResourceManager.lambda$static$0(ResourceManager.java:52)
at com.google.common.collect.Iterators$7.computeNext(Iterators.java:675)
at com.google.common.collect.AbstractIterator.tryToComputeNext(AbstractIterator.java:143)
at com.google.common.collect.AbstractIterator.hasNext(AbstractIterator.java:138)
at java.util.Iterator.forEachRemaining(Iterator.java:115)
at java.util.Spliterators$IteratorSpliterator.forEachRemaining(Spliterators.java:1801)
at java.util.stream.AbstractPipeline.copyInto(AbstractPipeline.java:481)
at java.util.stream.AbstractPipeline.wrapAndCopyInto(AbstractPipeline.java:471)
at java.util.stream.ReduceOps$ReduceOp.evaluateSequential(ReduceOps.java:708)
at java.util.stream.AbstractPipeline.evaluate(AbstractPipeline.java:234)
at java.util.stream.ReferencePipeline.collect(ReferencePipeline.java:499)
at org.apache.aurora.scheduler.resources.ResourceManager.bagFromResources(ResourceManager.java:274)
at org.apache.aurora.scheduler.resources.ResourceManager.bagFromMesosResources(ResourceManager.java:239)
at org.apache.aurora.scheduler.stats.AsyncStatsModule$OfferAdapter.get(AsyncStatsModule.java:153)
at org.apache.aurora.scheduler.stats.SlotSizeCounter.run(SlotSizeCounter.java:168)
at org.apache.aurora.scheduler.stats.AsyncStatsModule$SlotSizeCounterService.runOneIteration(AsyncStatsModule.java:130)
at com.google.common.util.concurrent.AbstractScheduledService$ServiceDelegate$Task.run(AbstractScheduledService.java:189)
at com.google.common.util.concurrent.Callables$3.run(Callables.java:100)
at java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:511)
at java.util.concurrent.FutureTask.runAndReset(FutureTask.java:308)
at java.util.concurrent.ScheduledThreadPoolExecutor$ScheduledFutureTask.access$301(ScheduledThreadPoolExecutor.java:180)
at java.util.concurrent.ScheduledThreadPoolExecutor$ScheduledFutureTask.run(ScheduledThreadPoolExecutor.java:294)
at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149)
at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624)
at java.lang.Thread.run(Thread.java:748)
E0718 13:35:19.240 [SlotSizeCounterService RUNNING, GuavaUtils$LifecycleShutdownListener:55] Service: SlotSizeCounterService [FAILED] faile
I0718 13:35:19.240 [SlotSizeCounterService RUNNING, Lifecycle:84] Shutting down application
I0718 13:35:19.240 [SlotSizeCounterService RUNNING, ShutdownRegistry$ShutdownRegistryImpl:77] Executing 4 shutdown commands.
I0718 13:35:19.243 [SlotSizeCounterService RUNNING, StateMachine$Builder:389] SchedulerLifecycle state machine transition ACTIVE -> DEAD
I0718 13:35:19.249073 331 sched.cpp:2021] Asked to stop the driver
I0718 13:35:19.249344 30748 sched.cpp:1203] Stopping framework 2a905643-b76f-4f17-a406-524d406f49f8-0000
I0718 13:35:19.249 [SlotSizeCounterService RUNNING, StateMachine$Builder:389] storage state machine transition READY -> STOPPED
I0718 13:35:19.250 [BlockingDriverJoin, SchedulerLifecycle$6:267] Driver exited, terminating lifecycle.
I0718 13:35:19.250 [BlockingDriverJoin, StateMachine$Builder:389] SchedulerLifecycle state machine transition DEAD -> DEAD
I0718 13:35:19.250 [BlockingDriverJoin, SchedulerLifecycle$7:287] Shutdown already invoked, ignoring extra call.
I0718 13:35:19.255 [CronLifecycle STOPPING, CronLifecycle:90] Shutting down Quartz cron scheduler.
I0718 13:35:19.255 [CronLifecycle STOPPING, QuartzScheduler:694] Scheduler QuartzScheduler_$_aurora-cron-1 shutting down.
I0718 13:35:19.255 [CronLifecycle STOPPING, QuartzScheduler:613] Scheduler QuartzScheduler_$_aurora-cron-1 paused.
I0718 13:35:19.255 [CronLifecycle STOPPING, QuartzScheduler:771] Scheduler QuartzScheduler_$_aurora-cron-1 shutdown complete.
E0718 13:35:19.945 [AsyncProcessor-0, AsyncUtil:159] java.util.concurrent.ExecutionException: java.lang.IllegalStateException: Driver is no
{code}
It would be great if Aurora could handle custom resources, or at least ignore unknown ones instead of shutting down the scheduler.
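
To illustrate the "at least not crash" behaviour, here is a minimal sketch, not Aurora's actual code. It assumes a hypothetical `KnownType` enum standing in for `ResourceType` (the resource names listed are illustrative), and a lookup that returns an empty `Optional` for an unrecognized name instead of failing in `Objects.requireNonNull`, so unknown resources such as `network_bandwidth` are skipped when an offer is converted.

```java
import java.util.Arrays;
import java.util.List;
import java.util.Map;
import java.util.Optional;
import java.util.function.Function;
import java.util.stream.Collectors;

public class UnknownResourceSketch {
    // Hypothetical stand-in for Aurora's ResourceType; the names are illustrative.
    enum KnownType {
        CPUS("cpus"), MEM("mem"), DISK("disk"), PORTS("ports");

        private static final Map<String, KnownType> BY_NAME = Arrays.stream(values())
            .collect(Collectors.toMap(t -> t.mesosName, Function.identity()));

        private final String mesosName;

        KnownType(String mesosName) {
            this.mesosName = mesosName;
        }

        // Returns an empty Optional for unknown names instead of throwing.
        static Optional<KnownType> fromName(String name) {
            return Optional.ofNullable(BY_NAME.get(name));
        }
    }

    // Keeps only the resources the scheduler understands and drops the rest.
    static List<KnownType> knownTypes(List<String> offeredNames) {
        return offeredNames.stream()
            .map(KnownType::fromName)
            .filter(Optional::isPresent)
            .map(Optional::get)
            .collect(Collectors.toList());
    }

    public static void main(String[] args) {
        List<String> offer = Arrays.asList("cpus", "network_bandwidth", "mem");
        System.out.println(knownTypes(offer)); // prints [CPUS, MEM]
    }
}
```

Silently dropping the resource is only one option; logging a warning for each unknown name would make the mismatch between the Mesos agent configuration and the scheduler easier to notice.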

We are using version 0.16.0.

 

https://mesos.slack.com/archives/C1KR1PRP1/p1532013001000626


