Posted to issues@aurora.apache.org by "Mehrdad Nurolahzade (JIRA)" <ji...@apache.org> on 2016/12/21 20:10:58 UTC

[jira] [Updated] (AURORA-1869) Investigate the status update processing overhead

     [ https://issues.apache.org/jira/browse/AURORA-1869?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Mehrdad Nurolahzade updated AURORA-1869:
----------------------------------------
    Description: 
The rate of task status update events received from Mesos closely tracks the rate of JVM threads started by the scheduler ([graphview|http://192.168.33.7:8081/graphview?query=rate(jvm_threads_started)%0Arate(scheduler_status_update_events)]). This suggests that a new thread is started every time a status update event is processed.
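One way to check the correlation in-process is to read the JVM's cumulative thread-start counter before and after processing a batch of events. This is a minimal sketch under stated assumptions. It assumes that {{jvm_threads_started}} reflects {{ThreadMXBean.getTotalStartedThreadCount()}}. The two handlers ({{processWithThreadPerEvent}}, {{processWithSingleWorker}}) are hypothetical stand-ins, not Aurora code. The first starts a thread per event. The second reuses one worker thread, as a single-threaded service would.

```java
import java.lang.management.ManagementFactory;
import java.lang.management.ThreadMXBean;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.TimeUnit;

class ThreadStartProbe {
  private static final ThreadMXBean THREADS = ManagementFactory.getThreadMXBean();

  // Hypothetical handler that starts (and joins) a fresh thread for every event.
  static void processWithThreadPerEvent(int events) {
    try {
      for (int i = 0; i < events; i++) {
        Thread t = new Thread(() -> { /* handle one status update */ });
        t.start();
        t.join();
      }
    } catch (InterruptedException e) {
      Thread.currentThread().interrupt();
    }
  }

  // Hypothetical single-threaded handler: one worker thread processes every event.
  static void processWithSingleWorker(int events) {
    ExecutorService worker = Executors.newSingleThreadExecutor();
    for (int i = 0; i < events; i++) {
      worker.execute(() -> { /* handle one status update */ });
    }
    worker.shutdown();
    try {
      worker.awaitTermination(10, TimeUnit.SECONDS);
    } catch (InterruptedException e) {
      Thread.currentThread().interrupt();
    }
  }

  // Threads the JVM started while the task ran (also counts unrelated JVM threads).
  static long threadsStartedDuring(Runnable task) {
    long before = THREADS.getTotalStartedThreadCount();
    task.run();
    return THREADS.getTotalStartedThreadCount() - before;
  }

  public static void main(String[] args) {
    int events = 100;
    System.out.println("thread per event: "
        + threadsStartedDuring(() -> processWithThreadPerEvent(events)));
    System.out.println("single worker:    "
        + threadsStartedDuring(() -> processWithSingleWorker(events)));
  }
}
```

With the first handler, the delta should be at least the event count. With the second, it should be a small constant. Exact numbers depend on whatever else the JVM starts concurrently.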

{{TaskStatusHandlerImpl}} is a single-threaded service, so it should not need to instantiate new threads. The recorded status update reasons and results show that the majority of updates are associated with {{RECONCILIATION}} and should result in {{NOOP}}; therefore, they should have no impact on the internal workers. The task state machine should short-circuit and return as soon as it determines that the task’s reported state matches the scheduler’s view.

{code:title=TaskStateMachine.updateState()}
// The reported state already matches the scheduler's view: nothing to transition.
if (stateMachine.getState() == taskState) {
  return new TransitionResult(NOOP, ImmutableSet.of());
}
{code}
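This is a minimal, self-contained sketch of where the short-circuit sits. {{TaskState}}, {{TransitionOutcome}}, {{SketchTaskStateMachine}} and the {{SAVE_STATE}} side effect are hypothetical simplifications, not Aurora's classes. Transition legality checks are omitted. {{Set.of()}} stands in for Guava's {{ImmutableSet.of()}}.

```java
import java.util.Set;

// Hypothetical, simplified task states (not Aurora's ScheduleStatus).
enum TaskState { PENDING, ASSIGNED, RUNNING, FINISHED }

// Hypothetical outcome of a state update.
enum TransitionOutcome { SUCCESS, NOOP }

final class TransitionResult {
  final TransitionOutcome outcome;
  final Set<String> sideEffects; // hypothetical side-effect names, e.g. "SAVE_STATE"

  TransitionResult(TransitionOutcome outcome, Set<String> sideEffects) {
    this.outcome = outcome;
    this.sideEffects = sideEffects;
  }
}

final class SketchTaskStateMachine {
  private TaskState state;

  SketchTaskStateMachine(TaskState initial) {
    this.state = initial;
  }

  TransitionResult updateState(TaskState taskState) {
    // Short-circuit: an update (e.g. from reconciliation) that reports the state the
    // scheduler already holds changes nothing, so return before any work is produced.
    if (state == taskState) {
      return new TransitionResult(TransitionOutcome.NOOP, Set.of());
    }
    // Otherwise record the new state and emit a side effect for downstream workers.
    state = taskState;
    return new TransitionResult(TransitionOutcome.SUCCESS, Set.of("SAVE_STATE"));
  }
}
```

The NOOP path returns an empty side-effect set. A reconciliation update that only confirms the current state therefore leaves downstream workers nothing to act on.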

Given the volume of status update events received during reconciliation, this overhead should be avoided if possible.

  was:
The rate of task status update events received from Mesos closely tracks the rate of JVM threads started by the scheduler ([graphview|http://192.168.33.7:8081/graphview?query=rate(jvm_threads_started)%0Arate(scheduler_status_update_events)]). This suggests that a new thread is started every time a status update event is processed.

{{TaskStatusHandlerImpl}} is a singleton service, so it should not instantiate new threads. The recorded status update reasons and results show that the majority of updates are associated with {{RECONCILIATION}} and should result in {{NOOP}}; therefore, they should have no impact on the internal workers. The task state machine should short-circuit and return as soon as it determines that the task’s reported state matches the scheduler’s view.

{code:title=TaskStateMachine.updateState()}
if (stateMachine.getState() == taskState) {
  return new TransitionResult(NOOP, ImmutableSet.of());
}
{code}

Given the volume of status update events received during reconciliation, this overhead should be avoided if possible.


> Investigate the status update processing overhead
> -------------------------------------------------
>
>                 Key: AURORA-1869
>                 URL: https://issues.apache.org/jira/browse/AURORA-1869
>             Project: Aurora
>          Issue Type: Task
>          Components: Scheduler
>            Reporter: Mehrdad Nurolahzade
>            Priority: Minor



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)