Posted to yarn-issues@hadoop.apache.org by "Andras Gyori (Jira)" <ji...@apache.org> on 2021/05/03 08:28:00 UTC

[jira] [Commented] (YARN-9927) RM multi-thread event processing mechanism

    [ https://issues.apache.org/jira/browse/YARN-9927?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17338237#comment-17338237 ] 

Andras Gyori commented on YARN-9927:
------------------------------------

Thank you [~zhuqi] for the patch. I think this approach is well thought out, because it reuses logic already established elsewhere (services as separate threads). I have one request on this part:
 * Please move MultiThreadDispatcher to its own file, as ResourceManager already contains a lot of code.

As for RMNode event handling, I have one proposal. Keeping code as simple as possible is usually good advice, but event handling is a crucial part of YARN, so it may be worthwhile to provide fine-tuning options. The RMNode event handling is a good way to improve performance, but I see value in providing a more generic way of handling events. A proof-of-concept implementation of my proposal:
 # Create a MultiThreadEventHandler wrapper
{code:java}
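// Needs java.util.concurrent.* and Guava's ThreadFactoryBuilder.
public static class MultiThreadEventHandler
    implements EventHandler<Event<?>> {
  private final ThreadPoolExecutor multiHandlerThreadPool;
  private final EventHandler<Event<?>> handler;

  public MultiThreadEventHandler(EventHandler<Event<?>> handler,
                                 int maximumPoolSize) {
    this.handler = handler;
    ThreadFactory threadFactory = new ThreadFactoryBuilder()
        .setNameFormat("multiHandlerThread #%d")
        .build();
    // With an unbounded queue the pool never grows beyond its core size,
    // so use maximumPoolSize as the core size too. A fixed core of 5 would
    // also throw IllegalArgumentException for maximumPoolSize < 5.
    multiHandlerThreadPool = new ThreadPoolExecutor(
        maximumPoolSize, maximumPoolSize, 10, TimeUnit.SECONDS,
        new LinkedBlockingQueue<>(), threadFactory);
    // Let idle threads exit after the 10 second keep-alive.
    multiHandlerThreadPool.allowCoreThreadTimeOut(true);
  }

  @Override
  public void handle(Event<?> event) {
    // execute() rather than submit(): an exception thrown by the handler
    // reaches the thread's uncaught exception handler instead of being
    // captured in a Future that nobody reads.
    multiHandlerThreadPool.execute(() -> handler.handle(event));
  }
}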
{code}

 # Provide configuration values that enable MultiThreadEventHandler for a specific event type. MultiThreadDispatcher#register would then look like this:
{code:java}
@Override
public void register(Class<? extends Enum> eventType,
    EventHandler handler) {
  if (eventTypeDispatcherMap.get(eventType) == null) {
    AsyncDispatcher asyncDispatcher = createDispatcher(eventType);
    eventTypeDispatcherMap.put(eventType, asyncDispatcher);
    addIfService(asyncDispatcher);
  }

  // Per-event-type settings, keyed by the event type's class name.
  String confPrefix = "yarn.scheduler.event."
      + eventType.getCanonicalName() + ".multi-thread-handler.";
  EventHandler registeredHandler = handler;
  if (getConfig().getBoolean(confPrefix + "enabled", false)) {
    int poolSize = getConfig().getInt(confPrefix + "max-pool-size", 5);
    registeredHandler = new MultiThreadEventHandler(handler, poolSize);
  }

  eventTypeDispatcherMap.get(eventType)
      .register(eventType, registeredHandler);
}
{code}
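
To make the two steps above easier to try outside of the RM, here is a minimal stand-alone sketch of the same idea. It is not YARN code: java.util.function.Consumer stands in for EventHandler, a plain Map stands in for the Hadoop Configuration, and MultiThreadHandlerSketch, PooledHandler, DemoEventType, keyPrefix, wrapIfEnabled and runDemo are all hypothetical names chosen for illustration. Only the configuration key layout is taken from the snippet above. Note that with more than one pool thread, events are no longer guaranteed to be handled in submission order, which matters for any handler that relies on ordering.

```java
import java.util.HashMap;
import java.util.Map;
import java.util.concurrent.CountDownLatch;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.TimeUnit;
import java.util.concurrent.atomic.AtomicInteger;
import java.util.function.Consumer;

/** Stand-alone sketch; every name here is hypothetical, none is a YARN class. */
public class MultiThreadHandlerSketch {

  /** Hypothetical event type, used only to build the configuration key. */
  public enum DemoEventType { NODE_STATUS }

  /** Hands each event to a fixed-size pool instead of handling it inline. */
  public static final class PooledHandler<E> implements Consumer<E> {
    private final ExecutorService pool;
    private final Consumer<E> delegate;

    public PooledHandler(Consumer<E> delegate, int poolSize) {
      this.delegate = delegate;
      this.pool = Executors.newFixedThreadPool(poolSize);
    }

    @Override
    public void accept(E event) {
      // Events may complete out of order relative to submission.
      pool.execute(() -> delegate.accept(event));
    }

    public void shutdown() {
      pool.shutdown();
    }
  }

  /** Builds the per-event-type key prefix used by the registration snippet. */
  public static String keyPrefix(Class<?> eventType) {
    return "yarn.scheduler.event." + eventType.getCanonicalName()
        + ".multi-thread-handler.";
  }

  /** Returns the handler unchanged unless the per-type flag is enabled. */
  public static <E> Consumer<E> wrapIfEnabled(
      Map<String, String> conf, Class<?> eventType, Consumer<E> handler) {
    String prefix = keyPrefix(eventType);
    if (!Boolean.parseBoolean(conf.getOrDefault(prefix + "enabled", "false"))) {
      return handler;
    }
    int poolSize =
        Integer.parseInt(conf.getOrDefault(prefix + "max-pool-size", "5"));
    return new PooledHandler<>(handler, poolSize);
  }

  /** Sends events through a wrapped handler and returns how many were handled. */
  public static int runDemo(int poolSize, int events) {
    Map<String, String> conf = new HashMap<>();
    conf.put(keyPrefix(DemoEventType.class) + "enabled", "true");
    conf.put(keyPrefix(DemoEventType.class) + "max-pool-size",
        Integer.toString(poolSize));

    AtomicInteger handled = new AtomicInteger();
    CountDownLatch done = new CountDownLatch(events);
    Consumer<Integer> counting = e -> {
      handled.incrementAndGet();
      done.countDown();
    };
    Consumer<Integer> handler =
        wrapIfEnabled(conf, DemoEventType.class, counting);

    for (int i = 0; i < events; i++) {
      handler.accept(i);
    }
    try {
      // Wait for the pool threads to drain the queue.
      done.await(10, TimeUnit.SECONDS);
    } catch (InterruptedException ie) {
      Thread.currentThread().interrupt();
    } finally {
      if (handler instanceof PooledHandler) {
        ((PooledHandler<Integer>) handler).shutdown();
      }
    }
    return handled.get();
  }

  public static void main(String[] args) {
    System.out.println("handled=" + runDemo(4, 1000));
  }
}
```

When the flag is absent, wrapIfEnabled returns the original handler, so event types that are not explicitly configured keep their current single-threaded behaviour.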

As emphasised before, this is a performance-critical section of YARN. Some kind of stress test, run via SLS or manually, is therefore needed to make sure that RM is not crippled by these changes and that the performance gain justifies the added complexity and the extra hardware resource usage.
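
As a rough first check before a real SLS run, a toy harness like the following could compare how long a fixed number of simulated events takes with different pool sizes. This is only a sketch under stated assumptions: HandlerThroughputSketch, Result, run and simulateWork are hypothetical names, the busy-wait is a stand-in for the real cost of handling an event, and the numbers it prints say nothing about actual RM behaviour.

```java
import java.util.concurrent.CountDownLatch;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.TimeUnit;
import java.util.concurrent.atomic.AtomicInteger;

/** Toy throughput comparison; hypothetical names, not a replacement for SLS. */
public class HandlerThroughputSketch {

  /** Outcome of one run: events handled and wall-clock time. */
  public static final class Result {
    public final int handled;
    public final long elapsedMillis;

    Result(int handled, long elapsedMillis) {
      this.handled = handled;
      this.elapsedMillis = elapsedMillis;
    }
  }

  /** Busy-waits for roughly the given time to model CPU-bound handling. */
  static void simulateWork(long micros) {
    long end = System.nanoTime() + TimeUnit.MICROSECONDS.toNanos(micros);
    while (System.nanoTime() < end) {
      // spin
    }
  }

  /** Handles the given number of simulated events on a pool of the given size. */
  public static Result run(int threads, int events, long microsPerEvent) {
    ExecutorService pool = Executors.newFixedThreadPool(threads);
    AtomicInteger handled = new AtomicInteger();
    CountDownLatch done = new CountDownLatch(events);
    long start = System.nanoTime();
    for (int i = 0; i < events; i++) {
      pool.execute(() -> {
        simulateWork(microsPerEvent);
        handled.incrementAndGet();
        done.countDown();
      });
    }
    try {
      done.await(60, TimeUnit.SECONDS);
    } catch (InterruptedException ie) {
      Thread.currentThread().interrupt();
    } finally {
      pool.shutdown();
    }
    long elapsed = TimeUnit.NANOSECONDS.toMillis(System.nanoTime() - start);
    return new Result(handled.get(), elapsed);
  }

  public static void main(String[] args) {
    for (int threads : new int[] {1, 2, 4}) {
      Result r = run(threads, 2000, 200);
      System.out.println(threads + " thread(s): " + r.handled
          + " events in " + r.elapsedMillis + " ms");
    }
  }
}
```

A harness like this ignores lock contention inside the real handlers, which is exactly what the stress test needs to measure, so it can only rule out obvious regressions in the wrapper itself.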

> RM multi-thread event processing mechanism
> ------------------------------------------
>
>                 Key: YARN-9927
>                 URL: https://issues.apache.org/jira/browse/YARN-9927
>             Project: Hadoop YARN
>          Issue Type: Sub-task
>          Components: yarn
>    Affects Versions: 3.0.0, 2.9.2
>            Reporter: hcarrot
>            Assignee: Qi Zhu
>            Priority: Major
>         Attachments: RM multi-thread event processing mechanism.pdf, YARN-9927.001.patch, YARN-9927.002.patch, YARN-9927.003.patch, YARN-9927.004.patch, YARN-9927.005.patch
>
>
> Recently, we have observed serious event blocking in the RM event dispatcher queue. After analysing RM event monitoring data and the RM event processing logic, we found that:
> 1) Environment: a cluster with thousands of nodes.
> 2) RMNodeStatusEvent accounts for 90% of the time consumed by the RM event scheduler.
> 3) Meanwhile, RM event processing is single-threaded, which leaves the RM event scheduler with little headroom and thus limits RM performance.
> So we propose an RM multi-thread event processing mechanism to improve RM performance.


