Posted to yarn-issues@hadoop.apache.org by "Shen Yinjie (Jira)" <ji...@apache.org> on 2020/08/04 06:57:00 UTC

[jira] [Commented] (YARN-9826) Blocked threads at EntityGroupFSTimelineStore#getCachedStore

    [ https://issues.apache.org/jira/browse/YARN-9826?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17170597#comment-17170597 ] 

Shen Yinjie commented on YARN-9826:
-----------------------------------

Is there any progress on this issue? :)

> Blocked threads at EntityGroupFSTimelineStore#getCachedStore
> ------------------------------------------------------------
>
>                 Key: YARN-9826
>                 URL: https://issues.apache.org/jira/browse/YARN-9826
>             Project: Hadoop YARN
>          Issue Type: Improvement
>          Components: timelineserver
>    Affects Versions: 2.7.3
>            Reporter: Harunobu Daikoku
>            Priority: Minor
>
> We have observed several times on our production cluster that hundreds of TimelineServer threads block on the following synchronized block in EntityGroupFSTimelineStore#getCachedStore when our HDFS NameNode is under high load.
> {code:java}
>     synchronized (this.cachedLogs) {
>       // Note that the content in the cache log storage may be stale.
>       cacheItem = this.cachedLogs.get(groupId);
>       if (cacheItem == null) {
>         LOG.debug("Set up new cache item for id {}", groupId);
>         cacheItem = new EntityCacheItem(groupId, getConfig());
>         AppLogs appLogs = getAndSetAppLogs(groupId.getApplicationId());
>         if (appLogs != null) {
>           LOG.debug("Set applogs {} for group id {}", appLogs, groupId);
>           cacheItem.setAppLogs(appLogs);
>           this.cachedLogs.put(groupId, cacheItem);
>         } else {
>           LOG.warn("AppLogs for groupId {} is set to null!", groupId);
>         }
>       }
>     }
> {code}
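> The thread holding the lock performs multiple filesystem operations (fs.exists) inside getAndSetAppLogs. When these calls stall, for instance because the NameNode RPC queue is full, every other thread waiting on the cachedLogs monitor stalls with them, including threads that only want an already-cached groupId.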
> One possible solution is to move getAndSetAppLogs outside the synchronized block.
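
A minimal sketch of that suggestion, in Java to match the snippet above. All names here (CachedStoreSketch, CacheItem, slowAppLogsLookup) are hypothetical stand-ins, not the real EntityGroupFSTimelineStore API; slowAppLogsLookup plays the role of getAndSetAppLogs and its fs.exists calls. The lock now guards only the map reads and writes, and the result is re-checked under the lock before it is cached.

```java
import java.util.HashMap;
import java.util.Map;
import java.util.concurrent.atomic.AtomicInteger;
import java.util.function.Function;

// Hypothetical sketch, not the real EntityGroupFSTimelineStore code: the slow
// lookup (standing in for getAndSetAppLogs and its fs.exists calls) runs
// outside the lock, and the lock only guards reads and writes of the map.
class CachedStoreSketch {

  // Hypothetical stand-in for EntityCacheItem.
  static final class CacheItem {
    final String groupId;
    final String appLogs;

    CacheItem(String groupId, String appLogs) {
      this.groupId = groupId;
      this.appLogs = appLogs;
    }
  }

  private final Map<String, CacheItem> cachedLogs = new HashMap<>();
  // Hypothetical slow lookup, e.g. one that issues NameNode RPCs; may return null.
  private final Function<String, String> slowAppLogsLookup;

  CachedStoreSketch(Function<String, String> slowAppLogsLookup) {
    this.slowAppLogsLookup = slowAppLogsLookup;
  }

  CacheItem getCachedItem(String groupId) {
    // Fast path: a short critical section with no I/O.
    synchronized (cachedLogs) {
      CacheItem existing = cachedLogs.get(groupId);
      if (existing != null) {
        return existing;
      }
    }
    // Slow path without the lock: a stalled NameNode now only delays callers
    // that are themselves missing from the cache, not every cache hit.
    String appLogs = slowAppLogsLookup.apply(groupId);
    if (appLogs == null) {
      return null; // sketch simplification: nothing is cached for this group
    }
    CacheItem created = new CacheItem(groupId, appLogs);
    // Re-check under the lock: another thread may have cached this group meanwhile.
    synchronized (cachedLogs) {
      CacheItem winner = cachedLogs.putIfAbsent(groupId, created);
      return winner != null ? winner : created;
    }
  }

  public static void main(String[] args) {
    AtomicInteger lookups = new AtomicInteger();
    CachedStoreSketch store = new CachedStoreSketch(id -> {
      lookups.incrementAndGet();
      return id.startsWith("missing") ? null : "logs-for-" + id;
    });
    CacheItem first = store.getCachedItem("group-1");
    CacheItem second = store.getCachedItem("group-1");
    System.out.println(first == second); // true: the second call is a cache hit
    System.out.println(lookups.get());   // 1
    System.out.println(store.getCachedItem("missing-2") == null); // true
  }
}
```

The trade-off of this approach: because the lookup runs unlocked, two threads asking for the same uncached groupId may both hit the NameNode, and putIfAbsent keeps only the first result. If that duplicate work matters, a per-key future could deduplicate it. ConcurrentHashMap.computeIfAbsent is a weaker fit here, since its documentation warns that other updates may block while the mapping function runs.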


