You are viewing a plain text version of this content. The canonical link for it is here.
Posted to yarn-issues@hadoop.apache.org by "Junping Du (JIRA)" <ji...@apache.org> on 2016/01/04 18:54:40 UTC

[jira] [Commented] (YARN-4265) Provide new timeline plugin storage to support fine-grained entity caching

    [ https://issues.apache.org/jira/browse/YARN-4265?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15081472#comment-15081472 ] 

Junping Du commented on YARN-4265:
----------------------------------

Thanks [~gtCarrera9] for clarification. Assume Jason is fine with going on with this patch, I quickly go through the v3 patch. 
with following comments so far (I haven't finished my review yet as patch is pretty big):

In YarnConfiguration.java,
{code}
TIMELINE_SERVICE_ENTITYGROUP_FS_STORE_SCAN_INTERVAL_SECONDS_DEFAULT = 60;
{code}
I noticed that we are setting 1 minutes as default scan interval but original patch in HDFS-3942 is 5 minutes. Why shall we do any update here? The same question on "app-cache-size", the default value in HDFS-3942 is 5 but here is 10. Any reason to update the value?

In yarn-default.xml,
{code}
+    <description>DFS path to store active application’s timeline data</description>
...
+    <description>DFS path to store done application’s timeline data</description>
{code}
DFS is very old name, use HDFS instead to be more clear.

Why we don't have any default value specified in property of "yarn.timeline-service.entity-group-fs-store.group-id-plugin-classes"?

In hadoop-yarn-server-timeline-pluginstorage/pom.xml,


For EmptyTimelineEntityGroupPlugin.java, why we need this class? I didn't see any help provided even in tests. We should remove it if useless.

In EntityCacheItem.java,
We should have a description for this class in Javadoc.

Can we optimize the synchronization logic here? Like in synchronized method refreshCache, we are intialize/start/stop TimelineDataManager (and MemoryTimelineStore) which is quite expensive and unnecessary to block other synchronized operations. Shall we move these operations out of synchronized block?

{code}
+      LOG.warn("Error closing datamanager", e);
{code}
I think we are closing store here instead of datamanager. Isn't it?

{code}
+  public boolean needRefresh() {
+    //TODO: make a config for cache freshness
+    return (Time.monotonicNow() - lastRefresh > 10000);
+  }
{code}
Does refresh interval here need to do any coordination with scan interval specificed in "yarn.timeline-service.entity-group-fs-store.scan-interval-seconds"?

In EntityGroupFSTimelineStore.java,

{code}
+      if (appState != AppState.UNKNOWN) {
+        appLogs = new AppLogs(applicationId, appDirPath, appState);
+        LOG.debug("Create and try to add new appLogs to appIdLogMap for {}",
+            applicationId);
+        AppLogs oldAppLogs = appIdLogMap.putIfAbsent(applicationId, appLogs);
+        if (oldAppLogs != null) {
+          appLogs = oldAppLogs;
+        }
+      }
{code}
This logic is very similiar with method of getAndSetActiveLog(). Can we consolidate them together?

If parseSummaryLogs() is synchronized, it seems getSummaryLogs() should be synchronized too or the getter will get stale(half-done) result.

Still checking if other multi-threads issues. More comments will come soon.

> Provide new timeline plugin storage to support fine-grained entity caching
> --------------------------------------------------------------------------
>
>                 Key: YARN-4265
>                 URL: https://issues.apache.org/jira/browse/YARN-4265
>             Project: Hadoop YARN
>          Issue Type: Sub-task
>          Components: timelineserver
>            Reporter: Li Lu
>            Assignee: Li Lu
>         Attachments: YARN-4265-trunk.001.patch, YARN-4265.YARN-4234.001.patch, YARN-4265.YARN-4234.002.patch
>
>
> To support the newly proposed APIs in YARN-4234, we need to create a new plugin timeline store. The store may have similar behavior as the EntityFileTimelineStore proposed in YARN-3942, but cache date in cache id granularity, instead of application id granularity. Let's have this storage as a standalone one, instead of updating EntityFileTimelineStore, to keep the existing store (EntityFileTimelineStore) stable. 



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)