You are viewing a plain text version of this content. The canonical link for it is here.
Posted to yarn-issues@hadoop.apache.org by "Xuan Gong (JIRA)" <ji...@apache.org> on 2018/03/06 12:53:00 UTC

[jira] [Comment Edited] (YARN-7952) Find a way to persist the log aggregation status

    [ https://issues.apache.org/jira/browse/YARN-7952?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16387707#comment-16387707 ] 

Xuan Gong edited comment on YARN-7952 at 3/6/18 12:52 PM:
----------------------------------------------------------

Thanks for the review. [~leftnoteasy]
{quote}It looks like pullLock should be a readLock and updateLocker should be a writeLock since pull doesn't change internal data, correct?
{quote}
No, it is not. Let me try to explain this. For one application in one NM, we only have one AppLogAggregator which would do the log aggregation which means it would call updateLogAggregationStatus. And we should allow different AppLogAggregator to update their own log aggregation status. In this case, I think that the readLock can be used here.

But for the pull call, we need to make sure when we do the pull, there is no update. (or when we do update, we should block pull)

bq.  For overall workflow of this class, IMO it should be:
 This is right.(a. When log aggregation starts, application will be added to {{NMLogAggregationStatusTracker}} with timestamp.)
 This is not correct (b. When RM acknowledges application finished, application will be removed from the {{NMLogAggregationStatusTracker) Because as I described in option One, currently, RM will not figure out the log aggregation status automatically by itself until some CLI or web API call happens. So, RM will not know whether the status is final immediately that is why NM would check and remove it periodically without RM.}}

This is correct.(c. If configured timeout reaches, application will be removed from the {{NMLogAggregationStatusTracker}}.)

 
{quote} Since the currentTime always almost equal to updateTime. The if statement should be false, correct? And I think if application is null in NM side, we should not add it to the track, correct?
{quote}
This if is mainly for handling the mis-order log aggregation update call. So, first of all, the currentTime should be always larger than updateTime. If this is the mis-order log aggregation update call, and it is out of timeOut period, we should ignore it.

 
{quote}  I'm not sure if we should ever change the {{lastModifiedTime}}. IIUC, the RM will timeout log aggregation status after configured timeout. What's the purpose of updating the {{lastModifiedTime}}? Will this cause OOM issue potentially?
{quote}
 

This check is also used to handle the mis-order log aggregation update call. We should always handle the latest update call. If the call is from previous, we need to ignore it.

 
{quote}  the {{long updateTime}} parameter is not necessary, can always use System.currentTime instead.
{quote}
 

We need, See my previous comment.

 

For other re-name comment, i will handle them together after your next comment.

 

  


was (Author: xgong):
Thanks for the review. [~leftnoteasy]

bq. It looks like pullLock should be a readLock and updateLocker should be a writeLock since pull doesn't change internal data, correct?

No, it is not. Let me try to explain this. For one application in one NM, we only have one AppLogAggregator which would do the log aggregation which means it would call updateLogAggregationStatus. And we should allow different AppLogAggregator to update their own log aggregation status. In this case, I think that the readLock can be used here.

But for the pull call, we need to make sure when we do the pull, there is no update. (or when we do update, we should block pull)

bq .For overall workflow of this class, IMO it should be:
This is right.(a. When log aggregation starts, application will be added to {{NMLogAggregationStatusTracker}} with timestamp.)
This is not correct (b. When RM acknowledges application finished, application will be removed from the {{NMLogAggregationStatusTracker) Because as I described in option One, currently, RM will not figure out the log aggregation status automatically by itself until some CLI or web API call happens. So, RM will not know whether the status is final immediately that is why NM would check and remove it periodically without RM.}}

This is correct.(c. If configured timeout reaches, application will be removed from the {{NMLogAggregationStatusTracker}}.)

 

bq. Since the currentTime always almost equal to updateTime. The if statement should be false, correct? And I think if application is null in NM side, we should not add it to the track, correct?

This if is mainly for handling the mis-order log aggregation update call. So, first of all, the currentTime should be always larger than updateTime. If this is the mis-order log aggregation update call, and it is out of timeOut period, we should ignore it.

 

bq.  I'm not sure if we should ever change the {{lastModifiedTime}}. IIUC, the RM will timeout log aggregation status after configured timeout. What's the purpose of updating the {{lastModifiedTime}}? Will this cause OOM issue potentially?

 

This check is also used to handle the mis-order log aggregation update call. We should always handle the latest update call. If the call is from previous, we need to ignore it.

 

bq.  the {{long updateTime}} parameter is not necessary, can always use System.currentTime instead.

 

We need, See my previous comment.

 

For other re-name comment, i will handle them together after your next comment.

 

  

> Find a way to persist the log aggregation status
> ------------------------------------------------
>
>                 Key: YARN-7952
>                 URL: https://issues.apache.org/jira/browse/YARN-7952
>             Project: Hadoop YARN
>          Issue Type: Bug
>            Reporter: Xuan Gong
>            Assignee: Xuan Gong
>            Priority: Major
>         Attachments: YARN-7952-poc.patch, YARN-7952.1.patch, YARN-7952.2.patch
>
>
> In MAPREDUCE-6415, we have created a CLI to har the aggregated logs, and In YARN-4946: RM should write out Aggregated Log Completion file flag next to logs, we have a discussion on how we can get the log aggregation status: make a client call to RM or get it directly from the Distributed file system(HDFS).
> No matter which approach we would like to choose, we need to figure out a way to persist the log aggregation status first. This ticket is used to track the working progress for this purpose.



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

---------------------------------------------------------------------
To unsubscribe, e-mail: yarn-issues-unsubscribe@hadoop.apache.org
For additional commands, e-mail: yarn-issues-help@hadoop.apache.org