Posted to yarn-issues@hadoop.apache.org by "Xuan Gong (JIRA)" <ji...@apache.org> on 2015/11/25 01:05:11 UTC

[jira] [Comment Edited] (YARN-4392) ApplicationCreatedEvent event time resets after RM restart/failover

    [ https://issues.apache.org/jira/browse/YARN-4392?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15025761#comment-15025761 ] 

Xuan Gong edited comment on YARN-4392 at 11/25/15 12:05 AM:
------------------------------------------------------------

We would see these warnings only when both of the following conditions are satisfied:
1) The app entity has been deleted by the EntityDeletionThread
2) The RM restarts or fails over

This is because, when we recover the applications, we always send a new ApplicationCreatedEvent:
{code}
    this.startTime = this.systemClock.getTime();
    rmContext.getSystemMetricsPublisher().appCreated(this, startTime);
{code}
which would give this event a new timestamp.
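To make the effect concrete, here is a minimal, self-contained sketch of the same pattern. It is not the real RM code: the class name `RecoveredAppSketch`, the injected `LongSupplier` clock, and the timestamp values are all assumptions chosen for illustration. It only shows that reading the clock at construction time yields a different start time when the object is rebuilt later.

```java
import java.util.function.LongSupplier;

// Hypothetical stand-in for the RM-side application object; not the real RM class.
class RecoveredAppSketch {
    final long startTime;

    // Mirrors the quoted snippet: the start time is read from the clock whenever
    // the object is constructed, which includes construction during recovery.
    RecoveredAppSketch(LongSupplier systemClock) {
        this.startTime = systemClock.getAsLong();
    }

    public static void main(String[] args) {
        // Illustrative clock readings only (epoch millis); not taken from a real cluster.
        RecoveredAppSketch original = new RecoveredAppSketch(() -> 1437444000000L);
        // The same application rebuilt after an RM restart, when the clock has moved on.
        RecoveredAppSketch recovered = new RecoveredAppSketch(() -> 1440308399647L);

        System.out.println("original start time:  " + original.startTime);
        System.out.println("recovered start time: " + recovered.startTime);
        System.out.println("shift in ms:          " + (recovered.startTime - original.startTime));
    }
}
```

Any event published with the recovered object's start time carries the restart-time timestamp rather than the original submission time.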

And when we generate the AppReport from ATS, we do:
{code}
if (event.getEventType().equals(
             ApplicationMetricsConstants.CREATED_EVENT_TYPE)) {
         createdTime = event.getTimestamp();
}
{code}

In that case, we would get the new timestamp as the application start_time.
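The two conditions can be put together in a small simulation. This is a sketch under stated assumptions, not the ATS implementation: the `Event` class, the `createdTime` helper, and the `"CREATED"` placeholder string are hypothetical. The two timestamps are copied from one of the warning lines in the issue description (finished 1437444293115, started 1440308399647).

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical, simplified model of the report-building step; not the real ATS classes.
class CreatedTimeSketch {
    // Placeholder value; the real constant is ApplicationMetricsConstants.CREATED_EVENT_TYPE.
    static final String CREATED_EVENT_TYPE = "CREATED";

    static final class Event {
        final String type;
        final long timestamp;

        Event(String type, long timestamp) {
            this.type = type;
            this.timestamp = timestamp;
        }
    }

    // Same shape as the quoted check: the timestamp of a CREATED event becomes createdTime.
    static long createdTime(List<Event> events) {
        long createdTime = 0;
        for (Event event : events) {
            if (event.type.equals(CREATED_EVENT_TYPE)) {
                createdTime = event.timestamp;
            }
        }
        return createdTime;
    }

    public static void main(String[] args) {
        // Finish time from one warning line in the issue description.
        long finishTime = 1437444293115L;

        // Assumption: the original entity was deleted, so the only CREATED event
        // left is the one republished after the RM restart.
        List<Event> events = new ArrayList<>();
        events.add(new Event(CREATED_EVENT_TYPE, 1440308399647L));

        long started = createdTime(events);
        System.out.println("started=" + started
            + " finished=" + finishTime
            + " finished-started=" + (finishTime - started));
    }
}
```

With these values the difference is negative by roughly 33 days, which is the situation the "Finished time ... is ahead of started time ..." warning reports.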


> ApplicationCreatedEvent event time resets after RM restart/failover
> -------------------------------------------------------------------
>
>                 Key: YARN-4392
>                 URL: https://issues.apache.org/jira/browse/YARN-4392
>             Project: Hadoop YARN
>          Issue Type: Bug
>    Affects Versions: 2.8.0
>            Reporter: Xuan Gong
>            Assignee: Xuan Gong
>            Priority: Critical
>
> {code}2015-09-01 12:39:09,852 WARN util.Times (Times.java:elapsed(53)) - Finished time 1437453994768 is ahead of started time 1440308399674 
> 2015-09-01 12:39:09,852 WARN util.Times (Times.java:elapsed(53)) - Finished time 1437454008244 is ahead of started time 1440308399676 
> 2015-09-01 12:39:09,852 WARN util.Times (Times.java:elapsed(53)) - Finished time 1437444305171 is ahead of started time 1440308399653 
> 2015-09-01 12:39:09,852 WARN util.Times (Times.java:elapsed(53)) - Finished time 1437444293115 is ahead of started time 1440308399647 
> 2015-09-01 12:39:09,852 WARN util.Times (Times.java:elapsed(53)) - Finished time 1437444379645 is ahead of started time 1440308399656 
> 2015-09-01 12:39:09,852 WARN util.Times (Times.java:elapsed(53)) - Finished time 1437444361234 is ahead of started time 1440308399655 
> 2015-09-01 12:39:09,852 WARN util.Times (Times.java:elapsed(53)) - Finished time 1437444342029 is ahead of started time 1440308399654 
> 2015-09-01 12:39:09,852 WARN util.Times (Times.java:elapsed(53)) - Finished time 1437444323447 is ahead of started time 1440308399654 
> 2015-09-01 12:39:09,853 WARN util.Times (Times.java:elapsed(53)) - Finished time 1437444430006 is ahead of started time 1440308399660 
> 2015-09-01 12:39:09,853 WARN util.Times (Times.java:elapsed(53)) - Finished time 1437444415698 is ahead of started time 1440308399659 
> 2015-09-01 12:39:09,853 WARN util.Times (Times.java:elapsed(53)) - Finished time 1437444419060 is ahead of started time 1440308399658 
> 2015-09-01 12:39:09,853 WARN util.Times (Times.java:elapsed(53)) - Finished time 1437444393931 is ahead of started time 1440308399657
> {code}
> From the ATS logs, we periodically see a large number of 'stale alerts' messages.


