Posted to yarn-issues@hadoop.apache.org by "Li Lu (JIRA)" <ji...@apache.org> on 2016/07/08 21:15:11 UTC

[jira] [Commented] (YARN-5340) App Name/User/RPC Port/AM Host info is missing from ATS web service or YARN CLI's app info

    [ https://issues.apache.org/jira/browse/YARN-5340?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15368489#comment-15368489 ] 

Li Lu commented on YARN-5340:
-----------------------------

Thanks for reporting this issue [~ssathish@hortonworks.com]! This is a very interesting discovery. I did some debugging and found that the direct cause of the missing fields is an authentication failure: the original user failed authentication when getting the app report. Checking the message returned by ATS, I can see something like this:
{code}
{"events":[{"timestamp":1467931672057,"eventtype":"YARN_APPLICATION_FINISHED","eventinfo":{"YARN_APPLICATION_LATEST_APP_ATTEMPT":"appattempt_1467931619679_0001_000001","YARN_APPLICATION_FINAL_STATUS":"SUCCEEDED","YARN_APPLICATION_DIAGNOSTICS_INFO":"","YARN_APPLICATION_STATE":"FINISHED"}},{"timestamp":1467931652492,"eventtype":"YARN_APPLICATION_STATE_UPDATED","eventinfo":{"YARN_APPLICATION_STATE":"RUNNING"}},{"timestamp":1467931641896,"eventtype":"YARN_APPLICATION_ACLS_UPDATED","eventinfo":{}}],"entitytype":"YARN_APPLICATION","entity":"application_1467931619679_0001","starttime":1467931641896,"domain":"DEFAULT","otherinfo":{"YARN_APPLICATION_MEM_METRIC":290014,"YARN_APPLICATION_CPU_METRIC":74,"YARN_APPLICATION_VIEW_ACLS":"hrt_5 viewtestgroup"},"primaryfilters":{},"relatedentities":{}}
{code}

Note that the application creation information is missing from the returned entity. In the leveldb store, I found two <entityType, timestamp, entityId> tuples created for application application_1467931619679_0001, each with a different timestamp. The application creation event is associated with the other timestamp.

Looking at the rolling leveldb code, I can see that neither call site of RollingLevelDBTimelineStore#getAndSetStartTime is properly synchronized, even though its comment says it "Should only be called when a lock has been obtained on the entity." So when two puts for the same application arrive at the timeline server concurrently, something like this may happen:
1. put1 checks for an existing start timestamp for the application and finds none.
2. put2 checks for an existing start timestamp for the application and finds none.
3. put1 sets the application entity's start timestamp to its own timestamp.
4. put2 overrides the application entity's start timestamp with its own timestamp.
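To make the interleaving concrete, here is a rough sketch. It is a simplified, hypothetical model (the class name, the in-memory map and the timestamps 1000/2000 are all made up), not the actual RollingLevelDBTimelineStore code. The check and the set are split into two steps so steps 1-4 can be replayed in a fixed order:

{code:java}
import java.util.HashMap;
import java.util.Map;

// Hypothetical model of an unsynchronized check-then-set on an entity's start time.
// NOT the RollingLevelDBTimelineStore code; it only replays steps 1-4 deterministically.
public class StartTimeRace {
    // Stands in for the stored start time of each entity
    private final Map<String, Long> startTimes = new HashMap<>();

    // "Check" step: returns the stored start time, or null if none exists
    Long check(String entityId) {
        return startTimes.get(entityId);
    }

    // "Set" step: unconditionally records this put's own start time
    void set(String entityId, long startTime) {
        startTimes.put(entityId, startTime);
    }

    // Replays steps 1-4; returns {start time put1 writes under, start time finally stored}
    static long[] replay(long put1Time, long put2Time) {
        StartTimeRace store = new StartTimeRace();
        String app = "application_1467931619679_0001";
        Long seenByPut1 = store.check(app);               // 1. put1: no result
        Long seenByPut2 = store.check(app);               // 2. put2: no result
        long put1Key = seenByPut1 != null ? seenByPut1 : put1Time;
        if (seenByPut1 == null) store.set(app, put1Time); // 3. put1 sets its own timestamp
        if (seenByPut2 == null) store.set(app, put2Time); // 4. put2 overrides it
        return new long[] {put1Key, store.check(app)};
    }

    public static void main(String[] args) {
        long[] r = replay(1000L, 2000L);
        System.out.println("put1 writes under " + r[0] + ", stored start time is " + r[1]);
    }
}
{code}

Running main prints "put1 writes under 1000, stored start time is 2000": put1 still uses the start time it set, while the stored value now belongs to put2.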

As a result, put1 writes its data under a key (<entityType, timestamp, entityId>) with a stale timestamp. That data is never read back, because the stored timestamp has been overridden by put2.
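A small sketch of why the stale entry becomes invisible. The string key below is a hypothetical flattening of <entityType, timestamp, entityId> (the real rolling leveldb key encoding is different), and it assumes, as observed above, that put1 carried the creation info:

{code:java}
import java.util.HashMap;
import java.util.Map;

// Hypothetical sketch: entity data keyed by <entityType, startTime, entityId>.
// The key is flattened into a string for illustration only.
public class StaleKeyLookup {
    static String key(String entityType, long startTime, String entityId) {
        return entityType + "|" + startTime + "|" + entityId;
    }

    // A reader rebuilds the key from the start time currently stored for the entity
    static String read(Map<String, String> store, String type, long storedStartTime, String id) {
        return store.get(key(type, storedStartTime, id));
    }

    public static void main(String[] args) {
        Map<String, String> store = new HashMap<>();
        String type = "YARN_APPLICATION";
        String app = "application_1467931619679_0001";

        // put1 writes the creation info under the start time that was later overridden
        store.put(key(type, 1000L, app), "creation info: name, user, queue, ...");
        // put2 writes its event under the start time that won
        store.put(key(type, 2000L, app), "finished event");

        long storedStartTime = 2000L; // what the start-time lookup now returns
        System.out.println(read(store, type, storedStartTime, app));
        // put1's entry still exists, but no reader ever builds its key
        System.out.println(store.containsKey(key(type, 1000L, app)));
    }
}
{code}

This prints "finished event" and then "true": the creation info is still on disk, but only under a key nobody looks up, which matches the null name/user/queue fields in the app report.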

The original LeveldbTimelineStore does not have this problem, because it always grabs a lock when it performs getAndSetStartTime.
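Roughly, the idea of holding an entity lock around the read-then-update looks like the sketch below. The per-entity lock map, the class name and the in-memory start-time map are illustrative assumptions, not the LeveldbTimelineStore implementation:

{code:java}
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.ConcurrentMap;
import java.util.concurrent.locks.ReentrantLock;

// Hypothetical sketch of guarding getAndSetStartTime with a per-entity lock.
// Not the LeveldbTimelineStore code; the lock map is an illustrative assumption.
public class PerEntityLockedStartTimes {
    // One lock per entity id, created atomically on first use
    private final ConcurrentMap<String, ReentrantLock> locks = new ConcurrentHashMap<>();
    // Stands in for the start-time db; safe for concurrent single-key get/put
    private final ConcurrentMap<String, Long> startTimes = new ConcurrentHashMap<>();

    long getAndSetStartTime(String entityId, long suggestedStartTime) {
        ReentrantLock lock = locks.computeIfAbsent(entityId, id -> new ReentrantLock());
        lock.lock();
        try {
            // While the lock is held, no other put for this entity can interleave here
            Long existing = startTimes.get(entityId);
            if (existing != null) {
                return existing;
            }
            startTimes.put(entityId, suggestedStartTime);
            return suggestedStartTime;
        } finally {
            lock.unlock();
        }
    }
}
{code}

A per-entity lock only serializes puts for the same entity, so puts for different applications do not contend. In this sketch the lock map grows without bound; a real store would need to evict idle locks.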

As for the fix, making getAndSetStartTime synchronized will probably solve the problem. I wonder whether making checkStartTimeInDb synchronized would also do the trick, since it is the only place in this process with read-then-update semantics.
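Here is a sketch of the second option under a simplified model. It assumes getAndSetStartTime first consults a write cache and then falls back to checkStartTimeInDb. The method names come from the discussion above, but the bodies, the field names and both maps are assumptions standing in for the real cache and leveldb:

{code:java}
import java.util.ArrayList;
import java.util.List;
import java.util.Map;
import java.util.Set;
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.CountDownLatch;

// Hypothetical model: a write cache in front of a start-time db, where
// checkStartTimeInDb is the only read-then-update step and is synchronized.
public class SyncStartTimeStore {
    private final Map<String, Long> startTimeWriteCache = new ConcurrentHashMap<>();
    private final Map<String, Long> startTimeDb = new ConcurrentHashMap<>();

    long getAndSetStartTime(String entityId, long suggestedStartTime) {
        Long cached = startTimeWriteCache.get(entityId);
        if (cached != null) {
            return cached;
        }
        long startTime = checkStartTimeInDb(entityId, suggestedStartTime);
        // All callers receive the same value, so this cache put is idempotent
        startTimeWriteCache.put(entityId, startTime);
        return startTime;
    }

    // Serializes check and write, so only the first caller's start time is stored
    private synchronized long checkStartTimeInDb(String entityId, long suggestedStartTime) {
        Long existing = startTimeDb.get(entityId);
        if (existing != null) {
            return existing;
        }
        startTimeDb.put(entityId, suggestedStartTime);
        return suggestedStartTime;
    }

    // Starts `threads` concurrent puts for one entity, each suggesting a different
    // start time, and returns the distinct start times they were handed back.
    static Set<Long> distinctStartTimes(int threads) {
        SyncStartTimeStore store = new SyncStartTimeStore();
        Set<Long> results = ConcurrentHashMap.newKeySet();
        CountDownLatch go = new CountDownLatch(1);
        List<Thread> workers = new ArrayList<>();
        for (int i = 0; i < threads; i++) {
            final long suggested = 1000L + i;
            Thread t = new Thread(() -> {
                try {
                    go.await(); // release all puts at once to maximize contention
                    results.add(store.getAndSetStartTime("application_1467931619679_0001", suggested));
                } catch (InterruptedException e) {
                    Thread.currentThread().interrupt();
                }
            });
            workers.add(t);
            t.start();
        }
        go.countDown();
        for (Thread t : workers) {
            try {
                t.join();
            } catch (InterruptedException e) {
                Thread.currentThread().interrupt();
            }
        }
        return results;
    }
}
{code}

In this model every concurrent put gets the same start time back, so distinctStartTimes(n) returns a single-element set. Two puts can both miss the cache, but the synchronized check means the second one sees the first one's write. Making getAndSetStartTime itself synchronized would also work, at the cost of serializing cache hits too.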

[~jeagles] I know you're an expert on the rolling leveldb source code, so if you have any free bandwidth, I would truly appreciate your suggestions here. Thanks!

> App Name/User/RPC Port/AM Host info is missing from ATS web service or YARN CLI's app info
> ------------------------------------------------------------------------------------------
>
>                 Key: YARN-5340
>                 URL: https://issues.apache.org/jira/browse/YARN-5340
>             Project: Hadoop YARN
>          Issue Type: Bug
>          Components: yarn
>            Reporter: Sumana Sathish
>            Assignee: Li Lu
>            Priority: Critical
>
> App Name/User/RPC Port/AM Host info is missing from ATS web service or YARN CLI's app info
> {code}
> RUNNING: /usr/hdp/current/hadoop-yarn-client/bin/yarn --config /tmp/hadoopConf application -status application_1467931619679_0001
> Application Report :
> Application-Id : application_1467931619679_0001
> Application-Name : null
> Application-Type : null
> User : null
> Queue : null
> Application Priority : null
> Start-Time : 0
> Finish-Time : 1467931672057
> Progress : 100%
> State : FINISHED
> Final-State : SUCCEEDED
> Tracking-URL : N/A
> RPC Port : -1
> AM Host : N/A
> Aggregate Resource Allocation : 290014 MB-seconds, 74 vcore-seconds
> Log Aggregation Status : N/A
> {code}



