Posted to yarn-issues@hadoop.apache.org by "Junping Du (JIRA)" <ji...@apache.org> on 2015/03/03 11:48:05 UTC

[jira] [Commented] (YARN-3039) [Aggregator wireup] Implement ATS app-appgregator service discovery

    [ https://issues.apache.org/jira/browse/YARN-3039?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14344914#comment-14344914 ] 

Junping Du commented on YARN-3039:
----------------------------------

Thanks for the comments, [~Naganarasimha]!
bq. +1 for this approach. Also if NM uses this new blocking call in AMRMClient to get aggregator address then there might not be any race conditions for posting AM container's life cycle events by NM immediately after creation of appAggregator through Aux service.
I discussed this again offline with [~vinodkv] and [~zjshen]. Making TimelineClient wrap AMRMClient looks heavyweight, especially for security reasons: the NM would have to hold AMRMTokens in order to use TimelineClient in the future, which makes little sense. To get rid of the race condition you mentioned above, we propose using the observer pattern so that TimelineClient can listen for aggregator address updates in the AM or NM (wrapped with retry logic to tolerate connection failures).
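To make the proposal concrete, here is a minimal sketch of the observer idea. Every name in it (AggregatorAddressListener, AggregatorAddressNotifier, EntitySender, SketchTimelineClient) is hypothetical and not part of the existing YARN API; the transport is abstracted behind an interface so the sketch runs without a network, and a real client would add backoff between attempts.

```java
import java.io.IOException;
import java.util.List;
import java.util.concurrent.CopyOnWriteArrayList;

/** Hypothetical observer callback; not an existing YARN interface. */
interface AggregatorAddressListener {
  void onAddressUpdated(String appId, String newAddress);
}

/** Hypothetical subject, living in the AM or NM, that learns of address changes. */
final class AggregatorAddressNotifier {
  private final List<AggregatorAddressListener> listeners = new CopyOnWriteArrayList<>();

  void addListener(AggregatorAddressListener listener) {
    listeners.add(listener);
  }

  /** Pushes a new aggregator address to every registered listener. */
  void addressChanged(String appId, String newAddress) {
    for (AggregatorAddressListener listener : listeners) {
      listener.onAddressUpdated(appId, newAddress);
    }
  }
}

/** Hypothetical transport abstraction (assumption made for this sketch). */
interface EntitySender {
  void send(String address, String entity) throws IOException;
}

/** Stand-in for TimelineClient: observes address updates and retries on failure. */
final class SketchTimelineClient implements AggregatorAddressListener {
  private final String appId;
  private final EntitySender sender;
  private final int maxAttempts;
  private volatile String aggregatorAddress; // null until the first notification

  SketchTimelineClient(String appId, EntitySender sender, int maxAttempts) {
    this.appId = appId;
    this.sender = sender;
    this.maxAttempts = maxAttempts;
  }

  @Override
  public void onAddressUpdated(String appId, String newAddress) {
    // Ignore updates that belong to other applications.
    if (this.appId.equals(appId)) {
      aggregatorAddress = newAddress;
    }
  }

  /** Returns true if the entity was sent within maxAttempts tries. */
  boolean putEntity(String entity) {
    for (int attempt = 1; attempt <= maxAttempts; attempt++) {
      String address = aggregatorAddress; // re-read so a rebind is picked up
      if (address != null) {
        try {
          sender.send(address, entity);
          return true;
        } catch (IOException e) {
          // Connection failure: fall through and try again.
        }
      }
    }
    return false;
  }
}
```

With this shape the client never calls the RM itself: the address is pushed to it, and a write that fails against a stale address succeeds once the listener has seen the rebind.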

bq. Are we just adding a method to get the aggregator address? Or what other APIs are planned?
Per the above comments, we have no plan to add an API to TimelineClient to talk to the RM directly.

bq. I believe the idea of using the AUX service was to decouple NM and Timeline service. If NM will notify RM about new appAggregator creation (based on AUX service), then basically NM should be aware that PerNodeAggregatorServer is configured as an AUX service, and if it supports rebinding appAggregator on failure then it should be able to communicate with this AUX service too. Would this be a clean approach?
I agree we want to decouple things here. However, an AUX service is not the only way to deploy app aggregators. As the diagram in YARN-3033 shows, app aggregators could also be deployed in a separate process or an independent container, in which case a protocol between the AUX service and the RM makes little sense. I think we should now plan to add a protocol between the aggregator and the NM, and then notify the RM through the NM-RM heartbeat when an aggregator registers or rebinds.
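As a rough illustration of that flow (aggregator reports to the NM, and the NM piggybacks the update on its next heartbeat to the RM), here is a small sketch. The class and method names (NodeAggregatorRegistry, ResourceManagerAggregatorView, registerAggregator, drainForHeartbeat, onNodeHeartbeat) are invented for illustration and are not the actual YARN protocol records; the heartbeat itself is modeled as a plain method call.

```java
import java.util.HashMap;
import java.util.Map;

/** Hypothetical NM-side registry that aggregators report to, however they are deployed. */
final class NodeAggregatorRegistry {
  private final Map<String, String> pending = new HashMap<>();

  /** Called over the proposed aggregator-NM protocol on register or rebind. */
  synchronized void registerAggregator(String appId, String address) {
    pending.put(appId, address); // a rebind simply overwrites the old address
  }

  /** Drains the updates to piggyback on the next NM-RM heartbeat. */
  synchronized Map<String, String> drainForHeartbeat() {
    Map<String, String> updates = new HashMap<>(pending);
    pending.clear();
    return updates;
  }
}

/** Hypothetical RM-side view of which aggregator serves which application. */
final class ResourceManagerAggregatorView {
  private final Map<String, String> known = new HashMap<>();

  /** Applies the aggregator updates carried by one node heartbeat. */
  synchronized void onNodeHeartbeat(Map<String, String> updates) {
    known.putAll(updates);
  }

  synchronized String aggregatorFor(String appId) {
    return known.get(appId);
  }
}
```

Because the NM only relays what aggregators tell it, it does not need to know whether an aggregator runs as an AUX service, a separate process, or a container.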

bq. I also feel we need to support starting a per-app aggregator only if the app requests it (Zhijie also had mentioned about this). If not, we can make use of one default aggregator for all these kinds of apps launched in NM, which is just used to post container entities from different NMs for these apps.
My 2 cents here: the app aggregator should have logic to consolidate all messages (events and metrics) for one application into the more complex and flexible new data model. If each NM does its aggregation separately, then it is still a *writer* (like the old timeline service), not an *aggregator*. Thoughts?

bq. Any discussions happened wrt RM having its own Aggregator ? I feel it would be better for RM to have it as it need not depend on any NM's to post any entities.
Agree. I think we are on the same page now.
I will update the proposal to reflect all these discussions (on JIRA and offline).

> [Aggregator wireup] Implement ATS app-appgregator service discovery
> -------------------------------------------------------------------
>
>                 Key: YARN-3039
>                 URL: https://issues.apache.org/jira/browse/YARN-3039
>             Project: Hadoop YARN
>          Issue Type: Sub-task
>          Components: timelineserver
>            Reporter: Sangjin Lee
>            Assignee: Junping Du
>         Attachments: Service Binding for applicationaggregator of ATS (draft).pdf, YARN-3039-no-test.patch
>
>
> Per design in YARN-2928, implement ATS writer service discovery. This is essential for off-node clients to send writes to the right ATS writer. This should also handle the case of AM failures.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)