
Posted to yarn-issues@hadoop.apache.org by "Vrushali C (JIRA)" <ji...@apache.org> on 2017/05/11 01:41:04 UTC

[jira] [Commented] (YARN-3981) offline collector: support timeline clients not associated with an application

    [ https://issues.apache.org/jira/browse/YARN-3981?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16005775#comment-16005775 ] 

Vrushali C commented on YARN-3981:
----------------------------------

Thanks for the design draft, Rohith. I have some preliminary questions, meant more as discussion points.

- Do I understand correctly that flow collectors will run on every node in the cluster that runs an NM?
- How much traffic do we expect? Would it be similar to app table writes? If not, could we run this on a head node of the cluster, like the ones where the RM or NNs run? Not on the same node as the RM, but on a similar node, so that it's "outside" the cluster. We have fairly large clusters, and having each node run a collector may not be optimal.
- Aggregation is not relevant for a flow collector, I think. Or do we want to support it? If not, we don't need to mention it under challenges; it is a non-issue.
- We surely want to think about optimizing connections to HBase.

I may have more questions as I think this over further.

> offline collector: support timeline clients not associated with an application
> ------------------------------------------------------------------------------
>
>                 Key: YARN-3981
>                 URL: https://issues.apache.org/jira/browse/YARN-3981
>             Project: Hadoop YARN
>          Issue Type: Sub-task
>          Components: timelineserver
>    Affects Versions: YARN-2928
>            Reporter: Sangjin Lee
>            Assignee: Rohith Sharma K S
>              Labels: YARN-5355
>         Attachments: YARN-3981- offline-collector-draft.pdf
>
>
> In the current v.2 design, all timeline writes must belong in a flow/application context (cluster + user + flow + flow run + application).
> But there are use cases that require writing data outside the context of an application. One such example is a higher level client (e.g. tez client or hive/oozie/cascading client) writing flow-level data that spans multiple applications. We need to find a way to support them.



--
This message was sent by Atlassian JIRA
(v6.3.15#6346)
