Posted to mapreduce-issues@hadoop.apache.org by "Varun Saxena (JIRA)" <ji...@apache.org> on 2016/05/30 05:20:12 UTC

[jira] [Comment Edited] (MAPREDUCE-6688) Store job configurations in Timeline Service v2

    [ https://issues.apache.org/jira/browse/MAPREDUCE-6688?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15265635#comment-15265635 ] 

Varun Saxena edited comment on MAPREDUCE-6688 at 5/30/16 5:19 AM:
------------------------------------------------------------------

I actually wanted to put this point up for discussion but forgot to mention it.
Whether a put is sync or async is decided semantically on the basis of which entities we want published immediately, rather than on whether they have to be merged or not. Are configs something that must be published immediately as part of a sync put?
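To make the distinction concrete, a rough sketch (assuming the v2 client API, which exposes both a sync and an async put; the entity variables here are illustrative):

{code:java}
// Sync put: blocks until the collector has accepted the entities.
// Suitable for entities that must be persisted before we proceed,
// e.g. the final job state at AM shutdown.
timelineClient.putEntities(jobEntity);

// Async put: entities are queued and flushed in the background, so
// the caller returns immediately. Suitable for periodic lifecycle
// events where a small publishing delay is acceptable.
timelineClient.putEntitiesAsync(taskEntity);
{code}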

There can be a fair argument in favor of sending all entities together in one shot for a sync put, though. But we can convert the list to an array outside this method as well. And to convert to an array I would have to build a list first anyway, as the array size cannot be predetermined in some cases.

I guess you mean the same thing, but just to elaborate for others as well.
The reason I am looping through a list and putting entities one by one, instead of turning it into an array and publishing in a single put call, is that entities are merged together for async calls.
From what I remember of YARN-3367, we wait for up to 10 TimelineEntities objects before publishing. The key point is that we wait for 10 TimelineEntities objects, not 10 TimelineEntity ones; we do not check how many entities are wrapped inside a single TimelineEntities object. Correct me if I am wrong.
If I pass an array of 10 entities, all of them would be wrapped in a single TimelineEntities object and hence would count as a single addition to the queue. If I put them separately, they count as 10 additions to the queue. Hence I went with looping, as in the sketch below.
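In code, the difference would look something like this (a sketch; configEntities is an illustrative list of the split config entities):

{code:java}
// Single call: all 10 entities get wrapped into ONE TimelineEntities
// object, so this counts as only one addition to the dispatch queue.
timelineClient.putEntitiesAsync(
    configEntities.toArray(new TimelineEntity[configEntities.size()]));

// Loop: each call wraps one entity into its own TimelineEntities
// object, so this counts as TEN additions and actually reaches the
// "publish after 10 queued objects" threshold from YARN-3367.
for (TimelineEntity entity : configEntities) {
  timelineClient.putEntitiesAsync(entity);
}
{code}
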
Now, the reason I chose 100k as the limit was that even if all 10 entities go in a single call, the payload size will be about 1 MB, which IMO is fine. If 1 MB is not fine, we can change the limit to something like 50k (say).
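Roughly the kind of splitting I mean (a sketch only; newConfigEntity() is a made-up helper, not an actual method in the patch):

{code:java}
// Split the job configuration across entities, starting a new entity
// once the accumulated key/value size crosses ~100 KB. With the async
// queue flushing after 10 queued objects, the worst-case combined
// payload is about 10 * 100 KB = 1 MB.
final int configSplitLimit = 100 * 1024; // ~100 KB per entity

List<TimelineEntity> configEntities = new ArrayList<>();
TimelineEntity current = newConfigEntity(); // hypothetical helper
int accumulated = 0;
for (Map.Entry<String, String> e : conf) {
  int size = e.getKey().length() + e.getValue().length();
  // Close off the current entity once it crosses the limit.
  if (accumulated + size > configSplitLimit && accumulated > 0) {
    configEntities.add(current);
    current = newConfigEntity();
    accumulated = 0;
  }
  current.addConfig(e.getKey(), e.getValue());
  accumulated += size;
}
configEntities.add(current);
{code}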

Would like to hear views of others on the same.

bq. This solution looks fine as of now but would require changes if we adopt different approach for publishing metrics and configurations as per YARN-3401.
Even if we were to route our entities through the RM, we would likely do that based on entity type (i.e. route entities with a YARN entity type via the RM). That is one solution which comes to my mind for YARN-3401.
In that case the current structure of the code should work well.
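
Purely as an illustration of that idea (the type check and publishViaRM() are hypothetical, not anything proposed in a patch):

{code:java}
// Route by entity type: YARN system entities go via the RM,
// framework-specific entities go straight to the per-app collector.
for (TimelineEntity entity : entities) {
  if (entity.getType().startsWith("YARN_")) { // YARN system entity types
    publishViaRM(entity);                     // hypothetical RM-routed path
  } else {
    timelineClient.putEntitiesAsync(entity);
  }
}
{code}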


> Store job configurations in Timeline Service v2
> -----------------------------------------------
>
>                 Key: MAPREDUCE-6688
>                 URL: https://issues.apache.org/jira/browse/MAPREDUCE-6688
>             Project: Hadoop Map/Reduce
>          Issue Type: Sub-task
>          Components: applicationmaster
>    Affects Versions: YARN-2928
>            Reporter: Junping Du
>            Assignee: Varun Saxena
>              Labels: yarn-2928-1st-milestone
>             Fix For: YARN-2928
>
>         Attachments: MAPREDUCE-6688-YARN-2928.01.patch, MAPREDUCE-6688-YARN-2928.02.patch, MAPREDUCE-6688-YARN-2928.03.patch, MAPREDUCE-6688-YARN-2928.04.patch, MAPREDUCE-6688-YARN-2928.v2.01.patch, MAPREDUCE-6688-YARN-2928.v2.02.patch, YARN-3959-YARN-2928.01.patch
>
>
> We already have a configuration field in the HBase schema for the application entity. We need to make sure the AM writes it out when it gets launched.


