Posted to yarn-issues@hadoop.apache.org by "Wangda Tan (JIRA)" <ji...@apache.org> on 2016/04/22 01:13:12 UTC

[jira] [Commented] (YARN-4844) Upgrade fields of o.a.h.y.api.records.Resource from int32 to int64

    [ https://issues.apache.org/jira/browse/YARN-4844?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15252976#comment-15252976 ] 

Wangda Tan commented on YARN-4844:
----------------------------------

Discussed this issue with [~vinodkv], [~jianhe], and [~hitesh].

The good news is that Google Protocol Buffers keeps backward/forward compatibility across all of the int* field types, see: https://developers.google.com/protocol-buffers/docs/proto#updating:
bq. int32, uint32, int64, uint64, and bool are all compatible – this means you can change a field from one of these types to another without breaking forwards- or backwards-compatibility. If a number is parsed from the wire which doesn't fit in the corresponding type, you will get the same effect as if you had cast the number to that type in C++ (e.g. if a 64-bit number is read as an int32, it will be truncated to 32 bits).
So changing the ResourceProto fields from int32 to int64 is safe on the wire.
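
To sanity-check this, here is a small standalone snippet (my own illustration, using protobuf-java's CodedOutputStream/CodedInputStream directly rather than the generated ResourceProto classes) showing that a value encoded as int32 parses cleanly as int64, since both types share the same varint encoding:
{code:java}
import com.google.protobuf.CodedInputStream;
import com.google.protobuf.CodedOutputStream;

public class Int32ToInt64WireCheck {
  public static void main(String[] args) throws Exception {
    // Encode a value exactly the way an int32 field's payload is encoded (varint).
    byte[] buf = new byte[16];
    CodedOutputStream out = CodedOutputStream.newInstance(buf);
    out.writeInt32NoTag(Integer.MAX_VALUE);
    out.flush();

    // Decode the same bytes the way an int64 field would.
    CodedInputStream in = CodedInputStream.newInstance(buf);
    long value = in.readInt64();
    System.out.println(value); // prints 2147483647 -- no data loss, no re-encoding
  }
}
{code}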

In addition to the .proto change, the following changes are required for the API record Resource (a rough sketch of the resulting API follows the list):
- Update {{set_(int ...)}} to {{set_(long ...)}}; there's no compatibility issue for setters.
- Add {{getMemoryLong}} and {{getVirtualCoresLong}} methods.
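
Rough sketch of what the resulting API shape could look like (method names taken from the bullets above; the actual patch may end up different):
{code:java}
// Rough sketch only, not the actual patch.
public abstract class Resource {
  // Setters widen from int to long; existing callers keep compiling because
  // int arguments are implicitly promoted to long at the call site.
  public abstract void setMemory(long memory);
  public abstract void setVirtualCores(long vCores);

  // Existing getters keep returning int so current callers are untouched;
  // values above Integer.MAX_VALUE would be clipped/truncated here.
  public abstract int getMemory();
  public abstract int getVirtualCores();

  // New accessors from the list above, exposing the full int64-backed values.
  public abstract long getMemoryLong();
  public abstract long getVirtualCoresLong();
}
{code}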

We also need to update the Metrics objects related to Resources, such as QueueMetrics. AFAIK there's no compatibility issue there; an illustration of the kind of change is below.
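
For example, widening a memory gauge from an int gauge to a long gauge (illustrative only; the class and field names here are hypothetical, not the real QueueMetrics code, which declares its gauges via the @Metric annotation):
{code:java}
import org.apache.hadoop.metrics2.lib.MetricsRegistry;
import org.apache.hadoop.metrics2.lib.MutableGaugeLong;

// Hypothetical example class, not the real QueueMetrics.
public class ExampleResourceMetrics {
  private final MetricsRegistry registry = new MetricsRegistry("ExampleResourceMetrics");

  // Passing a long initial value creates a MutableGaugeLong instead of a
  // MutableGaugeInt, so the gauge can hold values beyond Integer.MAX_VALUE.
  private final MutableGaugeLong allocatedMB =
      registry.newGauge("allocatedMB", "Allocated memory in MB", 0L);

  public void setAllocatedMB(long mb) {
    allocatedMB.set(mb); // no truncation to int
  }
}
{code}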

The last part is scheduler and test fixes.

Attached ver.1 patch for review.

> Upgrade fields of o.a.h.y.api.records.Resource from int32 to int64
> ------------------------------------------------------------------
>
>                 Key: YARN-4844
>                 URL: https://issues.apache.org/jira/browse/YARN-4844
>             Project: Hadoop YARN
>          Issue Type: Sub-task
>          Components: api
>            Reporter: Wangda Tan
>            Assignee: Wangda Tan
>            Priority: Blocker
>         Attachments: YARN-4844.1.patch
>
>
> We use int32 for memory now. If a cluster has 10k nodes with 210G memory each, the total is 10,000 * 210 * 1,024 MB = 2,150,400,000 MB, which exceeds Integer.MAX_VALUE (2,147,483,647), so the total cluster memory goes negative.
> Another case that overflows int32 even more easily: we add the pending resources of all running apps into the cluster's total pending resources. If a problematic app requests too many resources (say 1M+ containers, each asking for 3G of memory), int32 is not enough.
> Even if we cap each app's pending request, we cannot handle the case where there are many running apps, each with a capped but still significant amount of pending resources.
> So we may need to upgrade the int32 memory field (and possibly v-cores as well) to int64 to avoid integer overflow.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)