Posted to mapreduce-dev@hadoop.apache.org by Sunil G <su...@apache.org> on 2017/11/24 17:49:06 UTC

[DISCUSS] Merge Absolute resource configuration support in Capacity Scheduler (YARN-5881) to trunk

Hi All,

We would like to start a discussion about merging the “absolute min/max
resources support in capacity scheduler” branch (YARN-5881) [2] into trunk
in a few weeks. The goal is to get it into Hadoop 3.1.

*Major work done in this branch*

   - YARN-6471. Support to add min/max resource configuration for a queue
   - YARN-7332. Compute effectiveCapacity per each resource vector
   - YARN-7411. Inter-Queue preemption's computeFixpointAllocation need to
   handle absolute resources.
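For readers new to the feature, here is a minimal Python sketch of the idea behind YARN-6471 and YARN-7332. It is not the Capacity Scheduler code. It assumes the bracketed `[memory=10240,vcores=12]` form that the Capacity Scheduler documentation uses for absolute resources, and the helper names `parse_capacity` and `effective_min` are hypothetical. A queue's capacity is either a percentage of its parent or an absolute resource vector, and the sketch resolves the effective value separately for each resource type.

```python
import re
from typing import Dict, Tuple, Union

# ("percent", 12.5) or ("absolute", {"memory": 10240, "vcores": 12})
Capacity = Tuple[str, Union[float, Dict[str, int]]]

_ABSOLUTE = re.compile(r"^\[(.*)\]$")


def parse_capacity(value: str) -> Capacity:
    """Parse '12.5' or '[memory=10240,vcores=12]' (hypothetical helper)."""
    value = value.strip()
    match = _ABSOLUTE.match(value)
    if match is None:
        return ("percent", float(value))
    vector: Dict[str, int] = {}
    for item in match.group(1).split(","):
        name, _, amount = item.partition("=")
        vector[name.strip()] = int(amount)
    return ("absolute", vector)


def effective_min(capacity: Capacity, parent: Dict[str, int]) -> Dict[str, float]:
    """Resolve a queue's effective minimum resource, one resource type at a time."""
    kind, value = capacity
    if kind == "percent":
        # A percentage scales every resource type of the parent by the same factor.
        return {rtype: parent[rtype] * value / 100.0 for rtype in parent}
    # Absolute values are taken per type; assumption: a missing type means 0.
    return {rtype: float(value.get(rtype, 0)) for rtype in parent}


if __name__ == "__main__":
    parent = {"memory": 40960, "vcores": 24}
    for raw in ("25", "[memory=10240,vcores=12]"):
        eff = effective_min(parse_capacity(raw), parent)
        ratios = {rtype: eff[rtype] / parent[rtype] for rtype in parent}
        print(raw, eff, ratios)
```

In this sketch the percentage gives the child 25% of both memory and vcores, while the absolute vector gives it 25% of the parent's memory but 50% of its vcores. Once shares can differ across resource types like this, a single percentage no longer describes the queue, so effective capacity has to be tracked per resource type.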

*Regarding design details*

Please refer to [1] for the detailed design document.

*Regarding testing:*

We have tested the feature extensively over the last couple of months,
comparing it against the latest trunk.

- SLS benchmark: we saw no observable performance gap in a simulated test
based on SLS traces from 8K nodes (1 PB of memory). Allocation throughput
was 3k+ containers per second.

- Microbenchmark: the performance test cases added by YARN-6775 showed
little performance regression compared to trunk.

*YARN-5881* <https://issues.apache.org/jira/browse/YARN-5881>
#ResourceTypes = 2. Avg of fastest 20: 55294.52
#ResourceTypes = 2. Avg of fastest 20: 55401.66

*trunk*
#ResourceTypes = 2. Avg of fastest 20: 55865.92
#ResourceTypes = 2. Avg of fastest 20: 55096.418
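To put the microbenchmark numbers in perspective, this short Python calculation averages the two runs on each side and compares them. The figures are copied from above; the unit of "Avg of fastest 20" is not stated in this thread, so the sketch reports only relative differences.

```python
from statistics import mean

# "Avg of fastest 20" figures reported above, two runs each (#ResourceTypes = 2).
branch_runs = [55294.52, 55401.66]   # YARN-5881 branch
trunk_runs = [55865.92, 55096.418]   # trunk


def pct_diff(a: float, b: float) -> float:
    """Relative difference of a versus b, in percent."""
    return (a - b) / b * 100.0


branch_avg = mean(branch_runs)
trunk_avg = mean(trunk_runs)
gap_pct = pct_diff(branch_avg, trunk_avg)
# Run-to-run spread within trunk alone, relative to trunk's average.
trunk_spread_pct = (max(trunk_runs) - min(trunk_runs)) / trunk_avg * 100.0

print(f"branch avg {branch_avg:.2f}, trunk avg {trunk_avg:.2f}")
print(f"gap between averages: {gap_pct:+.2f}%")
print(f"spread between the two trunk runs: {trunk_spread_pct:.2f}%")
```

The averages differ by about 0.24%, while the two trunk runs alone differ by about 1.4%, which is consistent with the small regression described above. Two runs per side is a small sample, so this is a sanity check rather than a statistical test.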

*Regarding API stability:*

All newly added @Public APIs are @Unstable.

The documentation JIRA [3] is intended to provide detailed configuration
information. The feature works end-to-end; we have been running it in our
development cluster for the last couple of months, and it has undergone a
good amount of testing. The branch code is run against trunk and tracked
via [4].

We would love to get your thoughts before opening a voting thread.

Special thanks to the team of folks who worked hard and contributed to
this effort, including design discussions, patches, and reviews: Wangda
Tan, Vinod Kumar Vavilapalli, and Rohith Sharma K S.

[1]: https://issues.apache.org/jira/secure/attachment/12855984/YARN-5881.Support.Absolute.Min.Max.Resource.In.Capacity.Scheduler.design-doc.v1.pdf
[2]: https://issues.apache.org/jira/browse/YARN-5881
[3]: https://issues.apache.org/jira/browse/YARN-7533
[4]: https://issues.apache.org/jira/browse/YARN-7510

Thanks,

Sunil G and Wangda Tan

Re: [DISCUSS] Merge Absolute resource configuration support in Capacity Scheduler (YARN-5881) to trunk

Posted by Wangda Tan <wh...@gmail.com>.
Thanks Sunil for starting this, +1 from my side.

- Wangda

Re: [DISCUSS] Merge Absolute resource configuration support in Capacity Scheduler (YARN-5881) to trunk

Posted by Sunil G <su...@apache.org>.
Thanks Eric. Appreciate the support in verifying the feature.
YARN-7575 is closed now.

- Sunil


On Tue, Nov 28, 2017 at 11:15 PM Eric Payne
<er...@yahoo.com.invalid> wrote:

> Thanks Sunil for the great work on this feature.
> I looked through the design document, reviewed the code, and tested out
> branch YARN-5881. The design makes sense and the code looks like it is
> implementing the design in a sensible way. However, I have encountered a
> couple of bugs. I opened https://issues.apache.org/jira/browse/YARN-7575
> to track my findings. Basically, here's a summary:
>
> The design document from YARN-5881 says that for max-capacity:
> 3) For each queue, we require: a) if max-resource is not set, it is
> automatically set to parent.max-resource
>
> When I don't set any
> yarn.scheduler.capacity.<queue-path>.maximum-capacity, the RM UI
> scheduler page refuses to render. The failure appears to be in
> CapacitySchedulerPage$LeafQueueInfoBlock.
>
> Also, a job will run in a leaf queue with no max capacity set and will
> grow to the max capacity of the cluster, but if I add resources to the
> node, the job won't grow any further even though it has pending resources.
>
> Thanks,
> Eric

Re: [DISCUSS] Merge Absolute resource configuration support in Capacity Scheduler (YARN-5881) to trunk

Posted by Eric Payne <er...@yahoo.com.INVALID>.
Thanks Sunil for the great work on this feature.
I looked through the design document, reviewed the code, and tested out branch YARN-5881. The design makes sense and the code looks like it is implementing the design in a sensible way. However, I have encountered a couple of bugs. I opened https://issues.apache.org/jira/browse/YARN-7575 to track my findings. Basically, here's a summary:

The design document from YARN-5881 says that for max-capacity:
3) For each queue, we require: a) if max-resource is not set, it is automatically set to parent.max-resource

When I don't set any yarn.scheduler.capacity.<queue-path>.maximum-capacity, the RM UI scheduler page refuses to render. The failure appears to be in CapacitySchedulerPage$LeafQueueInfoBlock.

Also, a job will run in a leaf queue with no max capacity set and will grow to the max capacity of the cluster, but if I add resources to the node, the job won't grow any further even though it has pending resources.

Thanks,
Eric
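The design-document rule quoted in this message (an unset max-resource falls back to parent.max-resource) can be sketched as a walk up the queue tree. This is a minimal Python illustration, not the Capacity Scheduler code: the `Queue` class and `effective_max` function are hypothetical, and the sketch assumes that when no queue on the path sets a max, the current cluster resource is used.

```python
from dataclasses import dataclass
from typing import Dict, Optional

Resource = Dict[str, int]


@dataclass
class Queue:
    """Hypothetical queue node; max_resource is None when it is not configured."""
    name: str
    parent: Optional["Queue"] = None
    max_resource: Optional[Resource] = None


def effective_max(queue: Queue, cluster: Resource) -> Resource:
    """Return the nearest configured max-resource on the path to the root.

    Assumption: if no queue on the path sets a max, the current cluster
    resource is used. Nothing is cached, so the result tracks cluster size.
    """
    node: Optional[Queue] = queue
    while node is not None:
        if node.max_resource is not None:
            return dict(node.max_resource)
        node = node.parent
    return dict(cluster)


if __name__ == "__main__":
    root = Queue("root")
    a = Queue("root.a", parent=root, max_resource={"memory": 8192, "vcores": 8})
    a1 = Queue("root.a.a1", parent=a)  # no max set: inherits from root.a
    b = Queue("root.b", parent=root)   # no max set anywhere: falls back to the cluster
    print(effective_max(a1, {"memory": 16384, "vcores": 16}))
    print(effective_max(b, {"memory": 16384, "vcores": 16}))
    # After resources are added to the cluster, root.b's limit grows with it.
    print(effective_max(b, {"memory": 32768, "vcores": 32}))
```

Resolving the inherited value on each call, rather than copying the parent's value once at configuration time, keeps queues without their own max in step with the cluster size, which matches the growth behaviour expected in the report above.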



Re: [DISCUSS] Merge Absolute resource configuration support in Capacity Scheduler (YARN-5881) to trunk

Posted by Sunil G <su...@apache.org>.
Thanks everyone for the feedback!

Based on the positive feedback, we have started a voting thread:
http://mail-archives.apache.org/mod_mbox/hadoop-yarn-dev/201711.mbox/%3CCACYiTuhzMrd_kFRT7_f4VBHejrajbCnVB1wmgHLMLXRr58y0MA%40mail.gmail.com%3E

@Carlo: Yes, this change should be straightforward to cherry-pick, apart
from some minor conflicts.

- Sunil



On Thu, Nov 30, 2017 at 9:34 AM Carlo Aldo Curino <ca...@gmail.com>
wrote:

> I haven't tested this, but I support the merge, as the patch is very much
> needed for MS use cases as well. Can this be cherry-picked onto 2.9 easily?
>
> Thanks for this contribution!
>
> Cheers,
> Carlo
>
> On Nov 29, 2017 6:34 PM, "Weiwei Yang" <ch...@hotmail.com> wrote:
>
>> Hi Sunil
>>
>> +1 from my side.
>> Actually, we have applied some of these patches to our production cluster
>> of over 2000 nodes since September this year, and it works nicely. +1 for
>> the merge. I am pretty sure this feature will help a lot of users,
>> especially those in the cloud. Thanks for getting this done, great job!
>>
>> --
>> Weiwei
>>
>> On 29 Nov 2017, 9:23 PM +0800, Rohith Sharma K S
>> <rohithsharmaks@apache.org> wrote:
>> +1, thanks Sunil for working on this feature!
>>
>> -Rohith Sharma K S

Re: [DISCUSS] Merge Absolute resource configuration support in Capacity Scheduler (YARN-5881) to trunk

Posted by Sunil G <su...@apache.org>.
Thanks everyone for the feedback!

Based on positive feedback, we started voting thread in
http://mail-archives.apache.org/mod_mbox/hadoop-yarn-dev/201711.mbox/%3CCACYiTuhzMrd_kFRT7_f4VBHejrajbCnVB1wmgHLMLXRr58y0MA%40mail.gmail.com%3E

@Carlo: Yes, this change should be straight forward except some minor
conflicts.

- Sunil



On Thu, Nov 30, 2017 at 9:34 AM Carlo Aldo Curino <ca...@gmail.com>
wrote:

> I haven't tested this, but I support the merge as the patch is very much
> needed for MS usecases as well... Can this be cherry-picked on 2.9 easily?
>
> Thanks for this contribution!
>
> Cheers,
> Carlo
>
> On Nov 29, 2017 6:34 PM, "Weiwei Yang" <ch...@hotmail.com> wrote:
>
>> Hi Sunil
>>
>> +1 from my side.
>> Actually we have applied some of these patches to our production cluster
>> since Sep this year, on over 2000+ nodes and it works nicely. +1 for the
>> merge. I am pretty sure this feature will help a lot of users, especially
>> those on cloud. Thanks for getting this done, great job!
>>
>> --
>> Weiwei
>>
>> On 29 Nov 2017, 9:23 PM +0800, Rohith Sharma K S <
>> rohithsharmaks@apache.org>, wrote:
>> +1, thanks Sunil for working on this feature!
>>
>> -Rohith Sharma K S
>>
>> On 24 November 2017 at 23:19, Sunil G <su...@apache.org> wrote:
>>
>> Hi All,
>>
>> We would like to bring up the discussion of merging “absolute min/max
>> resources support in capacity scheduler” branch (YARN-5881) [2] into trunk
>> in a few weeks. The goal is to get it in for Hadoop 3.1.
>>
>> *Major work happened in this branch*
>>
>> - YARN-6471. Support to add min/max resource configuration for a queue
>> - YARN-7332. Compute effectiveCapacity per each resource vector
>> - YARN-7411. Inter-Queue preemption's computeFixpointAllocation need to
>> handle absolute resources.
>>
>> *Regarding design details*
>>
>> Please refer [1] for detailed design document.
>>
>> *Regarding to testing:*
>>
>> We did extensive tests for the feature in the last couple of months.
>> Comparing to latest trunk.
>>
>> - For SLS benchmark: We didn't see observable performance gap from
>> simulated test based on 8K nodes SLS traces (1 PB memory). We got 3k+
>> containers allocated per second.
>>
>> - For microbenchmark: We use performance test cases added by YARN 6775, it
>> did not show much performance regression comparing to trunk.
>>
>> *YARN-5881* <https://issues.apache.org/jira/browse/YARN-5881
>>
>> #ResourceTypes = 2. Avg of fastest 20: 55294.52
>> #ResourceTypes = 2. Avg of fastest 20: 55401.66
>>
>> *trunk*
>> #ResourceTypes = 2. Avg of fastest 20: 55865.92
>> #ResourceTypes = 2. Avg of fastest 20: 55096.418
>>
>> *Regarding API stability:*
>>
>> All newly added @Public APIs are @Unstable.
>>
>> The documentation JIRA [3] will provide the detailed configuration
>> information. This feature works end-to-end, and we have been running it in
>> our development cluster for the last couple of months, where it has
>> undergone a good amount of testing. The branch code is run against trunk
>> and tracked via [4].
>>
>> We would love to get your thoughts before opening a voting thread.
>>
>> Special thanks to the team of folks who worked hard and contributed to
>> this effort, including design discussions / patches / reviews, etc.:
>> Wangda Tan, Vinod Kumar Vavilapalli, Rohith Sharma K S.
>>
>> [1] : https://issues.apache.org/jira/secure/attachment/12855984/YARN-5881.Support.Absolute.Min.Max.Resource.In.Capacity.Scheduler.design-doc.v1.pdf
>> [2] : https://issues.apache.org/jira/browse/YARN-5881
>>
>> [3] : https://issues.apache.org/jira/browse/YARN-7533
>>
>> [4] : https://issues.apache.org/jira/browse/YARN-7510
>>
>> Thanks,
>>
>> Sunil G and Wangda Tan
>>
>>

Re: [DISCUSS] Merge Absolute resource configuration support in Capacity Scheduler (YARN-5881) to trunk

Posted by Sunil G <su...@apache.org>.
Thanks everyone for the feedback!

Based on the positive feedback, we have started a voting thread at
http://mail-archives.apache.org/mod_mbox/hadoop-yarn-dev/201711.mbox/%3CCACYiTuhzMrd_kFRT7_f4VBHejrajbCnVB1wmgHLMLXRr58y0MA%40mail.gmail.com%3E

@Carlo: Yes, cherry-picking this change to 2.9 should be straightforward,
apart from some minor conflicts.

- Sunil



On Thu, Nov 30, 2017 at 9:34 AM Carlo Aldo Curino <ca...@gmail.com>
wrote:

> I haven't tested this, but I support the merge as the patch is very much
> needed for MS use cases as well... Can this be easily cherry-picked to 2.9?
>
> Thanks for this contribution!
>
> Cheers,
> Carlo
>
> On Nov 29, 2017 6:34 PM, "Weiwei Yang" <ch...@hotmail.com> wrote:
>
>> Hi Sunil
>>
>> +1 from my side.
>> Actually we have applied some of these patches to our production cluster
>> since Sep this year, on 2000+ nodes and it works nicely. +1 for the
>> merge. I am pretty sure this feature will help a lot of users, especially
>> those on cloud. Thanks for getting this done, great job!
>>
>> --
>> Weiwei
>>
>> On 29 Nov 2017, 9:23 PM +0800, Rohith Sharma K S <
>> rohithsharmaks@apache.org>, wrote:
>> +1, thanks Sunil for working on this feature!
>>
>> -Rohith Sharma K S
>>
>> On 24 November 2017 at 23:19, Sunil G <su...@apache.org> wrote:
>>
>> Hi All,
>>
>> We would like to bring up the discussion of merging “absolute min/max
>> resources support in capacity scheduler” branch (YARN-5881) [2] into trunk
>> in a few weeks. The goal is to get it in for Hadoop 3.1.
>>
>> *Major work happened in this branch*
>>
>> - YARN-6471. Support to add min/max resource configuration for a queue
>> - YARN-7332. Compute effectiveCapacity per each resource vector
>> - YARN-7411. Inter-Queue preemption's computeFixpointAllocation needs to
>> handle absolute resources.
>>
>> *Regarding design details*
>>
>> Please refer to [1] for the detailed design document.
>>
>> *Regarding testing:*
>>
>> We did extensive testing of the feature over the last couple of months,
>> comparing against the latest trunk.
>>
>> - For the SLS benchmark: We didn't see an observable performance gap in
>> simulated tests based on 8K-node SLS traces (1 PB memory). We got 3k+
>> containers allocated per second.
>>
>> - For the microbenchmark: We used the performance test cases added by
>> YARN-6775; they did not show much performance regression compared to trunk.
>>
>> *YARN-5881* <https://issues.apache.org/jira/browse/YARN-5881>
>>
>> #ResourceTypes = 2. Avg of fastest 20: 55294.52
>> #ResourceTypes = 2. Avg of fastest 20: 55401.66
>>
>> *trunk*
>> #ResourceTypes = 2. Avg of fastest 20: 55865.92
>> #ResourceTypes = 2. Avg of fastest 20: 55096.418
>>
>> *Regarding API stability:*
>>
>> All newly added @Public APIs are @Unstable.
>>
>> The documentation JIRA [3] will provide the detailed configuration
>> information. This feature works end-to-end, and we have been running it in
>> our development cluster for the last couple of months, where it has
>> undergone a good amount of testing. The branch code is run against trunk
>> and tracked via [4].
>>
>> We would love to get your thoughts before opening a voting thread.
>>
>> Special thanks to the team of folks who worked hard and contributed to
>> this effort, including design discussions / patches / reviews, etc.:
>> Wangda Tan, Vinod Kumar Vavilapalli, Rohith Sharma K S.
>>
>> [1] : https://issues.apache.org/jira/secure/attachment/12855984/YARN-5881.Support.Absolute.Min.Max.Resource.In.Capacity.Scheduler.design-doc.v1.pdf
>> [2] : https://issues.apache.org/jira/browse/YARN-5881
>>
>> [3] : https://issues.apache.org/jira/browse/YARN-7533
>>
>> [4] : https://issues.apache.org/jira/browse/YARN-7510
>>
>> Thanks,
>>
>> Sunil G and Wangda Tan
>>
>>

Re: [DISCUSS] Merge Absolute resource configuration support in Capacity Scheduler (YARN-5881) to trunk

Posted by Carlo Aldo Curino <ca...@gmail.com>.
I haven't tested this, but I support the merge as the patch is very much
needed for MS use cases as well... Can this be easily cherry-picked to 2.9?

Thanks for this contribution!

Cheers,
Carlo

On Nov 29, 2017 6:34 PM, "Weiwei Yang" <ch...@hotmail.com> wrote:

> Hi Sunil
>
> +1 from my side.
> Actually we have applied some of these patches to our production cluster
> since Sep this year, on 2000+ nodes and it works nicely. +1 for the
> merge. I am pretty sure this feature will help a lot of users, especially
> those on cloud. Thanks for getting this done, great job!
>
> --
> Weiwei
>
> On 29 Nov 2017, 9:23 PM +0800, Rohith Sharma K S <
> rohithsharmaks@apache.org>, wrote:
> +1, thanks Sunil for working on this feature!
>
> -Rohith Sharma K S
>
> On 24 November 2017 at 23:19, Sunil G <su...@apache.org> wrote:
>
> Hi All,
>
> We would like to bring up the discussion of merging “absolute min/max
> resources support in capacity scheduler” branch (YARN-5881) [2] into trunk
> in a few weeks. The goal is to get it in for Hadoop 3.1.
>
> *Major work happened in this branch*
>
> - YARN-6471. Support to add min/max resource configuration for a queue
> - YARN-7332. Compute effectiveCapacity per each resource vector
> - YARN-7411. Inter-Queue preemption's computeFixpointAllocation needs to
> handle absolute resources.
>
> *Regarding design details*
>
> Please refer to [1] for the detailed design document.
>
> *Regarding testing:*
>
> We did extensive testing of the feature over the last couple of months,
> comparing against the latest trunk.
>
> - For the SLS benchmark: We didn't see an observable performance gap in
> simulated tests based on 8K-node SLS traces (1 PB memory). We got 3k+
> containers allocated per second.
>
> - For the microbenchmark: We used the performance test cases added by
> YARN-6775; they did not show much performance regression compared to trunk.
>
> *YARN-5881* <https://issues.apache.org/jira/browse/YARN-5881>
>
> #ResourceTypes = 2. Avg of fastest 20: 55294.52
> #ResourceTypes = 2. Avg of fastest 20: 55401.66
>
> *trunk*
> #ResourceTypes = 2. Avg of fastest 20: 55865.92
> #ResourceTypes = 2. Avg of fastest 20: 55096.418
>
> *Regarding API stability:*
>
> All newly added @Public APIs are @Unstable.
>
> The documentation JIRA [3] will provide the detailed configuration
> information. This feature works end-to-end, and we have been running it in
> our development cluster for the last couple of months, where it has
> undergone a good amount of testing. The branch code is run against trunk
> and tracked via [4].
>
> We would love to get your thoughts before opening a voting thread.
>
> Special thanks to the team of folks who worked hard and contributed to
> this effort, including design discussions / patches / reviews, etc.:
> Wangda Tan, Vinod Kumar Vavilapalli, Rohith Sharma K S.
>
> [1] : https://issues.apache.org/jira/secure/attachment/12855984/YARN-5881.Support.Absolute.Min.Max.Resource.In.Capacity.Scheduler.design-doc.v1.pdf
> [2] : https://issues.apache.org/jira/browse/YARN-5881
>
> [3] : https://issues.apache.org/jira/browse/YARN-7533
>
> [4] : https://issues.apache.org/jira/browse/YARN-7510
>
> Thanks,
>
> Sunil G and Wangda Tan
>
>

Re: [DISCUSS] Merge Absolute resource configuration support in Capacity Scheduler (YARN-5881) to trunk

Posted by Weiwei Yang <ch...@hotmail.com>.
Hi Sunil

+1 from my side.
Actually we have applied some of these patches to our production cluster since Sep this year, on 2000+ nodes and it works nicely. +1 for the merge. I am pretty sure this feature will help a lot of users, especially those on cloud. Thanks for getting this done, great job!

--
Weiwei

On 29 Nov 2017, 9:23 PM +0800, Rohith Sharma K S <ro...@apache.org>, wrote:
+1, thanks Sunil for working on this feature!

-Rohith Sharma K S

On 24 November 2017 at 23:19, Sunil G <su...@apache.org> wrote:

Hi All,

We would like to bring up the discussion of merging “absolute min/max
resources support in capacity scheduler” branch (YARN-5881) [2] into trunk
in a few weeks. The goal is to get it in for Hadoop 3.1.

*Major work happened in this branch*

- YARN-6471. Support to add min/max resource configuration for a queue
- YARN-7332. Compute effectiveCapacity per each resource vector
- YARN-7411. Inter-Queue preemption's computeFixpointAllocation needs to
handle absolute resources.

*Regarding design details*

Please refer to [1] for the detailed design document.

*Regarding testing:*

We did extensive testing of the feature over the last couple of months,
comparing against the latest trunk.

- For the SLS benchmark: We didn't see an observable performance gap in
simulated tests based on 8K-node SLS traces (1 PB memory). We got 3k+
containers allocated per second.

- For the microbenchmark: We used the performance test cases added by
YARN-6775; they did not show much performance regression compared to trunk.

*YARN-5881* <https://issues.apache.org/jira/browse/YARN-5881>

#ResourceTypes = 2. Avg of fastest 20: 55294.52
#ResourceTypes = 2. Avg of fastest 20: 55401.66

*trunk*
#ResourceTypes = 2. Avg of fastest 20: 55865.92
#ResourceTypes = 2. Avg of fastest 20: 55096.418
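
A quick way to sanity-check the "not much performance regression" claim is to average the two runs for each build and compare the results. This is only arithmetic on the four figures quoted above. The email does not give their unit or say whether higher is better, so the sign of the difference alone does not tell you which build is faster.

```python
from statistics import mean

# "Avg of fastest 20" figures from the microbenchmark runs quoted above
# (#ResourceTypes = 2). The unit is not stated in the original email.
yarn_5881 = [55294.52, 55401.66]
trunk = [55865.92, 55096.418]

branch_avg = mean(yarn_5881)
trunk_avg = mean(trunk)
# Relative difference of the branch mean against the trunk mean.
rel_diff = (branch_avg - trunk_avg) / trunk_avg

print(f"YARN-5881 mean: {branch_avg:.2f}")
print(f"trunk mean:     {trunk_avg:.2f}")
print(f"relative diff:  {rel_diff:+.2%}")
```

Running it prints means of 55348.09 and 55481.17 and a relative difference of -0.24%. The two builds are within a fraction of a percent of each other, which is smaller than the run-to-run spread within trunk itself.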

*Regarding API stability:*

All newly added @Public APIs are @Unstable.

The documentation JIRA [3] will provide the detailed configuration
information. This feature works end-to-end, and we have been running it in
our development cluster for the last couple of months, where it has
undergone a good amount of testing. The branch code is run against trunk
and tracked via [4].

We would love to get your thoughts before opening a voting thread.

Special thanks to the team of folks who worked hard and contributed to
this effort, including design discussions / patches / reviews, etc.:
Wangda Tan, Vinod Kumar Vavilapalli, Rohith Sharma K S.

[1] : https://issues.apache.org/jira/secure/attachment/12855984/YARN-5881.Support.Absolute.Min.Max.Resource.In.Capacity.Scheduler.design-doc.v1.pdf
[2] : https://issues.apache.org/jira/browse/YARN-5881

[3] : https://issues.apache.org/jira/browse/YARN-7533

[4] : https://issues.apache.org/jira/browse/YARN-7510

Thanks,

Sunil G and Wangda Tan


Re: [DISCUSS] Merge Absolute resource configuration support in Capacity Scheduler (YARN-5881) to trunk

Posted by Rohith Sharma K S <ro...@apache.org>.
+1, thanks Sunil for working on this feature!

-Rohith Sharma K S


Re: [DISCUSS] Merge Absolute resource configuration support in Capacity Scheduler (YARN-5881) to trunk

Posted by Wangda Tan <wh...@gmail.com>.
Thanks Sunil for starting this, +1 from my side.

- Wangda


Re: [DISCUSS] Merge Absolute resource configuration support in Capacity Scheduler (YARN-5881) to trunk

Posted by Eric Payne <er...@yahoo.com.INVALID>.
Thanks Sunil for the great work on this feature.
I looked through the design document, reviewed the code, and tested branch YARN-5881. The design makes sense, and the code appears to implement the design in a sensible way. However, I have encountered a couple of bugs, which I am tracking in https://issues.apache.org/jira/browse/YARN-7575. Here's a summary:

The design document from YARN-5881 says this about max-capacity:

3) For each queue, we require: a) if max-resource not set, it automatically set to parent.max-resource
When I do not set any yarn.scheduler.capacity.<queue-path>.maximum-capacity, the RM UI scheduler page refuses to render. The failure appears to be in CapacitySchedulerPage$LeafQueueInfoBlock.

Also, a job running in a leaf queue with no max capacity set will grow to the max capacity of the cluster, but if I then add resources to the node, the job won't grow any further even though it has pending resources.

Thanks,
Eric
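The max-resource defaulting rule quoted from the design document above (an unset max-resource falls back to parent.max-resource) can be illustrated with a small, self-contained Java sketch. Everything in it is hypothetical: `MaxResourceInheritance`, `Queue`, `effectiveMax`, and the resource keys `memory-mb`/`vcores` are illustrative names, not the Capacity Scheduler's actual classes. The behavior where an unset root max falls back to the whole cluster resource is also an assumption of this sketch.

```java
import java.util.Map;
import java.util.TreeMap;

/**
 * Hypothetical sketch of the "unset max-resource inherits parent.max-resource"
 * rule. Queue and effectiveMax are illustrative names, not Hadoop classes.
 */
public class MaxResourceInheritance {

    public static final class Queue {
        final String path;
        final Queue parent;                    // null for the root queue
        final Map<String, Long> configuredMax; // null when max-resource is not set

        public Queue(String path, Queue parent, Map<String, Long> configuredMax) {
            this.path = path;
            this.parent = parent;
            this.configuredMax = configuredMax;
        }

        /**
         * Resolves the effective max resource against the cluster resource passed in.
         * An unset value falls back to the parent's effective max; an unset root
         * (an assumption of this sketch) falls back to the whole cluster.
         */
        public Map<String, Long> effectiveMax(Map<String, Long> clusterResource) {
            if (configuredMax != null) {
                return configuredMax;
            }
            return parent == null ? clusterResource : parent.effectiveMax(clusterResource);
        }
    }

    /** Builds a two-type resource vector; TreeMap keeps the printed order stable. */
    public static Map<String, Long> resource(long memoryMb, long vcores) {
        Map<String, Long> r = new TreeMap<>();
        r.put("memory-mb", memoryMb);
        r.put("vcores", vcores);
        return r;
    }

    public static void main(String[] args) {
        Queue root = new Queue("root", null, null);
        Queue a = new Queue("root.a", root, resource(10240, 12));
        Queue a1 = new Queue("root.a.a1", a, null); // max-resource not set
        Queue b = new Queue("root.b", root, null);  // max-resource not set

        Map<String, Long> cluster = resource(102400, 100);
        System.out.println(a1.path + " -> " + a1.effectiveMax(cluster));
        // root.a.a1 -> {memory-mb=10240, vcores=12}
        System.out.println(b.path + " -> " + b.effectiveMax(cluster));
        // root.b -> {memory-mb=102400, vcores=100}

        // Nodes are added: b is re-resolved and follows the larger cluster.
        Map<String, Long> grown = resource(204800, 200);
        System.out.println(b.path + " -> " + b.effectiveMax(grown));
        // root.b -> {memory-mb=204800, vcores=200}
    }
}
```

Because `effectiveMax` is re-resolved against whatever cluster resource is passed in, an inherited max in this sketch grows as nodes are added. That is one way to avoid the stall described above, but this thread does not establish how the YARN-5881 branch actually computes or caches the value.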


From: Sunil G <su...@apache.org>
To: "yarn-dev@hadoop.apache.org" <ya...@hadoop.apache.org>; Hadoop Common <co...@hadoop.apache.org>; Hdfs-dev <hd...@hadoop.apache.org>; "mapreduce-dev@hadoop.apache.org" <ma...@hadoop.apache.org>
Sent: Friday, November 24, 2017 11:49 AM
Subject: [DISCUSS] Merge Absolute resource configuration support in Capacity Scheduler (YARN-5881) to trunk
