Posted to issues@yunikorn.apache.org by "Peter Bacsko (Jira)" <ji...@apache.org> on 2023/05/10 07:43:00 UTC

[jira] [Comment Edited] (YUNIKORN-1715) Yunikorn performance improvements

    [ https://issues.apache.org/jira/browse/YUNIKORN-1715?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17721232#comment-17721232 ] 

Peter Bacsko edited comment on YUNIKORN-1715 at 5/10/23 7:42 AM:
-----------------------------------------------------------------

[~yichiu] as we discussed on Slack:

# Try to set up Kwok with Yunikorn
# Run multiple test scenarios (apps / pods):
** A few apps with a lot of pods (10 / 1000)
** A balanced number of apps and pods (100 / 100)
** A lot of apps with a few pods (1000 / 10)

Priorities:
# Check the heap and CPU profiles, which are available on the REST interface
# Network/block/mutex profiles
# Traces

We expose the endpoints of the pprof tool: https://pkg.go.dev/net/http/pprof



> Yunikorn performance improvements
> ---------------------------------
>
>                 Key: YUNIKORN-1715
>                 URL: https://issues.apache.org/jira/browse/YUNIKORN-1715
>             Project: Apache YuniKorn
>          Issue Type: Improvement
>          Components: core - scheduler
>            Reporter: Peter Bacsko
>            Assignee: Peter Bacsko
>            Priority: Major
>
> There are some methods/functions in Yunikorn which are called frequently and often unnecessarily. On a large, busy cluster, eliminating these calls can result in a faster scheduling cycle and therefore better throughput.
> In the cases listed below, we can re-use a previously computed value, so the expensive copy/sort phase can be eliminated completely.
> *Retrieving node iterators*: in {{baseNodeCollection.getNodeIteratorInternal()}}, we always clone the tree of sorted nodes and then build a slice from it. The node tree is only modified when a node gets a new score or when a node is added or removed. By reusing the sorted list, we avoid cloning a {{*btree.BTree}} structure and creating {{[]*Node}} slices.
> *Queue sorting*: sorting is only needed if one of the following occurred:
>  * Allocated resource changed in one of the child queues (most common)
>  * Pending resource changed from 0 to "n", or from "n" to 0 (affects filtering)
>  * Child queue got stopped (affects filtering)
>  * Child queue structure changed on config update
> *Application sorting*: in {{Queue.TryAllocate()}} and {{Queue.TryPlaceholderAllocate()}}, {{sortApplications()}} always runs. In every iteration, it calls {{Queue.GetCopyOfApps()}} and then sorts the apps. It only has to run when something relevant to the sort order happens:
>  * Application added/removed
>  * Ask added to an application
>  * Ask max priority changed in at least one application
>  * Allocated resource changed in at least one application
> *Request sorting*: request (ask) sorting is only necessary when one of the following occurs:
>  * Ask added
>  * pendingAskRepeat reaches 0 in an ask
> *Misc*: there may be other changes that help performance.
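
The "only re-sort when something relevant changed" idea from the description can be sketched in Go with a cached slice and a dirty flag. This is a minimal illustration under stated assumptions, not the YuniKorn implementation: the types app and sortedApps, their methods, and the sort key (allocated amount, then ID) are all hypothetical.

```go
package main

import (
	"fmt"
	"sort"
	"sync"
)

// app is a hypothetical stand-in for a scheduler application.
type app struct {
	id        string
	allocated int64
}

// sortedApps caches the sorted view and re-sorts only after a relevant change.
type sortedApps struct {
	mu     sync.Mutex
	apps   map[string]*app
	cached []*app
	dirty  bool
	sorts  int // number of real sorts, kept only for illustration
}

func newSortedApps() *sortedApps {
	return &sortedApps{apps: make(map[string]*app), dirty: true}
}

// add registers an application; membership changed, so the cache is stale.
func (s *sortedApps) add(id string, allocated int64) {
	s.mu.Lock()
	defer s.mu.Unlock()
	s.apps[id] = &app{id: id, allocated: allocated}
	s.dirty = true
}

// setAllocated marks the cache stale only if the sort key really changed.
func (s *sortedApps) setAllocated(id string, allocated int64) {
	s.mu.Lock()
	defer s.mu.Unlock()
	if a, ok := s.apps[id]; ok && a.allocated != allocated {
		a.allocated = allocated
		s.dirty = true
	}
}

// sorted returns the cached slice and rebuilds it only when it is stale.
// Callers must treat the returned slice as read-only.
func (s *sortedApps) sorted() []*app {
	s.mu.Lock()
	defer s.mu.Unlock()
	if s.dirty {
		list := make([]*app, 0, len(s.apps))
		for _, a := range s.apps {
			list = append(list, a)
		}
		sort.Slice(list, func(i, j int) bool {
			if list[i].allocated != list[j].allocated {
				return list[i].allocated < list[j].allocated
			}
			return list[i].id < list[j].id // tie-break for a stable order
		})
		s.cached = list
		s.dirty = false
		s.sorts++
	}
	return s.cached
}

func main() {
	q := newSortedApps()
	q.add("app-1", 30)
	q.add("app-2", 10)
	q.sorted()
	q.sorted() // served from the cache, no copy and no sort
	q.setAllocated("app-2", 50)
	for _, a := range q.sorted() {
		fmt.Println(a.id, a.allocated)
	}
	fmt.Println("sorts:", q.sorts) // prints "sorts: 2"
}
```

The trade-off in this sketch is that every code path changing a sort-relevant field has to set the dirty flag; a missed invalidation would leave callers with a stale order.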


