Posted to issues@yunikorn.apache.org by "Peter Bacsko (Jira)" <ji...@apache.org> on 2023/05/10 07:43:00 UTC
[jira] [Comment Edited] (YUNIKORN-1715) Yunikorn performance improvements
[ https://issues.apache.org/jira/browse/YUNIKORN-1715?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17721232#comment-17721232 ]
Peter Bacsko edited comment on YUNIKORN-1715 at 5/10/23 7:42 AM:
-----------------------------------------------------------------
[~yichiu] as we discussed on Slack:
# Try to set up Kwok with Yunikorn
# Multiple test scenarios:
** A few apps with a lot of pods (10 / 1000)
** A balanced number of apps/pods (100 / 100)
** A lot of apps with few pods (1000 / 10)
Priorities:
# Check the heap and CPU profiles, which are available on the REST interface
# Network/block/mutex profiles
# Traces
We expose the URLs of the pprof tool: https://pkg.go.dev/net/http/pprof
> Yunikorn performance improvements
> ---------------------------------
>
> Key: YUNIKORN-1715
> URL: https://issues.apache.org/jira/browse/YUNIKORN-1715
> Project: Apache YuniKorn
> Issue Type: Improvement
> Components: core - scheduler
> Reporter: Peter Bacsko
> Assignee: Peter Bacsko
> Priority: Major
>
> There are some methods/functions in Yunikorn which are called frequently and often unnecessarily. On a large, busy cluster, eliminating these calls can result in a faster scheduling cycle and therefore better throughput.
> In the cases listed below, we can re-use a previously computed value, so the expensive copy/sort phase can be eliminated completely.
> {*}Retrieving node iterators{*}: in {{baseNodeCollection.getNodeIteratorInternal()}}, we always clone the tree of sorted nodes and then build a slice from it. The node tree is only modified when a node gets a new score or when a node is added or removed. By reusing the sorted list between such changes, we avoid cloning an {{*btree.BTree}} structure and creating {{[]*Node}} slices.
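The idea of reusing the sorted node list can be sketched as follows. This is a minimal illustration, not the Yunikorn code: {{node}}, {{nodeCollection}} and their methods are hypothetical, a map plus {{sort.Slice}} stands in for the B-tree, and ascending score order is an assumption. The cached slice is only dropped when a score changes or a node is added or removed.

```go
package main

import (
	"fmt"
	"sort"
	"sync"
)

// node is a hypothetical stand-in for the scheduler's Node type.
type node struct {
	id    string
	score float64
}

// nodeCollection keeps the nodes plus a cached, sorted snapshot of them.
type nodeCollection struct {
	mu       sync.Mutex
	nodes    map[string]*node
	sorted   []*node // nil means the snapshot must be rebuilt
	rebuilds int     // counts rebuilds, for demonstration only
}

func newNodeCollection() *nodeCollection {
	return &nodeCollection{nodes: make(map[string]*node)}
}

// setScore adds a node or updates its score. The cache is invalidated
// only if something that affects the ordering actually changed.
func (c *nodeCollection) setScore(id string, score float64) {
	c.mu.Lock()
	defer c.mu.Unlock()
	if n, ok := c.nodes[id]; ok {
		if n.score == score {
			return // ordering is unaffected, keep the cached snapshot
		}
		n.score = score
	} else {
		c.nodes[id] = &node{id: id, score: score}
	}
	c.sorted = nil
}

// remove deletes a node and invalidates the cache if the node existed.
func (c *nodeCollection) remove(id string) {
	c.mu.Lock()
	defer c.mu.Unlock()
	if _, ok := c.nodes[id]; ok {
		delete(c.nodes, id)
		c.sorted = nil
	}
}

// sortedNodes returns the cached snapshot and rebuilds it only after an
// invalidation. Callers must treat the returned slice as read-only,
// because it is shared between calls.
func (c *nodeCollection) sortedNodes() []*node {
	c.mu.Lock()
	defer c.mu.Unlock()
	if c.sorted == nil {
		c.sorted = make([]*node, 0, len(c.nodes))
		for _, n := range c.nodes {
			c.sorted = append(c.sorted, n)
		}
		sort.Slice(c.sorted, func(i, j int) bool {
			a, b := c.sorted[i], c.sorted[j]
			if a.score != b.score {
				return a.score < b.score
			}
			return a.id < b.id // deterministic tie-break
		})
		c.rebuilds++
	}
	return c.sorted
}

func main() {
	c := newNodeCollection()
	c.setScore("node-a", 0.7)
	c.setScore("node-b", 0.2)
	for i := 0; i < 1000; i++ {
		c.sortedNodes() // served from the cache after the first call
	}
	fmt.Println("rebuilds after 1000 reads:", c.rebuilds)
	c.setScore("node-b", 0.9)
	fmt.Println("first node:", c.sortedNodes()[0].id, "rebuilds:", c.rebuilds)
}
```

Because a rebuild allocates a fresh slice instead of reordering the old one, an iterator that still holds the previous snapshot keeps a stable list of node pointers, although the scores behind those pointers may have changed.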
> {*}Queue sorting{*}: child queues only need to be re-sorted if one of the following occurred:
> * Allocated resource changed in one of the child queues (most common)
> * Pending resource changed from 0 to "n", or from "n" to 0 (affects filtering)
> * Child queue got stopped (affects filtering)
> * Child queue structure changed on config update
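The first three conditions above can be expressed as a small predicate over a "before" and "after" snapshot of a child queue. This is a hedged sketch only: {{childState}} and {{needsResort}} are hypothetical names, and a single {{int64}} stands in for Yunikorn's multi-dimensional resource type. A config update that changes the child queue structure would simply force a re-sort unconditionally and is not modeled here.

```go
package main

import "fmt"

// childState is a hypothetical snapshot of the child-queue fields that
// influence sorting and filtering.
type childState struct {
	allocated int64 // simplified: one scalar instead of a resource map
	pending   int64
	stopped   bool
}

// needsResort reports whether the transition from prev to next can change
// the sorted and filtered list of child queues.
func needsResort(prev, next childState) bool {
	if prev.allocated != next.allocated {
		return true // allocated resources affect the sort order
	}
	if (prev.pending == 0) != (next.pending == 0) {
		return true // 0 -> n or n -> 0 changes which queues pass the filter
	}
	return prev.stopped != next.stopped // a stopped queue is filtered out
}

func main() {
	base := childState{allocated: 10, pending: 5}

	// Pending changes from 5 to 7: the queue still has pending requests,
	// so neither ordering nor filtering is affected.
	fmt.Println(needsResort(base, childState{allocated: 10, pending: 7}))

	// Pending drops to 0: the queue leaves the candidate list.
	fmt.Println(needsResort(base, childState{allocated: 10, pending: 0}))

	// Allocation changes: the order may change.
	fmt.Println(needsResort(base, childState{allocated: 12, pending: 5}))
}
```

A parent queue could evaluate such a check whenever a child is updated and set a "dirty" flag, so that the actual sort only runs in the next scheduling cycle if the flag is set.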
> {*}Application sorting{*}: in {{Queue.TryAllocate()}} and {{Queue.TryPlaceholderAllocate()}}, {{sortApplications()}} always runs. In every iteration, it calls {{Queue.GetCopyOfApps()}} and then sorts the apps. It only has to run if something relevant to the sort order has happened:
> * Application added/removed
> * Ask added to an application
> * Ask max priority changed in at least one application
> * Allocated resource changed in at least one application
> {*}Request sorting{*}: request (ask) sorting is only necessary when the following occurs:
> * Ask added
> * pendingAskRepeat reaches 0 in an ask
> {*}Misc{*}: there may be other, smaller changes that also help performance.