Posted to reviews@yunikorn.apache.org by wi...@apache.org on 2022/09/09 13:06:40 UTC

[yunikorn-site] branch master updated: [YUNIKORN-1312] update website for 1.1.0 (#185)

This is an automated email from the ASF dual-hosted git repository.

wilfreds pushed a commit to branch master
in repository https://gitbox.apache.org/repos/asf/yunikorn-site.git


The following commit(s) were added to refs/heads/master by this push:
     new 70d64601f [YUNIKORN-1312] update website for 1.1.0 (#185)
70d64601f is described below

commit 70d64601f1d8289f8c6faf5a10f39cfa85317341
Author: Peter Bacsko <pb...@cloudera.com>
AuthorDate: Fri Sep 9 23:06:19 2022 +1000

    [YUNIKORN-1312] update website for 1.1.0 (#185)
    
    1.1.0.md changes
    download.md update
    update docusaurus.config.js
    
    Closes: #185
    
    Signed-off-by: Wilfred Spiegelenburg <wi...@apache.org>
---
 docusaurus.config.js                               |    2 +-
 .../version-1.1.0/assets                           |    1 +
 .../version-1.1.0/get_started/core_features.md     |   71 +
 .../version-1.1.0/get_started/get_started.md       |   76 +
 .../evaluate_perf_function_with_kubemark.md        |  120 ++
 .../version-1.1.0/performance/metrics.md           |  105 ++
 .../performance/performance_tutorial.md            |  451 ++++++
 .../version-1.1.0/performance/profiling.md         |  115 ++
 .../version-1.1.0/user_guide/gang_scheduling.md    |  288 ++++
 .../version-1.1.0/user_guide/trouble_shooting.md   |  192 +++
 .../user_guide/workloads/run_flink.md              |   66 +
 .../user_guide/workloads/run_spark.md              |  145 ++
 .../user_guide/workloads/run_tensorflow.md         |   93 ++
 package.json                                       |    1 +
 src/pages/community/download.md                    |    6 +-
 src/pages/release-announce/1.1.0.md                |   58 +
 versioned_docs/version-1.1.0/api/cluster.md        |   85 ++
 versioned_docs/version-1.1.0/api/scheduler.md      | 1479 ++++++++++++++++++++
 versioned_docs/version-1.1.0/api/system.md         |  225 +++
 .../version-1.1.0/assets/allocation_4k.png         |  Bin 0 -> 68868 bytes
 .../version-1.1.0/assets/application-state.png     |  Bin 0 -> 117964 bytes
 .../version-1.1.0/assets/architecture.png          |  Bin 0 -> 188534 bytes
 .../version-1.1.0/assets/cpu_profile.jpg           |  Bin 0 -> 259919 bytes
 .../version-1.1.0/assets/dashboard_secret.png      |  Bin 0 -> 31815 bytes
 .../assets/dashboard_token_select.png              |  Bin 0 -> 47533 bytes
 .../assets/docker-dektop-minikube.png              |  Bin 0 -> 249095 bytes
 .../version-1.1.0/assets/docker-desktop.png        |  Bin 0 -> 247891 bytes
 .../version-1.1.0/assets/fifo-state-example.png    |  Bin 0 -> 239302 bytes
 .../version-1.1.0/assets/gang_clean_up.png         |  Bin 0 -> 169887 bytes
 .../version-1.1.0/assets/gang_generic_flow.png     |  Bin 0 -> 222524 bytes
 .../assets/gang_scheduling_iintro.png              |  Bin 0 -> 43907 bytes
 .../version-1.1.0/assets/gang_timeout.png          |  Bin 0 -> 179466 bytes
 .../version-1.1.0/assets/gang_total_ask.png        |  Bin 0 -> 22557 bytes
 .../version-1.1.0/assets/goland_debug.jpg          |  Bin 0 -> 198845 bytes
 .../assets/k8shim-application-state.png            |  Bin 0 -> 129912 bytes
 .../version-1.1.0/assets/k8shim-node-state.png     |  Bin 0 -> 52316 bytes
 .../assets/k8shim-scheduler-state.png              |  Bin 0 -> 53283 bytes
 .../version-1.1.0/assets/k8shim-task-state.png     |  Bin 0 -> 144820 bytes
 .../version-1.1.0/assets/namespace-mapping.png     |  Bin 0 -> 327547 bytes
 .../version-1.1.0/assets/node-bin-packing.png      |  Bin 0 -> 232909 bytes
 versioned_docs/version-1.1.0/assets/node-fair.png  |  Bin 0 -> 462310 bytes
 .../version-1.1.0/assets/node_fairness_conf.png    |  Bin 0 -> 12287 bytes
 .../version-1.1.0/assets/object-state.png          |  Bin 0 -> 39732 bytes
 .../version-1.1.0/assets/perf-tutorial-build.png   |  Bin 0 -> 53518 bytes
 .../assets/perf-tutorial-resultDiagrams.png        |  Bin 0 -> 251263 bytes
 .../assets/perf-tutorial-resultLog.png             |  Bin 0 -> 204787 bytes
 .../version-1.1.0/assets/perf_e2e_test.png         |  Bin 0 -> 3957 bytes
 .../version-1.1.0/assets/perf_e2e_test_conf.png    |  Bin 0 -> 25200 bytes
 .../version-1.1.0/assets/perf_node_fairness.png    |  Bin 0 -> 26614 bytes
 .../version-1.1.0/assets/perf_throughput.png       |  Bin 0 -> 13024 bytes
 .../version-1.1.0/assets/pluggable-app-mgmt.jpg    |  Bin 0 -> 79170 bytes
 .../version-1.1.0/assets/predicateComaparation.png |  Bin 0 -> 182417 bytes
 .../version-1.1.0/assets/predicate_4k.png          |  Bin 0 -> 86281 bytes
 versioned_docs/version-1.1.0/assets/prometheus.png |  Bin 0 -> 88021 bytes
 .../version-1.1.0/assets/queue-fairness.png        |  Bin 0 -> 173299 bytes
 .../version-1.1.0/assets/queue-resource-quotas.png |  Bin 0 -> 283689 bytes
 .../assets/resilience-node-recovery.jpg            |  Bin 0 -> 319477 bytes
 .../version-1.1.0/assets/resilience-workflow.jpg   |  Bin 0 -> 441551 bytes
 .../assets/scheduling_no_predicate_4k.png          |  Bin 0 -> 96404 bytes
 .../assets/scheduling_with_predicate_4k_.png       |  Bin 0 -> 107557 bytes
 .../version-1.1.0/assets/simple_preemptor.png      |  Bin 0 -> 81655 bytes
 .../version-1.1.0/assets/spark-jobs-on-ui.png      |  Bin 0 -> 528736 bytes
 versioned_docs/version-1.1.0/assets/spark-pods.png |  Bin 0 -> 303407 bytes
 .../version-1.1.0/assets/tf-job-on-ui.png          |  Bin 0 -> 327800 bytes
 versioned_docs/version-1.1.0/assets/throughput.png |  Bin 0 -> 252615 bytes
 .../version-1.1.0/assets/throughput_3types.png     |  Bin 0 -> 173025 bytes
 .../version-1.1.0/assets/throughput_conf.png       |  Bin 0 -> 26837 bytes
 .../version-1.1.0/assets/yk-ui-screenshots.gif     |  Bin 0 -> 813848 bytes
 .../version-1.1.0/assets/yunirkonVSdefault.png     |  Bin 0 -> 156100 bytes
 .../version-1.1.0/design/architecture.md           |   62 +
 .../version-1.1.0/design/cache_removal.md          |  451 ++++++
 .../version-1.1.0/design/cross_queue_preemption.md |  126 ++
 .../version-1.1.0/design/gang_scheduling.md        |  605 ++++++++
 .../version-1.1.0/design/generic_resource.md       |   75 +
 .../design/interface_message_simplification.md     |  309 ++++
 versioned_docs/version-1.1.0/design/k8shim.md      |   74 +
 .../design/namespace_resource_quota.md             |  183 +++
 .../design/pluggable_app_management.md             |   75 +
 versioned_docs/version-1.1.0/design/predicates.md  |   80 ++
 versioned_docs/version-1.1.0/design/resilience.md  |  144 ++
 .../design/scheduler_configuration.md              |  246 ++++
 .../version-1.1.0/design/scheduler_core_design.md  |  401 ++++++
 .../design/scheduler_object_states.md              |  127 ++
 .../version-1.1.0/design/scheduler_plugin.md       |  112 ++
 .../version-1.1.0/design/simple_preemptor.md       |  114 ++
 .../version-1.1.0/design/state_aware_scheduling.md |  112 ++
 .../version-1.1.0/developer_guide/build.md         |  190 +++
 .../version-1.1.0/developer_guide/dependencies.md  |  124 ++
 .../version-1.1.0/developer_guide/deployment.md    |  164 +++
 .../version-1.1.0/developer_guide/env_setup.md     |  156 +++
 .../developer_guide/openshift_development.md       |  182 +++
 .../version-1.1.0/get_started/core_features.md     |   73 +
 .../version-1.1.0/get_started/get_started.md       |   80 ++
 .../evaluate_perf_function_with_kubemark.md        |  120 ++
 .../version-1.1.0/performance/metrics.md           |  109 ++
 .../performance/performance_tutorial.md            |  522 +++++++
 .../version-1.1.0/performance/profiling.md         |  122 ++
 versioned_docs/version-1.1.0/user_guide/acls.md    |  119 ++
 .../version-1.1.0/user_guide/deployment_modes.md   |   51 +
 .../version-1.1.0/user_guide/gang_scheduling.md    |  288 ++++
 .../labels_and_annotations_in_yunikorn.md          |   48 +
 .../version-1.1.0/user_guide/placement_rules.md    |  354 +++++
 .../version-1.1.0/user_guide/queue_config.md       |  374 +++++
 .../user_guide/resource_quota_mgmt.md              |  323 +++++
 .../version-1.1.0/user_guide/sorting_policies.md   |  185 +++
 .../version-1.1.0/user_guide/trouble_shooting.md   |  192 +++
 .../user_guide/usergroup_resolution.md             |   68 +
 .../user_guide/workloads/run_flink.md              |   66 +
 .../user_guide/workloads/run_spark.md              |  149 ++
 .../user_guide/workloads/run_tensorflow.md         |   93 ++
 .../user_guide/workloads/workload_overview.md      |   58 +
 versioned_sidebars/version-1.1.0-sidebars.json     |   75 +
 versions.json                                      |    1 +
 113 files changed, 11157 insertions(+), 4 deletions(-)

diff --git a/docusaurus.config.js b/docusaurus.config.js
index 04cb8a71b..1fbe470dc 100644
--- a/docusaurus.config.js
+++ b/docusaurus.config.js
@@ -58,7 +58,7 @@ module.exports = {
     announcementBar: {
       id: 'new_release',
       content:
-          '1.0.0 has been released, check the DOWNLOADS',
+          '1.1.0 has been released, check the DOWNLOADS',
       backgroundColor: '#fafbfc',
       textColor: '#091E42',
     },
diff --git a/i18n/zh-cn/docusaurus-plugin-content-docs/version-1.1.0/assets b/i18n/zh-cn/docusaurus-plugin-content-docs/version-1.1.0/assets
new file mode 120000
index 000000000..778d0f8e4
--- /dev/null
+++ b/i18n/zh-cn/docusaurus-plugin-content-docs/version-1.1.0/assets
@@ -0,0 +1 @@
+../../../../docs/assets
\ No newline at end of file
diff --git a/i18n/zh-cn/docusaurus-plugin-content-docs/version-1.1.0/get_started/core_features.md b/i18n/zh-cn/docusaurus-plugin-content-docs/version-1.1.0/get_started/core_features.md
new file mode 100644
index 000000000..d6d3c4979
--- /dev/null
+++ b/i18n/zh-cn/docusaurus-plugin-content-docs/version-1.1.0/get_started/core_features.md
@@ -0,0 +1,71 @@
+---
+id: core_features
+title: Features
+keywords:
+ - features
+---
+
+<!--
+Licensed to the Apache Software Foundation (ASF) under one
+or more contributor license agreements.  See the NOTICE file
+distributed with this work for additional information
+regarding copyright ownership.  The ASF licenses this file
+to you under the Apache License, Version 2.0 (the
+"License"); you may not use this file except in compliance
+with the License.  You may obtain a copy of the License at
+
+  http://www.apache.org/licenses/LICENSE-2.0
+
+Unless required by applicable law or agreed to in writing,
+software distributed under the License is distributed on an
+"AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY
+KIND, either express or implied.  See the License for the
+specific language governing permissions and limitations
+under the License.
+-->
+
+The main features of YuniKorn include:
+
+## Application-aware scheduling
+
+One of YuniKorn's key features is application awareness. The default K8s scheduler can only schedule Pod by Pod; it cannot make finer-grained decisions based on users, jobs, or queues.
+YuniKorn, in contrast, recognizes users, jobs, and queues, and takes more of their attributes, such as resources and ordering, into account when making scheduling decisions.
+This gives us fine-grained control over resource quotas, resource fairness, and priorities, which are the most important requirements of a multi-tenant computing system.
+
+## Hierarchical resource queues
+
+Hierarchical queues provide an efficient mechanism for managing cluster resources.
+The queue hierarchy can logically map to an organizational structure, which gives different tenants fine-grained control over resources.
+The YuniKorn UI provides a centralized view for monitoring resource queue usage and helps you understand how resources are used by different tenants.
+In addition, users can set minimum/maximum queue capacities to define an elastic resource quota for each tenant.
+
+## Job ordering and queuing
+
+YuniKorn keeps the applications in each resource queue in order, and the sorting policy determines which application gets resources first.
+The policy can be one of many, such as simple `FIFO`, `Fair`, `StateAware`, or `Priority` based.
+Queues maintain the order of applications, and the scheduler allocates resources to jobs according to the chosen policy. This behavior is much easier to understand and control.
+
+Furthermore, when the queue maximum capacity is configured, jobs and tasks are queued properly in the resource queue.
+If the remaining capacity is not enough, they wait in line until resources are released, which simplifies client-side operations.
+In the default scheduler, resources are instead limited by namespace resource quotas: if the namespace does not have enough quota, Pods cannot be created at all. This is enforced by the quota admission controller.
+Clients need more complex logic, such as conditional retries, to handle such scenarios.
+
+## Resource fairness
+
+In a multi-tenant environment, many users share cluster resources.
+To prevent tenants from fighting over resources or possibly being starved, finer-grained fairness is needed to achieve fairness across users as well as across teams/organizations.
+With weights or priorities taken into account, more important applications can demand resources beyond their quota.
+This is often tied to resource budgets, and finer-grained fairness modes further improve resource control.
+
+## Resource reservation
+
+YuniKorn automatically reserves resources for outstanding requests.
+If a Pod cannot be allocated, YuniKorn tries to reserve it on a node that satisfies its constraints and tentatively allocates the Pod on that reserved node (before trying other nodes).
+This mechanism prevents the resources this Pod needs from being taken by smaller, less demanding Pods submitted later.
+This feature is important in batch workload scenarios: when a large number of heterogeneous Pods is submitted to the cluster, some Pods are very likely to be "starved" even though they were submitted earlier.
+
+## Throughput
+
+Throughput is a key criterion for measuring scheduler performance and is critical for a large-scale distributed system.
+If throughput is poor, applications may waste time waiting to be scheduled, which further affects service SLAs (Service Level Agreements).
+The larger the cluster, the higher the throughput requirement. The [evaluating performance with Kubemark](performance/evaluate_perf_function_with_kubemark.md) section shows some performance data.
diff --git a/i18n/zh-cn/docusaurus-plugin-content-docs/version-1.1.0/get_started/get_started.md b/i18n/zh-cn/docusaurus-plugin-content-docs/version-1.1.0/get_started/get_started.md
new file mode 100644
index 000000000..fcf009f0f
--- /dev/null
+++ b/i18n/zh-cn/docusaurus-plugin-content-docs/version-1.1.0/get_started/get_started.md
@@ -0,0 +1,76 @@
+---
+id: user_guide
+title: Get Started
+slug: /
+---
+
+<!--
+Licensed to the Apache Software Foundation (ASF) under one
+or more contributor license agreements.  See the NOTICE file
+distributed with this work for additional information
+regarding copyright ownership.  The ASF licenses this file
+to you under the Apache License, Version 2.0 (the
+"License"); you may not use this file except in compliance
+with the License.  You may obtain a copy of the License at
+
+  http://www.apache.org/licenses/LICENSE-2.0
+
+Unless required by applicable law or agreed to in writing,
+software distributed under the License is distributed on an
+"AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY
+KIND, either express or implied.  See the License for the
+specific language governing permissions and limitations
+under the License.
+-->
+
+Before reading this guide, we assume you either have a Kubernetes cluster or a local Kubernetes development environment, e.g. MiniKube.
+It is also assumed that `kubectl` is on your PATH and configured correctly.
+Follow this [guide](developer_guide/env_setup.md) on how to set up a local Kubernetes cluster using docker-desktop.
+
+## Install
+
+The easiest way to get started is to use our Helm Charts to deploy YuniKorn on an existing Kubernetes cluster.
+We recommend using Helm 3 or a later version.
+
+```shell script
+helm repo add yunikorn https://apache.github.io/yunikorn-release
+helm repo update
+kubectl create namespace yunikorn
+helm install yunikorn yunikorn/yunikorn --namespace yunikorn
+```
+
+By default, the Helm Chart installs the scheduler, the web server and the admission-controller in the cluster.
+Once installed, the admission-controller routes all cluster traffic to YuniKorn,
+which means resource scheduling is delegated to YuniKorn. It can be disabled during the Helm install by setting the `embedAdmissionController` flag to `false`.
+The YuniKorn scheduler can also be deployed as a Kubernetes scheduler plugin by setting the Helm `enableSchedulerPlugin` flag to `true`.
+This deploys an alternative Docker image that contains YuniKorn compiled together with the default scheduler.
+This new mode offers better compatibility with the default Kubernetes scheduler and is suitable to use with the admission-controller delegating all scheduling to YuniKorn.
+Because this mode is still very new, it is not enabled by default.
+
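+These flags are passed to Helm like any other chart value; a minimal sketch (the flag names come from the paragraph above, the rest of the command matches the install step):
+
+```shell script
+# install without the admission controller, or enable the scheduler-plugin image instead
+helm install yunikorn yunikorn/yunikorn --namespace yunikorn --set embedAdmissionController=false
+helm install yunikorn yunikorn/yunikorn --namespace yunikorn --set enableSchedulerPlugin=true
+```
+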
+If you are unsure which deployment mode you should use, refer to our [side-by-side comparison](user_guide/deployment_modes).
+
+If you don't want to use the Helm Charts, you can find our detailed tutorial [here](developer_guide/deployment.md).
+
+## Uninstall
+
+Run the following command to uninstall YuniKorn:
+
+```shell script
+helm uninstall yunikorn --namespace yunikorn
+```
+
+## Access the Web UI
+
+When the scheduler is deployed, the web UI is also deployed in a container.
+We can open port forwarding for the web interface on the standard port as follows:
+
+```shell script
+kubectl port-forward svc/yunikorn-service 9889:9889 -n yunikorn
+```
+
+`9889` is the default port of the web UI.
+Once this is done, the web UI is available at: http://localhost:9889.
+
+![UI Screenshots](./../assets/yk-ui-screenshots.gif)
+
+The YuniKorn UI provides a centralized view of cluster resource capacity, utilization, and all application info.
diff --git a/i18n/zh-cn/docusaurus-plugin-content-docs/version-1.1.0/performance/evaluate_perf_function_with_kubemark.md b/i18n/zh-cn/docusaurus-plugin-content-docs/version-1.1.0/performance/evaluate_perf_function_with_kubemark.md
new file mode 100644
index 000000000..44f4c67eb
--- /dev/null
+++ b/i18n/zh-cn/docusaurus-plugin-content-docs/version-1.1.0/performance/evaluate_perf_function_with_kubemark.md
@@ -0,0 +1,120 @@
+---
+id: evaluate_perf_function_with_kubemark
+title: Evaluate YuniKorn Performance with Kubemark
+keywords:
+ - performance
+ - throughput
+---
+
+<!--
+Licensed to the Apache Software Foundation (ASF) under one
+or more contributor license agreements.  See the NOTICE file
+distributed with this work for additional information
+regarding copyright ownership.  The ASF licenses this file
+to you under the Apache License, Version 2.0 (the
+"License"); you may not use this file except in compliance
+with the License.  You may obtain a copy of the License at
+
+  http://www.apache.org/licenses/LICENSE-2.0
+
+Unless required by applicable law or agreed to in writing,
+software distributed under the License is distributed on an
+"AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY
+KIND, either express or implied.  See the License for the
+specific language governing permissions and limitations
+under the License.
+-->
+
+The YuniKorn community pays close attention to scheduler performance and keeps optimizing it release after release. The community has developed some tools to repeatedly test and tune the performance.
+
+## Environment setup
+
+We leverage [Kubemark](https://github.com/kubernetes/kubernetes/blob/release-1.3/docs/devel/kubemark-guide.md#starting-a-kubemark-cluster) to evaluate the scheduler's performance. Kubemark is a testing tool that simulates a large-scale cluster. It creates hollow nodes that run hollow kubelets to mimic real kubelet behavior. Pods scheduled on these hollow nodes are not actually executed. Kubemark is able to create a large cluster that meets our experiment requirements and reveals the performance of the YuniKorn scheduler. See the [detailed steps](performance/performance_tutorial.md) on how to set up the environment.
+
+## Scheduler throughput
+
+We designed some simple benchmark scenarios in a simulated large-scale environment to evaluate scheduler performance. Our tools measure [throughput](https://en.wikipedia.org/wiki/Throughput) and use this key metric to evaluate the performance. In short, scheduler throughput is the rate at which pods are processed, from the moment they are discovered on the cluster until they are allocated to nodes.
+
+In this experiment, we used [Kubemark](https://github.com/kubernetes/kubernetes/blob/release-1.3/docs/devel/kubemark-guide.md#starting-a-kubemark-cluster) to set up simulated clusters of 2000 and 4000 nodes. We then launched 10 [deployments](https://kubernetes.io/docs/concepts/workloads/controllers/deployment/) with 5000 replicas each. This simulates a large-scale workload being submitted to the K8s cluster all at once. Our tool periodically monitors and checks the pod states, counting the number of started pods over time based on `podSpec.StartTime`. As a comparison, we ran the same experiment with the default scheduler in the same environment. We saw YuniKorn's performance advantage over the default scheduler, as shown in the chart below:
+
+![Scheduler Throughput](./../assets/yunirkonVSdefault.png)
+<p align="center">Figure 1. Yunikorn and default scheduler throughput </p>
+
+The chart records the time it took for all pods to be running on the cluster:
+
+| Number of nodes   | yunikorn        | k8s default scheduler	| Diff   |
+|------------------	|:--------------:	|:---------------------: |:-----:  |
+| 2000 (nodes)      | 204 (pods/sec)		| 49 (pods/sec)		        |   416%  |
+| 4000 (nodes)      | 115 (pods/sec)		| 48 (pods/sec)		        |   240%  |
+
+To normalize the results, we ran several rounds of tests. As shown above, YuniKorn achieves a `2x`~`4x` performance gain compared to the default scheduler.
+
+:::note
+
+As with other performance tests, results vary depending on the underlying hardware, such as server CPU/memory, network bandwidth, I/O speed, etc. To get accurate results that apply to your environment, we encourage you to run these tests on a cluster that is close to your production setup.
+
+:::
+
+## Performance analysis
+
+The results we got from the experiments are promising. We dug deeper into the performance by observing more of YuniKorn's internal metrics, and we were able to identify a few key areas affecting performance.
+
+### K8s limitations
+
+We found that the overall performance is actually capped by the K8s master services, such as the api-server, controller-manager and etcd; YuniKorn's limit was not reached in any of our experiments. If you look at the internal scheduling metrics, you can see:
+
+![Allocation latency](./../assets/allocation_4k.png)
+<p align="center">Figure 2. Yunikorn metrics with 4k nodes </p>
+
+Figure 2 is a screenshot from Prometheus recording the [internal metric](performance/metrics.md) `containerAllocation` in YuniKorn. It counts the number of pods allocated by the scheduler, which are not necessarily bound to nodes yet. Scheduling all 50k pods took roughly 122 seconds, i.e. 410 pods/sec. The actual throughput drops to 115 pods/sec, with the extra time spent binding the pods on the different nodes. If the K8s side could keep up, we would see better results. In fact, when we tune performance on a large-scale cluster, the first thing we do is tune some parameters in the api-server and controller-manager to increase throughput. See more in the [performance tutorial doc](performance/performance_tutorial.md).
+
+### Node sorting
+
+When the cluster size grows, we saw an obvious performance drop in YuniKorn. This is because YuniKorn does a full sort of the cluster nodes to find the **"best-fit"** node for a given pod. This strategy makes the pod distribution more optimal, based on the [node sorting policy](./../user_guide/sorting_policies#node-sorting) in use. However, sorting nodes is expensive, and doing it in the scheduling cycle creates a lot of overhead. To overcome this, we improved our node sorting mechanism in [YUNIKORN-807](https://issues.apache.org/jira/browse/YUNIKORN-807); the idea behind it is to use a [B-Tree](https://en.wikipedia.org/wiki/B-tree) to store all nodes and apply incremental updates when necessary. This significantly improved the latency; according to our benchmarks, it is 35x, 42x, 51x and 74x faster on clusters of 500, 1000, 2000 and 5000 nodes respectively.
+
+### Per-node predicate checks
+
+Another time-consuming part of each scheduling cycle is the per-node "predicate checks". In this phase, YuniKorn evaluates all the standard K8s predicates, such as node selectors, pod affinity/anti-affinity, etc., to determine whether a pod fits a node. These evaluations are expensive.
+
+We ran two experiments to compare the cases where predicate evaluation is enabled and disabled. See the results below:
+
+![Allocation latency](./../assets/predicateComaparation.png)
+<p align="center">Figure 3. Comparison of the predicate effect in Yunikorn </p>
+
+When predicate evaluation is disabled, throughput improves a lot. We looked further into the latency distribution of the whole scheduling cycle and the predicate-evaluation latency, and found:
+
+![YK predicate latency](./../assets/predicate_4k.png)
+<p align="center">Figure 4. Predicate latency </p>
+
+![YK scheduling with predicate](./../assets/scheduling_with_predicate_4k_.png)
+<p align="center">Figure 5. Scheduling time with predicates enabled </p>
+
+![YK scheduling with no predicate](./../assets/scheduling_no_predicate_4k.png)
+<p align="center">Figure 6. Scheduling time with predicates disabled </p>
+
+Overall, the YuniKorn scheduling cycle runs very fast, with the per-cycle latency falling in the **0.001s - 0.01s** range. Most of the time is spent on predicate evaluation, 10x more than the other parts of the scheduling cycle.
+
+|				| Scheduling latency distribution (seconds)	| Predicate-evaluation latency distribution (seconds)	|
+|-----------------------	|:---------------------:		|:---------------------:			|
+| Predicates enabled		| 0.01 - 0.1				| 0.01 - 0.1					|
+| Predicates disabled	| 0.001 - 0.01				| none						|
+
+## Why is YuniKorn faster?
+
+The default scheduler was created as a service-oriented scheduler; it is less sensitive to throughput than YuniKorn. The YuniKorn community works very hard to maintain excellent performance and keeps improving it. The reasons YuniKorn can run faster than the default scheduler are:
+
+* Short scheduling cycle
+
+YuniKorn keeps the scheduling cycle short and efficient. YuniKorn uses asynchronous communication protocols everywhere to make sure all critical paths are non-blocking calls. Most places only do in-memory calculations, which can be highly efficient. The default scheduler leverages the [scheduling framework](https://kubernetes.io/docs/concepts/scheduling-eviction/scheduling-framework/), which provides a lot of flexibility to extend the scheduler; however, the trade-off is performance. The scheduling cycle becomes a long chain because it needs to visit all these plugins.
+
+* Asynchronous event handling
+
+YuniKorn leverages an asynchronous event-handling framework for its internal states. This allows the core scheduling cycle to run fast without being blocked by any expensive calls. For example, the default scheduler needs to write state updates and events to pod objects, which is done inside the scheduling cycle. This involves persisting data to etcd, which can be slow. YuniKorn instead caches all such events in a queue and writes them back to the pods asynchronously.
+
+* Faster node sorting
+
+After [YUNIKORN-807](https://issues.apache.org/jira/browse/YUNIKORN-807), YuniKorn does efficient incremental node sorting. This is built on top of the so-called "resource weight" based node scoring mechanism, which is also extensible via plugins. All of this reduces the overhead of computing node scores. In comparison, the default scheduler provides several extension points for computing node scores, such as `PreScore`, `Score` and `NormalizeScore`. These computations are expensive and are called in every scheduling cycle. See the details in the [code](https://github.com/kubernetes/kubernetes/blob/481459d12dc82ab88e413886e2130c2a5e4a8ec4/pkg/scheduler/framework/runtime/framework.go#L857).
+
+## Summary
+
+During the tests, we found YuniKorn performs very well, especially compared to the default scheduler. We have identified the major factors in YuniKorn where we can continue to improve performance, and we have explained why YuniKorn performs better than the default scheduler. We also realized there are limitations when scaling Kubernetes to thousands of nodes; these can be mitigated with other techniques, such as federation. As a result, YuniKorn is an efficient, high-throughput scheduler that is well suited to running batch/mixed workloads on Kubernetes.
diff --git a/i18n/zh-cn/docusaurus-plugin-content-docs/version-1.1.0/performance/metrics.md b/i18n/zh-cn/docusaurus-plugin-content-docs/version-1.1.0/performance/metrics.md
new file mode 100644
index 000000000..7d6fa73e5
--- /dev/null
+++ b/i18n/zh-cn/docusaurus-plugin-content-docs/version-1.1.0/performance/metrics.md
@@ -0,0 +1,105 @@
+---
+id: metrics
+title: Scheduler Metrics
+keywords:
+ - metrics
+---
+
+<!--
+Licensed to the Apache Software Foundation (ASF) under one
+or more contributor license agreements.  See the NOTICE file
+distributed with this work for additional information
+regarding copyright ownership.  The ASF licenses this file
+to you under the Apache License, Version 2.0 (the
+"License"); you may not use this file except in compliance
+with the License.  You may obtain a copy of the License at
+
+  http://www.apache.org/licenses/LICENSE-2.0
+
+Unless required by applicable law or agreed to in writing,
+software distributed under the License is distributed on an
+"AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY
+KIND, either express or implied.  See the License for the
+specific language governing permissions and limitations
+under the License.
+-->
+
+YuniKorn leverages [Prometheus](https://prometheus.io/) to record metrics. The metrics system keeps tracking the scheduler's critical execution paths to reveal potential performance bottlenecks. Currently, these metrics are divided into three categories:
+
+- Scheduler: generic metrics of the scheduler, such as allocation latency, number of applications, etc.
+- Queue: each queue has its own metrics subsystem, tracking the queue status.
+- Event: records various changes of events in YuniKorn.
+
+All metrics are declared in the `yunikorn` namespace.
+
+###    Scheduler metrics
+
+| Metric name            | Metric type   | Description  | 
+| --------------------- | ------------  | ------------ |
+| containerAllocation   | Counter       | Total number of attempts to allocate containers. The attempt states are `allocated`, `rejected`, `error`, `released`. This metric only increases.  |
+| applicationSubmission | Counter       | Total number of application submissions. The attempt states are `accepted` and `rejected`. This metric only increases. |
+| applicationStatus     | Gauge         | Total number of application states. The application states are `running` and `completed`.  | 
+| totalNodeActive       | Gauge         | Total number of active nodes.                          |
+| totalNodeFailed       | Gauge         | Total number of failed nodes.                          |
+| nodeResourceUsage     | Gauge         | Total resource usage of the node, by resource name.        |
+| schedulingLatency     | Histogram     | Latency of the main scheduling routine, in seconds.    |
+| nodeSortingLatency    | Histogram     | Latency of all nodes sorting, in seconds.              |
+| appSortingLatency     | Histogram     | Latency of all applications sorting, in seconds.      |
+| queueSortingLatency   | Histogram     | Latency of all queues sorting, in seconds.             |
+| tryNodeLatency        | Histogram     | Latency of node condition checks for container allocations, such as placement constraints, in seconds. |
+
+###    Queue metrics
+
+| Metric name                | Metric type   | Description |
+| ------------------------- | ------------- | ----------- |
+| appMetrics                | Counter       | Application metrics, recording the total number of applications. The application states are `accepted`, `rejected` and `Completed`.    |
+| usedResourceMetrics       | Gauge         | Resources used by the queue.     |
+| pendingResourceMetrics    | Gauge         | Resources pending in the queue.  |
+| availableResourceMetrics  | Gauge         | Resources available to the queue.    |
+
+###    Event metrics
+
+| Metric name               | Metric type   | Description |
+| ------------------------ | ------------  | ----------- |
+| totalEventsCreated       | Gauge         | Total number of events created.          |
+| totalEventsChanneled     | Gauge         | Total number of events channeled.        |
+| totalEventsNotChanneled  | Gauge         | Total number of events not channeled.    |
+| totalEventsProcessed     | Gauge         | Total number of events processed.        |
+| totalEventsStored        | Gauge         | Total number of events stored.           |
+| totalEventsNotStored     | Gauge         | Total number of events not stored.       |
+| totalEventsCollected     | Gauge         | Total number of events collected.        |
+
+## Accessing metrics
+
+YuniKorn metrics are collected through the Prometheus client library and exposed via the scheduler's RESTful service.
+Once started, they can be accessed via the endpoint http://localhost:9080/ws/v1/metrics.
+
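+If the scheduler runs inside a Kubernetes cluster, a quick way to reach this endpoint is to port-forward the scheduler service first; a minimal sketch (the service name and namespace below assume the default Helm install):
+
+```shell script
+kubectl port-forward svc/yunikorn-service 9080:9080 -n yunikorn
+curl http://localhost:9080/ws/v1/metrics
+```
+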
+## Aggregate metrics with Prometheus
+
+Setting up a Prometheus server to periodically scrape YuniKorn metrics is simple. Follow these steps:
+
+- Set up Prometheus (read more in the [Prometheus docs](https://prometheus.io/docs/prometheus/latest/installation/))
+
+- Configure the Prometheus rules: a sample configuration
+
+```yaml
+global:
+  scrape_interval:     3s
+  evaluation_interval: 15s
+
+scrape_configs:
+  - job_name: 'yunikorn'
+    scrape_interval: 1s
+    metrics_path: '/ws/v1/metrics'
+    static_configs:
+    - targets: ['docker.for.mac.host.internal:9080']
+```
+
+- Start Prometheus
+
+```shell script
+docker pull prom/prometheus:latest
+docker run -p 9090:9090 -v /path/to/prometheus.yml:/etc/prometheus/prometheus.yml prom/prometheus
+```
+
+Use `docker.for.mac.host.internal` instead of `localhost` if you run Prometheus in a local docker container on Mac OS. Once started, open the Prometheus web UI at http://localhost:9090/graph. You will see all the available metrics from the YuniKorn scheduler.
+
diff --git a/i18n/zh-cn/docusaurus-plugin-content-docs/version-1.1.0/performance/performance_tutorial.md b/i18n/zh-cn/docusaurus-plugin-content-docs/version-1.1.0/performance/performance_tutorial.md
new file mode 100644
index 000000000..32e4df7d0
--- /dev/null
+++ b/i18n/zh-cn/docusaurus-plugin-content-docs/version-1.1.0/performance/performance_tutorial.md
@@ -0,0 +1,451 @@
+---
+id: performance_tutorial
+title: Benchmarking Tutorial
+keywords:
+ - performance
+ - tutorial
+---
+
+<!--
+Licensed to the Apache Software Foundation (ASF) under one
+or more contributor license agreements.  See the NOTICE file
+distributed with this work for additional information
+regarding copyright ownership.  The ASF licenses this file
+to you under the Apache License, Version 2.0 (the
+"License"); you may not use this file except in compliance
+with the License.  You may obtain a copy of the License at
+
+  http://www.apache.org/licenses/LICENSE-2.0
+
+Unless required by applicable law or agreed to in writing,
+software distributed under the License is distributed on an
+"AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY
+KIND, either express or implied.  See the License for the
+specific language governing permissions and limitations
+under the License.
+-->
+
+## Overview
+
+The YuniKorn community keeps optimizing the scheduler's performance to make sure YuniKorn satisfies the performance requirements of large-scale batch workloads. For that purpose, the community has built some useful tools for performance benchmarking that can be reused across releases. This document describes all these tools and the steps to run them.
+
+## Hardware
+
+Please note that performance results vary depending on the underlying hardware. All results published in this document can only be used as a reference. We encourage everyone to run similar tests in their own environments so the results are based on your own hardware. This document is for demonstration purposes only.
+
+The list of servers used in this test (many thanks to [National Taichung University of Education](http://www.ntcu.edu.tw/newweb/index.htm) and [Kuan-Chou Lai](http://www.ntcu.edu.tw/kclai/) for providing these servers to run the tests):
+
+| Machine type          | CPU |  Memory  |   Download/Upload (Mbps) |
+| --------------------- | --- | ------ | --------------------- |
+| HP                    | 16  | 36G    | 525.74/509.86         |
+| HP                    | 16  | 30G    | 564.84/461.82         |
+| HP                    | 16  | 30G    | 431.06/511.69         |
+| HP                    | 24  | 32G    | 577.31/576.21         |
+| IBM blade H22         | 16  | 38G    | 432.11/4.15           |
+| IBM blade H22         | 16  | 36G    | 714.84/4.14           |
+| IBM blade H22         | 16  | 42G    | 458.38/4.13           |
+| IBM blade H22         | 16  | 42G    | 445.42/4.13           |
+| IBM blade H22         | 16  | 32G    | 400.59/4.13           |
+| IBM blade H22         | 16  | 12G    | 499.87/4.13           |
+| IBM blade H23         | 8   | 32G    | 468.51/4.14           |
+| WS660T                | 8   | 16G    | 87.73/86.30           |
+| ASUSPRO D640MB_M640SA | 4   | 8G     | 92.43/93.77           |
+| PRO E500 G6_WS720T    | 16  | 8G     | 90/87.18              |
+| WS E500 G6_WS720T     | 8   | 40G    | 92.61/89.78           |
+| E500 G5               | 8   | 8G     | 91.34/85.84           |
+| WS E500 G5_WS690T     | 12  | 16G    | 92.2/93.76            |
+| WS E500 G5_WS690T     | 8   | 32G    | 91/89.41              |
+| WS E900 G4_SW980T     | 80  | 512G   | 89.24/87.97           |
+
+The following steps need to be performed on every server, otherwise large-scale tests may fail due to the limited number of users/processes/open files.
+
+### 1. Set /etc/sysctl.conf
+```
+kernel.pid_max=400000
+fs.inotify.max_user_instances=50000
+fs.inotify.max_user_watches=52094
+```
+### 2. Set /etc/security/limits.conf
+
+```
+* soft nproc 4000000
+* hard nproc 4000000
+root soft nproc 4000000
+root hard nproc 4000000
+* soft nofile 50000
+* hard nofile 50000
+root soft nofile 50000
+root hard nofile 50000
+```
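+
+Neither file takes effect on its own: the sysctl entries are applied with `sysctl -p` (or a reboot), and the limits apply to new login sessions. A minimal sketch to apply and verify:
+
+```
+sysctl -p            # reload /etc/sysctl.conf
+ulimit -u            # after re-login: max user processes
+ulimit -n            # after re-login: max open files
+```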
+---
+
+## Deployment workflow
+
+Before going into the details, here are the general steps used in our tests:
+
+- [Step 1](#Kubernetes): Properly configure the Kubernetes API server and controller manager, then add worker nodes.
+- [Step 2](#Setup-Kubemark): Deploy hollow pods, which simulate worker nodes, called hollow nodes. After all hollow nodes are in ready status, we need to cordon all native nodes, which are physical presences in the cluster rather than simulated nodes, to avoid assigning the test workload pods to native nodes (see the cordon sketch right after this list).
+- [Step 3](#Deploy-YuniKorn): Deploy YuniKorn using the Helm chart on the master node, scale the Deployment down to 0 replicas, and [modify the port](#Setup-Prometheus) in `prometheus.yml` to match the port of the service.
+- [Step 4](#Run-tests): Deploy 50k Nginx pods for testing; the API server will create them. But since the YuniKorn scheduler Deployment has been scaled down to 0 replicas, all Nginx pods will stay in a pending state.
+- [Step 5](../user_guide/trouble_shooting.md#restart-the-scheduler): Scale the YuniKorn Deployment back up to 1 replica and cordon the master node to prevent YuniKorn from allocating Nginx pods there. In this step, YuniKorn starts collecting the metrics.
+- [Step 6](#Collect-and-Observe-YuniKorn-metrics): Observe the metrics exposed in the Prometheus UI.
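+
+Cordoning marks a node as unschedulable so the test pods cannot land on it; a minimal sketch (node names are placeholders):
+
+```
+kubectl cordon <native-node-name>     # repeat for every native/physical node
+kubectl get nodes                     # cordoned nodes report SchedulingDisabled
+```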
+---
+
+## Setup Kubemark
+
+[Kubemark](https://github.com/kubernetes/kubernetes/tree/master/test/kubemark) is a performance testing tool which allows users to run experiments on simulated clusters. The primary use case is scalability testing. The basic idea is to run tens or hundreds of fake kubelet nodes on one physical node in order to simulate a large-scale cluster. In our tests, we leverage Kubemark to simulate a cluster of up to 4K nodes on fewer than 20 physical nodes.
+
+### 1. Build the image
+
+##### Clone the kubernetes repo and build the kubemark binary
+
+```
+git clone https://github.com/kubernetes/kubernetes.git
+```
+```
+cd kubernetes
+```
+```
+KUBE_BUILD_PLATFORMS=linux/amd64 make kubemark GOFLAGS=-v GOGCFLAGS="-N -l"
+```
+
+##### Copy the kubemark binary to the image folder and build the kubemark docker image
+
+```
+cp _output/bin/kubemark cluster/images/kubemark
+```
+```
+IMAGE_TAG=v1.XX.X make build
+```
+After this step, you have a kubemark image that can simulate cluster nodes. You can upload it to Docker Hub or just deploy it locally.
+
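+To let the hollow-node pods pull the image, push it to a registry your cluster can reach; a minimal sketch (the repository name and tag are placeholders, and the local image name produced by `make build` is printed at the end of that step):
+
+```
+docker tag <local-kubemark-image>:v1.XX.X <your-registry>/kubemark:v1.XX.X
+docker push <your-registry>/kubemark:v1.XX.X
+```
+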
+### 2. Install Kubemark
+
+##### Create the kubemark namespace
+
+```
+kubectl create ns kubemark
+```
+
+##### Create the configmap
+
+```
+kubectl create configmap node-configmap -n kubemark --from-literal=content.type="test-cluster"
+```
+
+##### Create the secret
+
+```
+kubectl create secret generic kubeconfig --type=Opaque --namespace=kubemark --from-file=kubelet.kubeconfig={kubeconfig_file_path} --from-file=kubeproxy.kubeconfig={kubeconfig_file_path}
+```
+### 3. Label the nodes
+
+We need to label all native nodes, otherwise the scheduler might assign hollow pods to other simulated hollow nodes. We can leverage a node selector in the yaml to assign the hollow pods to native nodes.
+
+```
+kubectl label node {node name} tag=tagName
+```
+
+### 4. Deploy Kubemark
+
+The hollow-node.yaml looks like below; we can configure some parameters in it.
+
+```
+apiVersion: v1
+kind: ReplicationController
+metadata:
+  name: hollow-node
+  namespace: kubemark
+spec:
+  replicas: 2000  # number of nodes to simulate
+  selector:
+      name: hollow-node
+  template:
+    metadata:
+      labels:
+        name: hollow-node
+    spec:
+      nodeSelector:  # use the label to assign hollow pods to native nodes
+        tag: tagName  
+      initContainers:
+      - name: init-inotify-limit
+        image: docker.io/busybox:latest
+        imagePullPolicy: IfNotPresent
+        command: ['sysctl', '-w', 'fs.inotify.max_user_instances=200'] # set to the same max_user_instances as on the real node
+        securityContext:
+          privileged: true
+      volumes:
+      - name: kubeconfig-volume
+        secret:
+          secretName: kubeconfig
+      - name: logs-volume
+        hostPath:
+          path: /var/log
+      containers:
+      - name: hollow-kubelet
+        image: 0yukali0/kubemark:1.20.10 # the kubemark image you built 
+        imagePullPolicy: IfNotPresent
+        ports:
+        - containerPort: 4194
+        - containerPort: 10250
+        - containerPort: 10255
+        env:
+        - name: NODE_NAME
+          valueFrom:
+            fieldRef:
+              fieldPath: metadata.name
+        command:
+        - /kubemark
+        args:
+        - --morph=kubelet
+        - --name=$(NODE_NAME)
+        - --kubeconfig=/kubeconfig/kubelet.kubeconfig
+        - --alsologtostderr
+        - --v=2
+        volumeMounts:
+        - name: kubeconfig-volume
+          mountPath: /kubeconfig
+          readOnly: true
+        - name: logs-volume
+          mountPath: /var/log
+        resources:
+          requests:    # resources of the hollow pod, can be modified.
+            cpu: 20m
+            memory: 50M
+        securityContext:
+          privileged: true
+      - name: hollow-proxy
+        image: 0yukali0/kubemark:1.20.10 # the kubemark image you built 
+        imagePullPolicy: IfNotPresent
+        env:
+        - name: NODE_NAME
+          valueFrom:
+            fieldRef:
+              fieldPath: metadata.name
+        command:
+        - /kubemark
+        args:
+        - --morph=proxy
+        - --name=$(NODE_NAME)
+        - --use-real-proxier=false
+        - --kubeconfig=/kubeconfig/kubeproxy.kubeconfig
+        - --alsologtostderr
+        - --v=2
+        volumeMounts:
+        - name: kubeconfig-volume
+          mountPath: /kubeconfig
+          readOnly: true
+        - name: logs-volume
+          mountPath: /var/log
+        resources:  # resources of the hollow pod, can be modified.
+          requests:
+            cpu: 20m
+            memory: 50M
+      tolerations:
+      - effect: NoExecute
+        key: node.kubernetes.io/unreachable
+        operator: Exists
+      - effect: NoExecute
+        key: node.kubernetes.io/not-ready
+        operator: Exists
+```
+
+After editing, apply it to the cluster:
+
+```
+kubectl apply -f hollow-node.yaml
+```
+
+---
+
+## Deploy YuniKorn
+
+#### Install YuniKorn with helm
+
+We can install YuniKorn with Helm, please refer to this [doc](https://yunikorn.apache.org/docs/#install). We need to tune some parameters based on the default configuration. We recommend cloning the [release repo](https://github.com/apache/yunikorn-release) and modifying the parameters in `value.yaml`.
+
+```
+git clone https://github.com/apache/yunikorn-release.git
+cd helm-charts/yunikorn
+```
+
+#### Configuration
+
+The modifications in `value.yaml` are:
+
+- Increase the memory/cpu resources for the scheduler pod
+- Disable the admission controller
+- Set the application sorting policy to FAIR
+
+See the changes below:
+
+```
+resources:
+  requests:
+    cpu: 14
+    memory: 16Gi
+  limits:
+    cpu: 14
+    memory: 16Gi
+```
+```
+embedAdmissionController: false
+```
+```
+configuration: |
+  partitions:
+    -
+      name: default
+      queues:
+        - name: root
+          submitacl: '*'
+          queues:
+            -
+              name: sandbox
+              properties:
+                application.sort.policy: fair
+```
+
+#### Install YuniKorn with the local repository
+
+```
+helm install yunikorn . --namespace yunikorn
+```
+
+---
+
+## Setup Prometheus
+
+YuniKorn exposes its scheduling metrics via Prometheus. Therefore, we need to set up a Prometheus server to collect these metrics.
+
+### 1. Download the Prometheus release
+
+```
+wget https://github.com/prometheus/prometheus/releases/download/v2.30.3/prometheus-2.30.3.linux-amd64.tar.gz
+```
+```
+tar xvfz prometheus-*.tar.gz
+cd prometheus-*
+```
+
+### 2. Configure prometheus.yml
+
+```
+global:
+  scrape_interval:     3s
+  evaluation_interval: 15s
+
+scrape_configs:
+  - job_name: 'yunikorn'
+    scrape_interval: 1s
+    metrics_path: '/ws/v1/metrics'
+    static_configs:
+    - targets: ['docker.for.mac.host.internal:9080'] 
+    # 9080 is the internal port; either port-forward it or change 9080 to the service port
+```
+
+### 3. Start Prometheus
+```
+./prometheus --config.file=prometheus.yml
+```
+
+---
+## Run tests
+
+Once the environment is set up, you are ready to run workloads and collect the results. The YuniKorn community has some useful tools to run workloads and collect metrics; more details will be published here. A bare-bones workload sketch is shown below.
+
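+In line with step 4 of the workflow above, the workload can be as simple as a Deployment of Nginx pods handed over to YuniKorn; a minimal sketch (the names, label values and replica count are only examples):
+
+```
+apiVersion: apps/v1
+kind: Deployment
+metadata:
+  name: scheduling-throughput-test
+spec:
+  replicas: 5000                            # scale this up/down to size the workload
+  selector:
+    matchLabels:
+      app: scheduling-throughput-test
+  template:
+    metadata:
+      labels:
+        app: scheduling-throughput-test
+        applicationId: throughput-test-001  # groups the pods into one YuniKorn app
+    spec:
+      schedulerName: yunikorn               # hand the pods over to YuniKorn
+      containers:
+        - name: nginx
+          image: nginx:1.21
+          resources:
+            requests:
+              cpu: 5m
+              memory: 10M
+```
+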
+---
+
+## Collect and observe YuniKorn metrics
+
+After Prometheus is launched, YuniKorn metrics can be collected easily. Here is the [doc](metrics.md) for the YuniKorn metrics. YuniKorn tracks some key scheduling metrics which measure the latency of critical scheduling paths. These metrics include:
+
+ - **scheduling_latency_seconds:** Latency of the main scheduling routine, in seconds.
+ - **app_sorting_latency_seconds**: Latency of all applications sorting, in seconds.
+ - **node_sorting_latency_seconds**: Latency of all nodes sorting, in seconds.
+ - **queue_sorting_latency_seconds**: Latency of all queues sorting, in seconds.
+ - **container_allocation_attempt_total**: Total number of attempts to allocate containers. The attempt states are `allocated`, `rejected`, `error`, `released`. The metric only increases.
+
+You can easily select and generate graphs in the Prometheus UI, such as:
+
+![Prometheus Metrics List](./../assets/prometheus.png)
+
+
+---
+
+## Performance tuning
+
+### Kubernetes
+
+The default K8s setup limits concurrent requests, which limits the overall throughput of the cluster. In this section, we introduce a few parameters that need to be tuned to increase the overall throughput of the cluster.
+
+#### kubeadm
+
+Set the pod-network mask
+
+```
+kubeadm init --pod-network-cidr=10.244.0.0/8
+```
+
+#### CNI
+
+Modify the CNI mask and resources.
+
+```
+  net-conf.json: |
+    {
+      "Network": "10.244.0.0/8",
+      "Backend": {
+        "Type": "vxlan"
+      }
+    }
+```
+```
+  resources:
+    requests:
+      cpu: "100m"
+      memory: "200Mi"
+    limits:
+      cpu: "100m"
+      memory: "200Mi"
+```
+
+
+#### Api-Server
+
+In the Kubernetes API server, we need to modify two parameters: `max-mutating-requests-inflight` and `max-requests-inflight`. These two parameters represent the API request bandwidth. Because we generate a large number of pod requests, we need to increase both parameters. Modify `/etc/kubernetes/manifest/kube-apiserver.yaml`:
+
+```
+--max-mutating-requests-inflight=3000
+--max-requests-inflight=3000
+```
+
+#### Controller-Manager
+
+In the Kubernetes controller manager, we need to increase the value of three parameters: `node-cidr-mask-size`, `kube-api-burst` and `kube-api-qps`. `kube-api-burst` and `kube-api-qps` control the server-side request bandwidth. `node-cidr-mask-size` represents the node CIDR; it also needs to be increased in order to scale out to thousands of nodes.
+
+
+Modify `/etc/kubernetes/manifest/kube-controller-manager.yaml`:
+
+```
+--node-cidr-mask-size=21 // log2(max number of pods in the cluster)
+--kube-api-burst=3000
+--kube-api-qps=3000
+```
+
+#### kubelet
+
+On a single worker node, we can run 110 pods by default. But to get higher node resource utilization, we need to add some parameters to the kubelet launch command and restart it.
+
+Modify the start arguments in `/etc/systemd/system/kubelet.service.d/10-kubeadm.conf`, add `--max-pods=300` after the start arguments, and restart:
+
+```
+systemctl daemon-reload
+systemctl restart kubelet
+```
+
+---
+
+## Summary
+
+With Kubemark and Prometheus, we can easily run benchmark tests, collect YuniKorn metrics and analyze the performance. This helps us identify performance bottlenecks in the scheduler and eliminate them. The YuniKorn community will continue to improve these tools in the future and keep delivering more performance improvements.
diff --git a/i18n/zh-cn/docusaurus-plugin-content-docs/version-1.1.0/performance/profiling.md b/i18n/zh-cn/docusaurus-plugin-content-docs/version-1.1.0/performance/profiling.md
new file mode 100644
index 000000000..eb2ae7442
--- /dev/null
+++ b/i18n/zh-cn/docusaurus-plugin-content-docs/version-1.1.0/performance/profiling.md
@@ -0,0 +1,115 @@
+---
+id: profiling
+title: Profiling
+---
+
+<!--
+Licensed to the Apache Software Foundation (ASF) under one
+or more contributor license agreements.  See the NOTICE file
+distributed with this work for additional information
+regarding copyright ownership.  The ASF licenses this file
+to you under the Apache License, Version 2.0 (the
+"License"); you may not use this file except in compliance
+with the License.  You may obtain a copy of the License at
+
+  http://www.apache.org/licenses/LICENSE-2.0
+
+Unless required by applicable law or agreed to in writing,
+software distributed under the License is distributed on an
+"AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY
+KIND, either express or implied.  See the License for the
+specific language governing permissions and limitations
+under the License.
+-->
+
+Use [pprof](https://github.com/google/pprof) to do CPU and memory profiling; it can help you understand the runtime status of the YuniKorn scheduler. Profiling instruments have been added to the YuniKorn REST service, so we can easily retrieve and analyze them from HTTP endpoints.
+
+## CPU profiling
+
+At this step, make sure you already have YuniKorn running; it can be running locally via the `make run` command, or deployed as a pod running inside K8s. Then run
+
+```
+go tool pprof http://localhost:9080/debug/pprof/profile
+```
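+
+When the scheduler runs as a pod inside K8s, `localhost:9080` is only reachable after forwarding the port first; a minimal sketch (the service name and namespace assume the default Helm install):
+
+```
+kubectl port-forward svc/yunikorn-service 9080:9080 -n yunikorn
+```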
+
+The profile data will be saved on the local file system, and once that is done it enters interactive mode. Now you can run profiling commands, such as
+
+```
+(pprof) top
+Showing nodes accounting for 14380ms, 44.85% of 32060ms total
+Dropped 145 nodes (cum <= 160.30ms)
+Showing top 10 nodes out of 106
+      flat  flat%   sum%        cum   cum%
+    2130ms  6.64%  6.64%     2130ms  6.64%  __tsan_read
+    1950ms  6.08% 12.73%     1950ms  6.08%  __tsan::MetaMap::FreeRange
+    1920ms  5.99% 18.71%     1920ms  5.99%  __tsan::MetaMap::GetAndLock
+    1900ms  5.93% 24.64%     1900ms  5.93%  racecall
+    1290ms  4.02% 28.67%     1290ms  4.02%  __tsan_write
+    1090ms  3.40% 32.06%     3270ms 10.20%  runtime.mallocgc
+    1080ms  3.37% 35.43%     1080ms  3.37%  __tsan_func_enter
+    1020ms  3.18% 38.62%     1120ms  3.49%  runtime.scanobject
+    1010ms  3.15% 41.77%     1010ms  3.15%  runtime.nanotime
+     990ms  3.09% 44.85%      990ms  3.09%  __tsan::DenseSlabAlloc::Refill
+```
+
+You can type commands such as `web` or `gif` to get a graph that helps you better
+understand the overall performance of the critical code paths. You can get something
+like below:
+
+![CPU Profiling](./../assets/cpu_profile.jpg)
+
+Note: to use these options, you need to install the visualization tool `graphviz` first. If you are using a Mac, simply run `brew install graphviz`; for more info please refer to [here](https://graphviz.gitlab.io/).
+
+## Memory profiling
+
+Similarly, you can run
+
+```
+go tool pprof http://localhost:9080/debug/pprof/heap
+```
+
+This returns a snapshot of the current heap, which allows us to check memory usage. Once it enters interactive mode, you can run some useful commands, such as `top`, which lists the top memory-consuming objects.
+```
+(pprof) top
+Showing nodes accounting for 83.58MB, 98.82% of 84.58MB total
+Showing top 10 nodes out of 86
+      flat  flat%   sum%        cum   cum%
+      32MB 37.84% 37.84%       32MB 37.84%  github.com/apache/yunikorn-core/pkg/cache.NewClusterInfo
+      16MB 18.92% 56.75%       16MB 18.92%  github.com/apache/yunikorn-core/pkg/rmproxy.NewRMProxy
+      16MB 18.92% 75.67%       16MB 18.92%  github.com/apache/yunikorn-core/pkg/scheduler.NewScheduler
+      16MB 18.92% 94.59%       16MB 18.92%  github.com/apache/yunikorn-k8shim/pkg/dispatcher.init.0.func1
+    1.04MB  1.23% 95.81%     1.04MB  1.23%  k8s.io/apimachinery/pkg/runtime.(*Scheme).AddKnownTypeWithName
+    0.52MB  0.61% 96.43%     0.52MB  0.61%  github.com/gogo/protobuf/proto.RegisterType
+    0.51MB  0.61% 97.04%     0.51MB  0.61%  sync.(*Map).Store
+    0.50MB   0.6% 97.63%     0.50MB   0.6%  regexp.onePassCopy
+    0.50MB  0.59% 98.23%     0.50MB  0.59%  github.com/json-iterator/go.(*Iterator).ReadString
+    0.50MB  0.59% 98.82%     0.50MB  0.59%  text/template/parse.(*Tree).newText
+```
+
+You can also run the `web`, `pdf` or `gif` commands to get a graph of the heap.
+
+## Download profiling samples and analyze them locally
+
+We have included the basic go/go-tool binaries in the scheduler docker image, so you should be able to do some basic
+profiling inside the docker container. However, if you want to dig into some issues, it is better to do the profiling
+locally. In that case you need to copy the sample files to your local environment first. The command to copy files is:
+
+```
+kubectl cp ${SCHEDULER_POD_NAME}:${SAMPLE_PATH_IN_DOCKER_CONTAINER} ${LOCAL_COPY_PATH}
+```
+
+For example:
+
+```
+kubectl cp yunikorn-scheduler-cf8f8dd8-6szh5:/root/pprof/pprof.k8s_yunikorn_scheduler.samples.cpu.001.pb.gz /Users/wyang/Downloads/pprof.k8s_yunikorn_scheduler.samples.cpu.001.pb.gz
+```
+
+Once you have the file in your local environment, you can run the `pprof` command to analyze it.
+
+```
+go tool pprof /Users/wyang/Downloads/pprof.k8s_yunikorn_scheduler.samples.cpu.001.pb.gz
+```
+
+## Resources
+
+* pprof documentation: https://github.com/google/pprof/tree/master/doc.
diff --git a/i18n/zh-cn/docusaurus-plugin-content-docs/version-1.1.0/user_guide/gang_scheduling.md b/i18n/zh-cn/docusaurus-plugin-content-docs/version-1.1.0/user_guide/gang_scheduling.md
new file mode 100644
index 000000000..f7593a573
--- /dev/null
+++ b/i18n/zh-cn/docusaurus-plugin-content-docs/version-1.1.0/user_guide/gang_scheduling.md
@@ -0,0 +1,288 @@
+---
+id: gang_scheduling
+title: Gang Scheduling
+---
+
+<!--
+Licensed to the Apache Software Foundation (ASF) under one
+or more contributor license agreements.  See the NOTICE file
+distributed with this work for additional information
+regarding copyright ownership.  The ASF licenses this file
+to you under the Apache License, Version 2.0 (the
+"License"); you may not use this file except in compliance
+with the License.  You may obtain a copy of the License at
+
+  http://www.apache.org/licenses/LICENSE-2.0
+
+Unless required by applicable law or agreed to in writing,
+software distributed under the License is distributed on an
+"AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY
+KIND, either express or implied.  See the License for the
+specific language governing permissions and limitations
+under the License.
+-->
+
+## What is Gang Scheduling
+
+When Gang Scheduling is enabled, YuniKorn schedules an app only when
+the app's minimal resource request can be satisfied. Otherwise, the app
+waits in the queue. Apps are queued in hierarchical queues;
+with gang scheduling enabled, each resource queue is assigned the
+maximum number of applications that can run concurrently with their minimal resources guaranteed.
+
+![Gang Scheduling](./../assets/gang_scheduling_iintro.png)
+
+## Enable Gang Scheduling
+
+There is no cluster-wide configuration needed to enable Gang Scheduling.
+The scheduler actively monitors the metadata of each app; if the app includes
+a valid taskGroups definition, it is considered to request gang scheduling.
+
+:::info Task Group
+A task group is a "gang" of tasks in an app; these tasks have the same resource profile
+and the same placement constraints. They are considered homogeneous requests that can be
+treated as the same kind in the scheduler.
+:::
+
+### Prerequisite
+
+For queues that run gang scheduling enabled applications, the queue sorting policy needs to be set to either
+`FIFO` or `StateAware`. To configure the queue sorting policy, please refer to the doc: [app sorting policies](user_guide/sorting_policies.md#Application_sorting).
+
+:::info Why FIFO based sorting policy?
+When Gang Scheduling is enabled, the scheduler proactively reserves resources
+for each application. If the queue sorting policy is not FIFO based (StateAware is a FIFO based sorting policy),
+the scheduler might reserve partial resources for each app, causing resource segmentation issues.
+:::
+
+### App Configuration
+
+On Kubernetes, YuniKorn discovers apps by loading metadata from individual pods; the first pod of the app
+is required to carry a full copy of the app metadata. If the app doesn't have any notion of a first or second pod,
+then all pods are required to carry the same taskGroups info. Gang scheduling requires a taskGroups definition,
+which can be specified via pod annotations. The required fields are:
+
+| Annotation                                     | Value |
+|----------------------------------------------- |---------------------	|
+| yunikorn.apache.org/task-group-name 	         | Task group name, it must be unique within the application |
+| yunikorn.apache.org/task-groups                | A list of task groups, each item contains all the info defined for the certain task group |
+| yunikorn.apache.org/schedulingPolicyParameters | Optional. Arbitrary key-value pairs that define scheduling policy parameters. Please read the [schedulingPolicyParameters section](#scheduling-policy-parameters) |
+
+#### How many task groups needed?
+
+This depends on how many different types of pods this app requests from K8s. A task group is a "gang" of tasks in an app;
+these tasks have the same resource profile and the same placement constraints. They are considered homogeneous
+requests that can be treated as the same kind in the scheduler. Using Spark as an example, each job needs 2 task groups:
+one for the driver pod and the other for the executor pods.
+
+#### How to define task groups?
+
+The task group definition is a copy of the app's real pod definition; values for fields like resources, node-selector, toleration
+and affinity should be the same as in the real pods. This ensures the scheduler can reserve resources with the
+exact correct pod specification.
+
+#### Scheduling Policy Parameters
+
+Scheduling policy related parameters are configurable. Apply the parameters in the following format in the pod's annotations:
+
+```yaml
+annotations:
+   yunikorn.apache.org/schedulingPolicyParameters: "PARAM1=VALUE1 PARAM2=VALUE2 ..."
+```
+
+Currently, the following parameters are supported:
+
+`placeholderTimeoutInSeconds`
+
+Default value: *15 minutes*.
+This parameter defines the reservation timeout, i.e. how long the scheduler should wait before giving up on allocating all the placeholders.
+The timeout timer starts to tick when the scheduler *allocates the first placeholder pod*. This ensures that if the scheduler
+could not schedule all the placeholder pods, it will eventually give up after a certain amount of time, so that the resources can be
+freed up and used by other apps. If none of the placeholders can be allocated, this timeout won't kick in. To avoid the placeholder
+pods getting stuck forever, please refer to [troubleshooting](trouble_shooting.md#gang-scheduling) for solutions.
+
+`gangSchedulingStyle`
+
+Valid values: *Soft*, *Hard*
+
+Default value: *Soft*.
+This parameter defines the fallback mechanism if the app encounters gang issues due to placeholder pod allocation.
+See more details in the [Gang Scheduling styles](#gang-scheduling-styles) section.
+
+More scheduling parameters will be added in order to provide more flexibility while scheduling apps.
+
+#### Example
+
+The following example is a yaml file for a job. This job launches 2 pods and each pod sleeps for 30 seconds.
+The notable addition in the pod spec is *spec.template.metadata.annotations*, where we define `yunikorn.apache.org/task-group-name`
+and `yunikorn.apache.org/task-groups`.
+
+```yaml
+apiVersion: batch/v1
+kind: Job
+metadata:
+  name: gang-scheduling-job-example
+spec:
+  completions: 2
+  parallelism: 2
+  template:
+    metadata:
+      labels:
+        app: sleep
+        applicationId: "gang-scheduling-job-example"
+        queue: root.sandbox
+      annotations:
+        yunikorn.apache.org/task-group-name: task-group-example
+        yunikorn.apache.org/task-groups: |-
+          [{
+              "name": "task-group-example",
+              "minMember": 2,
+              "minResource": {
+                "cpu": "100m",
+                "memory": "50M"
+              },
+              "nodeSelector": {},
+              "tolerations": [],
+              "affinity": {}
+          }]
+    spec:
+      schedulerName: yunikorn
+      restartPolicy: Never
+      containers:
+        - name: sleep30
+          image: "alpine:latest"
+          command: ["sleep", "30"]
+          resources:
+            requests:
+              cpu: "100m"
+              memory: "50M"
+```
+
+When this job is submitted to Kubernetes, 2 pods will be created using the same template, and they all belong to one taskGroup:
+*"task-group-example"*. YuniKorn will create 2 placeholder pods, each using the resources specified in the taskGroup definition.
+When both placeholders are allocated, the scheduler will bind the 2 real sleep pods to the spots reserved by the placeholders.
+
+You can add more than one taskGroup if necessary; each taskGroup is identified by the taskGroup name, and
+each real pod must be mapped to a pre-defined taskGroup by setting the taskGroup name. Note,
+the task group name is only required to be unique within an application.
+
+### Enable Gang scheduling for Spark jobs
+
+Each Spark job runs 2 types of pods, driver and executor. Hence, we need to define 2 task groups for each job.
+The annotations for the driver pod look like:
+
+```yaml
+Annotations:
+  yunikorn.apache.org/schedulingPolicyParameters: "placeholderTimeoutInSeconds=30"
+  yunikorn.apache.org/task-group-name: "spark-driver"
+  yunikorn.apache.org/task-groups: |-
+    [{
+        "name": "spark-driver",
+        "minMember": 1,
+        "minResource": {
+          "cpu": 1,
+          "memory": "2Gi"
+        },
+        "nodeSelector": ...,
+        "tolerations": ...,
+        "affinity": ...
+     },
+     {
+        "name": "spark-executor",
+        "minMember": 10,
+        "minResource": {
+          "cpu": 1,
+          "memory": "2Gi"
+        }
+     }]
+```
+
+:::note
+Spark driver and executor pods have memory overhead that needs to be considered in the taskGroup resources. 
+:::
+
+For all the executor pods,
+
+```yaml
+Annotations:
+  # the taskGroup name should match the names
+  # defined in the task-groups annotation
+  yunikorn.apache.org/task-group-name: "spark-executor"
+```
+
+Once the job is submitted to the scheduler, the job won’t be scheduled immediately.
+Instead, the scheduler will ensure it gets its minimal resources before actually starting the driver/executors. 
+
+## Gang scheduling Styles
+
+There are 2 gang scheduling styles supported, Soft and Hard. The style can be configured per app to define how the app behaves in case gang scheduling fails.
+
+- `Hard style`: when this style is used, we keep the initial behavior: if the application cannot be scheduled according to gang scheduling rules and it times out, it will be marked as failed, without retrying to schedule it.
+- `Soft style`: when the app cannot be gang scheduled, it falls back to normal scheduling, and the non-gang scheduling strategy is used to achieve best-effort scheduling. When this happens, the app transitions to the Resuming state and all the remaining placeholder pods are cleaned up.
+
+**Default style used**: `Soft`
+
+**Enable a specific style**: the style can be changed by setting the `gangSchedulingStyle` parameter to Soft or Hard in the application definition.
+
+#### Example
+
+```yaml
+apiVersion: batch/v1
+kind: Job
+metadata:
+  name: gang-app-timeout
+spec:
+  completions: 4
+  parallelism: 4
+  template:
+    metadata:
+      labels:
+        app: sleep
+        applicationId: gang-app-timeout
+        queue: fifo
+      annotations:
+        yunikorn.apache.org/task-group-name: sched-style
+        yunikorn.apache.org/schedulingPolicyParameters: "placeholderTimeoutInSeconds=60 gangSchedulingStyle=Hard"
+        yunikorn.apache.org/task-groups: |-
+          [{
+              "name": "sched-style",
+              "minMember": 4,
+              "minResource": {
+                "cpu": "1",
+                "memory": "1000M"
+              },
+              "nodeSelector": {},
+              "tolerations": [],
+              "affinity": {}
+          }]
+    spec:
+      schedulerName: yunikorn
+      restartPolicy: Never
+      containers:
+        - name: sleep30
+          image: "alpine:latest"
+          imagePullPolicy: "IfNotPresent"
+          command: ["sleep", "30"]
+          resources:
+            requests:
+              cpu: "1"
+              memory: "1000M"
+
+```
+
+## Verify Configuration
+
+To verify that the configuration has been done completely and correctly, check the following things:
+1. When an app is submitted, verify that the expected number of placeholders is created by the scheduler.
+If you define 2 task groups, one with minMember 1 and the other with minMember 5, we expect 6 placeholders
+to be created once the job is submitted.
+2. Verify the placeholder spec is correct. Each placeholder needs to have the same info as the real pod in the same taskGroup.
+Check fields including: namespace, pod resources, node-selector, toleration and affinity.
+3. Verify the placeholders can be allocated on the correct type of nodes, and verify the real pods start up by replacing the placeholder pods.
+
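+A quick way to run these checks is with kubectl against the namespace the job was submitted to; a minimal sketch (all names are placeholders):
+
+```shell script
+kubectl get pods -n <namespace>                          # count the placeholder pods created for the app
+kubectl describe pod <placeholder-pod> -n <namespace>    # compare resources, node-selector, toleration and affinity
+kubectl get pods -n <namespace> -o wide                  # confirm the real pods replace the placeholders on the expected nodes
+```
+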
+## Troubleshooting
+
+Please see the gang scheduling section of the troubleshooting doc [here](trouble_shooting.md#gang-scheduling).
diff --git a/i18n/zh-cn/docusaurus-plugin-content-docs/version-1.1.0/user_guide/trouble_shooting.md b/i18n/zh-cn/docusaurus-plugin-content-docs/version-1.1.0/user_guide/trouble_shooting.md
new file mode 100644
index 000000000..deada946f
--- /dev/null
+++ b/i18n/zh-cn/docusaurus-plugin-content-docs/version-1.1.0/user_guide/trouble_shooting.md
@@ -0,0 +1,192 @@
+---
+id: trouble_shooting
+title: Trouble Shooting
+---
+
+<!--
+ * Licensed to the Apache Software Foundation (ASF) under one
+ * or more contributor license agreements.  See the NOTICE file
+ * distributed with this work for additional information
+ * regarding copyright ownership.  The ASF licenses this file
+ * to you under the Apache License, Version 2.0 (the
+ * "License"); you may not use this file except in compliance
+ * with the License.  You may obtain a copy of the License at
+ *
+ *     http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing, software
+ * distributed under the License is distributed on an "AS IS" BASIS,
+ * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+ * See the License for the specific language governing permissions and
+ * limitations under the License.
+ -->
+ 
+## Scheduler logs
+
+### Retrieve scheduler logs
+
+Currently, the scheduler writes its logs to stdout/stderr, and the docker container handles the redirection of these logs to a
+local location on the underlying node; you can read more in the Docker documentation [here](https://docs.docker.com/config/containers/logging/configure/).
+These logs can be retrieved with [kubectl logs](https://kubernetes.io/docs/reference/generated/kubectl/kubectl-commands#logs). For example:
+
+```shell script
+// get the scheduler pod
+kubectl get pod -l component=yunikorn-scheduler -n yunikorn
+NAME                                  READY   STATUS    RESTARTS   AGE
+yunikorn-scheduler-766d7d6cdd-44b82   2/2     Running   0          33h
+
+// retrieve logs
+kubectl logs yunikorn-scheduler-766d7d6cdd-44b82 yunikorn-scheduler-k8s -n yunikorn
+```
+
+In most cases, this command cannot get all the logs because the scheduler rolls logs very fast. To retrieve older logs,
+you will need to set up [cluster level logging](https://kubernetes.io/docs/concepts/cluster-administration/logging/#cluster-level-logging-architectures).
+The recommended setup is to leverage [fluentd](https://www.fluentd.org/) to collect and persist logs on external storage, e.g. S3. 
+
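+Before cluster level logging is in place, kubectl itself can widen or narrow the log window a bit; a minimal sketch (the pod name matches the example above):
+
+```shell script
+kubectl logs yunikorn-scheduler-766d7d6cdd-44b82 yunikorn-scheduler-k8s -n yunikorn --since=1h --tail=5000
+kubectl logs yunikorn-scheduler-766d7d6cdd-44b82 yunikorn-scheduler-k8s -n yunikorn --previous   # container logs from before the last restart
+```
+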
+### Set Logging Level
+
+:::note
+Changing the logging level requires a restart of the scheduler pod.
+:::
+
+Stop the scheduler:
+
+```shell script
+kubectl scale deployment yunikorn-scheduler -n yunikorn --replicas=0
+```
+Edit the deployment config (this opens the deployment spec in your editor):
+
+```shell script
+kubectl edit deployment yunikorn-scheduler -n yunikorn
+```
+
+Add `LOG_LEVEL` to the `env` field of the container template. For example, setting `LOG_LEVEL` to `0` sets the logging
+level to `INFO`.
+
+```yaml
+apiVersion: apps/v1
+kind: Deployment
+metadata:
+ ...
+spec:
+  template: 
+   ...
+    spec:
+      containers:
+      - env:
+        - name: LOG_LEVEL
+          value: '0'
+```
+
+Start the scheduler:
+
+```shell script
+kubectl scale deployment yunikorn-scheduler -n yunikorn --replicas=1
+```
+
+Available logging levels:
+
+| Value 	| Logging Level 	|
+|:-----:	|:-------------:	|
+|   -1  	|     DEBUG     	|
+|   0   	|      INFO     	|
+|   1   	|      WARN     	|
+|   2   	|     ERROR     	|
+|   3   	|     DPanic    	|
+|   4   	|     Panic     	|
+|   5   	|     Fatal     	|
+
+## Pods are stuck at Pending state
+
+If some pods are stuck in the Pending state, it means the scheduler could not find a node to allocate the pods to. There are
+several possible causes:
+
+### 1. None of the nodes satisfy the pod placement requirements
+
+A pod can be configured with placement constraints, such as a [node-selector](https://kubernetes.io/docs/concepts/scheduling-eviction/assign-pod-node/#nodeselector),
+[affinity/anti-affinity](https://kubernetes.io/docs/concepts/scheduling-eviction/assign-pod-node/#affinity-and-anti-affinity),
+or missing tolerations for node [taints](https://kubernetes.io/docs/concepts/scheduling-eviction/taint-and-toleration/), etc.
+To debug such issues, describe the pod:
+
+```shell script
+kubectl describe pod <pod-name> -n <namespace>
+```
+
+The pod events will contain the predicate failures, which explain why the nodes did not qualify for the allocation.
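+
+If the pod has accumulated many events, the relevant ones can also be listed directly with a field selector (names are
+placeholders):
+
+```shell script
+# list only the events that belong to the pending pod
+kubectl get events -n <namespace> --field-selector involvedObject.name=<pod-name>
+```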
+
+### 2. The queue is running out of capacity
+
+If the queue is running out of capacity, pods stay pending until queue resources become available. To check whether a queue still
+has enough capacity for the pending pods, there are several approaches:
+
+1) Check the queue usage from the YuniKorn UI
+
+If you do not know how to access the UI, you can refer to the document [here](../get_started/get_started.md#访问-web-ui). Go
+to the `Queues` page and navigate to the queue this job was submitted to. You will be able to see the available capacity
+left in the queue.
+
+2) Check the pod events
+
+Run `kubectl describe pod` to get the pod events. If you see an event like
+`Application <appID> does not fit into <queuePath> queue`, the pod could not be allocated because the queue
+is running out of capacity.
+
+The pod will be allocated once other pods in this queue complete or are removed. If the pod remains pending even though
+the queue has capacity, it may be waiting for the cluster to scale up.
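+
+3) Check the queue capacity via the REST API
+
+The queue capacity and usage can also be queried from the scheduler REST API. The sketch below assumes the default service
+name `yunikorn-service`, the REST port 9080 and the `default` partition; adjust these to your deployment:
+
+```shell script
+# in one terminal: forward the REST port of the scheduler service
+kubectl port-forward svc/yunikorn-service 9080:9080 -n yunikorn
+
+# in another terminal: query the queue hierarchy
+curl -s http://localhost:9080/ws/v1/partition/default/queues
+```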
+
+## Restart the scheduler
+
+YuniKorn can recover its state upon a restart. The YuniKorn scheduler pod is deployed as a Deployment, so the scheduler can be
+restarted by scaling the replicas down and back up:
+
+```shell script
+kubectl scale deployment yunikorn-scheduler -n yunikorn --replicas=0
+kubectl scale deployment yunikorn-scheduler -n yunikorn --replicas=1
+```
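+
+Alternatively, a standard rolling restart can be used. Note that, unlike the scale down/up above, a rolling restart briefly
+runs the new scheduler pod alongside the old one before the old one terminates:
+
+```shell script
+kubectl rollout restart deployment yunikorn-scheduler -n yunikorn
+```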
+
+## Gang Scheduling
+
+### 1. No placeholders created, app's pods are pending
+
+*Reason*: This usually happens because the app was rejected by the scheduler, so none of its pods are scheduled.
+Common reasons for the rejection are: 1) The taskGroups definition is invalid. The scheduler performs a
+sanity check upon app submission to ensure all the taskGroups are defined correctly; if this info is malformed,
+the scheduler rejects the app. 2) The total min resources defined in the taskGroups are larger than the queue's max
+capacity, so the scheduler rejects the app because it will not fit into the queue's capacity. Check the pod events for relevant messages,
+and you will find more detailed error messages in the scheduler's log.
+
+*Solution*: Correct the taskGroups definition and retry submitting the app.
+
+### 2. Not all placeholders can be allocated
+
+*Reason*: The placeholders also consume resources. If not all of them can be allocated, it usually means either the queue
+or the cluster does not have sufficient resources for them. In this case, the placeholders are cleaned up after a certain
+amount of time, defined by the `placeholderTimeoutInSeconds` scheduling policy parameter.
+
+*Solution*: Note that once the placeholder timeout is reached, the app currently transitions to the Failed state and cannot be scheduled
+anymore. You can increase the placeholder timeout value if you are willing to wait longer. In the future, a fallback policy
+might be added to provide a retry instead of failing the app.
+
+### 3. Not all placeholders are swapped
+
+*Reason*: This usually means the app's actual number of pods is less than the minMember defined in the taskGroups.
+
+*Solution*: Check the `minMember` in the taskGroup field and ensure it is set correctly. The `minMember` can be less than
+the actual number of pods; setting it to a value larger than the actual number of pods is invalid.
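+
+To compare the actual number of pods with `minMember`, the application's pods can simply be counted; this sketch assumes the
+pods carry the `applicationId` label as in the examples above:
+
+```shell script
+# count the pods of the application in its namespace
+kubectl get pods -n <namespace> -l applicationId=<appID> --no-headers | wc -l
+```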
+
+### 4. Placeholders are not cleaned up when the app terminates
+
+*Reason*: All the placeholders have an [ownerReference](https://kubernetes.io/docs/concepts/workloads/controllers/garbage-collection/#owners-and-dependents)
+set to the first real pod of the app, or to its controller reference. If the placeholders are not cleaned up, that means
+the garbage collection in Kubernetes is not working properly.
+
+*Solution*: Check the placeholder `ownerReference` and the garbage collector in Kubernetes.
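+
+The owner reference of a leftover placeholder can be inspected directly (the pod name below is a placeholder):
+
+```shell script
+# print the ownerReferences of the stuck placeholder pod
+kubectl get pod <placeholder-pod-name> -n <namespace> -o jsonpath='{.metadata.ownerReferences}'
+```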
+
+
+## Still got questions?
+
+No problem! The Apache YuniKorn community will be happy to help. You can reach out to the community with the following options:
+
+1. Post your questions to dev@yunikorn.apache.org
+2. Join the [YuniKorn slack channel](https://join.slack.com/t/yunikornworkspace/shared_invite/enQtNzAzMjY0OTI4MjYzLTBmMDdkYTAwNDMwNTE3NWVjZWE1OTczMWE4NDI2Yzg3MmEyZjUyYTZlMDE5M2U4ZjZhNmYyNGFmYjY4ZGYyMGE) and post your questions to the `#yunikorn-user` channel.
+3. Join the [community sync up meetings](http://yunikorn.apache.org/community/getInvolved#community-meetings) and directly talk to the community members. 
\ No newline at end of file
diff --git a/i18n/zh-cn/docusaurus-plugin-content-docs/version-1.1.0/user_guide/workloads/run_flink.md b/i18n/zh-cn/docusaurus-plugin-content-docs/version-1.1.0/user_guide/workloads/run_flink.md
new file mode 100644
index 000000000..40eb05b19
--- /dev/null
+++ b/i18n/zh-cn/docusaurus-plugin-content-docs/version-1.1.0/user_guide/workloads/run_flink.md
@@ -0,0 +1,66 @@
+---
+id: run_flink
+title: Run Flink Jobs
+description: How to run Flink jobs with YuniKorn
+image: https://svn.apache.org/repos/asf/flink/site/img/logo/png/100/flink_squirrel_100_color.png
+keywords:
+ - flink
+---
+
+<!--
+Licensed to the Apache Software Foundation (ASF) under one
+or more contributor license agreements.  See the NOTICE file
+distributed with this work for additional information
+regarding copyright ownership.  The ASF licenses this file
+to you under the Apache License, Version 2.0 (the
+"License"); you may not use this file except in compliance
+with the License.  You may obtain a copy of the License at
+
+  http://www.apache.org/licenses/LICENSE-2.0
+
+Unless required by applicable law or agreed to in writing,
+software distributed under the License is distributed on an
+"AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY
+KIND, either express or implied.  See the License for the
+specific language governing permissions and limitations
+under the License.
+-->
+
+It is easy to run [Apache Flink](https://flink.apache.org/) on Kubernetes with YuniKorn.
+Depending on the mode used to run Flink on Kubernetes, the configuration is slightly different.
+
+## Standalone mode
+
+Please follow [Kubernetes Setup](https://ci.apache.org/projects/flink/flink-docs-stable/ops/deployment/kubernetes.html) for details and examples of the standalone deploy mode.
+In this mode, we can directly add the required labels (applicationId and queue) in the Deployment/Job spec to run the Flink application with the YuniKorn scheduler, as described in [run workloads with the YuniKorn scheduler](#run-workloads-with-yunikorn-scheduler); a sketch of the labels is shown below.
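+
+For illustration, this is a sketch of how these labels could look in the pod template of a standalone Deployment; the
+applicationId and queue values below are examples only:
+
+```yaml
+metadata:
+  labels:
+    applicationId: "MyOwnApplicationId"
+    queue: "root.sandbox"
+```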
+
+## Native mode
+
+Please follow [Native Kubernetes Setup](https://ci.apache.org/projects/flink/flink-docs-stable/ops/deployment/native_kubernetes.html) for details and examples of the native deploy mode.
+Running Flink applications with the YuniKorn scheduler in native mode is only supported by Flink 1.11 or above; we can leverage two Flink configuration options, `kubernetes.jobmanager.labels` and `kubernetes.taskmanager.labels`, to set the required labels.
+Examples:
+
+* Start a Flink session
+```
+./bin/kubernetes-session.sh \
+  -Dkubernetes.cluster-id=<ClusterId> \
+  -Dtaskmanager.memory.process.size=4096m \
+  -Dkubernetes.taskmanager.cpu=2 \
+  -Dtaskmanager.numberOfTaskSlots=4 \
+  -Dresourcemanager.taskmanager-timeout=3600000 \
+  -Dkubernetes.jobmanager.labels=applicationId:MyOwnApplicationId,queue:root.sandbox \
+  -Dkubernetes.taskmanager.labels=applicationId:MyOwnApplicationId,queue:root.sandbox
+```
+
+* Start a Flink application
+```
+./bin/flink run-application -p 8 -t kubernetes-application \
+  -Dkubernetes.cluster-id=<ClusterId> \
+  -Dtaskmanager.memory.process.size=4096m \
+  -Dkubernetes.taskmanager.cpu=2 \
+  -Dtaskmanager.numberOfTaskSlots=4 \
+  -Dkubernetes.container.image=<CustomImageName> \
+  -Dkubernetes.jobmanager.labels=applicationId:MyOwnApplicationId,queue:root.sandbox \
+  -Dkubernetes.taskmanager.labels=applicationId:MyOwnApplicationId,queue:root.sandbox \
+  local:///opt/flink/usrlib/my-flink-job.jar
+```
\ No newline at end of file
diff --git a/i18n/zh-cn/docusaurus-plugin-content-docs/version-1.1.0/user_guide/workloads/run_spark.md b/i18n/zh-cn/docusaurus-plugin-content-docs/version-1.1.0/user_guide/workloads/run_spark.md
new file mode 100644
index 000000000..b7f4f3ded
--- /dev/null
+++ b/i18n/zh-cn/docusaurus-plugin-content-docs/version-1.1.0/user_guide/workloads/run_spark.md
@@ -0,0 +1,145 @@
+---
+id: run_spark
+title: Run Spark Jobs
+description: How to run Spark jobs with YuniKorn
+keywords:
+ - spark
+---
+
+<!--
+Licensed to the Apache Software Foundation (ASF) under one
+or more contributor license agreements.  See the NOTICE file
+distributed with this work for additional information
+regarding copyright ownership.  The ASF licenses this file
+to you under the Apache License, Version 2.0 (the
+"License"); you may not use this file except in compliance
+with the License.  You may obtain a copy of the License at
+
+  http://www.apache.org/licenses/LICENSE-2.0
+
+Unless required by applicable law or agreed to in writing,
+software distributed under the License is distributed on an
+"AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY
+KIND, either express or implied.  See the License for the
+specific language governing permissions and limitations
+under the License.
+-->
+
+:::note
+This document assumes you have YuniKorn and its admission controller both installed. Please refer to [Get Started](../../get_started/get_started.md) for how to do that.
+:::
+
+## Prepare the docker image for Spark
+
+To run Spark on Kubernetes, you'll need the Spark docker images. You can either
+1) use the docker images provided by the YuniKorn team, or
+2) build one from scratch. If you want to build your own Spark docker image, you can
+* Download a Spark version that has Kubernetes support from https://github.com/apache/spark
+* Build a Spark distribution with Kubernetes support:
+```shell script
+mvn -Pyarn -Phadoop-2.7 -Dhadoop.version=2.7.4 -Phive -Pkubernetes -Phive-thriftserver -DskipTests package
+```
+
+## Create a namespace for Spark jobs
+
+Create a namespace:
+
+```shell script
+cat <<EOF | kubectl apply -f -
+apiVersion: v1
+kind: Namespace
+metadata:
+  name: spark-test
+EOF
+```
+
+Create a service account and cluster role bindings under the `spark-test` namespace:
+
+```shell script
+cat <<EOF | kubectl apply -n spark-test -f -
+apiVersion: v1
+kind: ServiceAccount
+metadata:
+  name: spark
+  namespace: spark-test
+---
+apiVersion: rbac.authorization.k8s.io/v1
+kind: ClusterRole
+metadata:
+  name: spark-cluster-role
+  namespace: spark-test
+rules:
+- apiGroups: [""]
+  resources: ["pods"]
+  verbs: ["get", "watch", "list", "create", "delete"]
+- apiGroups: [""]
+  resources: ["configmaps"]
+  verbs: ["get", "create", "delete"]
+---
+apiVersion: rbac.authorization.k8s.io/v1
+kind: ClusterRoleBinding
+metadata:
+  name: spark-cluster-role-binding
+  namespace: spark-test
+subjects:
+- kind: ServiceAccount
+  name: spark
+  namespace: spark-test
+roleRef:
+  kind: ClusterRole
+  name: spark-cluster-role
+  apiGroup: rbac.authorization.k8s.io
+EOF
+```
+
+:::note
+Do NOT use `ClusterRole` and `ClusterRoleBinding` to run Spark jobs in production!
+Please configure a more fine-grained security context for running Spark jobs. For details on how to configure proper RBAC rules, see [this link](https://kubernetes.io/docs/reference/access-authn-authz/rbac/).
+:::
+
+## Submit a Spark job
+
+If this is running from your local machine, you will need to start a proxy in order to talk to the API server.
+```shell script
+kubectl proxy
+```
+
+Run a simple SparkPi job (this assumes that the Spark binaries are installed locally in the `/usr/local` directory).
+```shell script
+export SPARK_HOME=/usr/local/spark-2.4.4-bin-hadoop2.7/
+${SPARK_HOME}/bin/spark-submit --master k8s://http://localhost:8001 --deploy-mode cluster --name spark-pi \
+   --class org.apache.spark.examples.SparkPi \
+   --conf spark.executor.instances=1 \
+   --conf spark.kubernetes.namespace=spark-test \
+   --conf spark.kubernetes.executor.request.cores=1 \
+   --conf spark.kubernetes.container.image=apache/yunikorn:spark-2.4.4 \
+   --conf spark.kubernetes.authenticate.driver.serviceAccountName=spark-test:spark \
+   local:///opt/spark/examples/jars/spark-examples_2.11-2.4.4.jar
+```
+
+You can see the Spark driver and executors being created on Kubernetes:
+
+![spark-pods](./../../assets/spark-pods.png)
+
+You can also view the job info from the YuniKorn UI. If you do not know how to access the YuniKorn UI, please read the document
+[here](../../get_started/get_started.md#访问-web-ui).
+
+![spark-jobs-on-ui](./../../assets/spark-jobs-on-ui.png)
+
+## What happens behind the scenes?
+
+When a Spark job is submitted to the cluster, the job is submitted to the `spark-test` namespace. The Spark driver pod will be created in this namespace first.
+Because the YuniKorn admission controller is enabled in this cluster, when the driver pod gets created, the admission controller mutates the pod's spec and injects `schedulerName=yunikorn`;
+by doing this, the default K8s scheduler skips this pod and it is scheduled by YuniKorn instead. See [configure multiple schedulers in Kubernetes](https://kubernetes.io/docs/tasks/extend-kubernetes/configure-multiple-schedulers/) to understand how this is done.
+
+The default configuration has placement rules enabled, which automatically map the `spark-test` namespace to a YuniKorn queue `root.spark-test`.
+All Spark jobs submitted to this namespace are automatically submitted to that queue first.
+To learn more about how placement rules work, please see the doc [app placement rules](user_guide/placement_rules.md); a sketch of such a rule is shown below.
+At this point, the namespace defines the security context of the pods, and the queue determines how the job and its pods are scheduled, taking job ordering, queue resource fairness, etc. into account.
+Note that this is the simplest setup and it does not enforce queue capacities. The queue is considered to have unlimited capacity.
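+
+For illustration, a namespace-to-queue mapping of this kind is typically expressed with a `tag` placement rule in the queue
+configuration; the snippet below is only a sketch, see the placement rules guide linked above for the authoritative syntax:
+
+```yaml
+partitions:
+  - name: default
+    placementrules:
+      - name: tag
+        value: namespace
+        create: true
+    queues:
+      - name: root
+```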
+
+YuniKorn reuses the Spark application ID set in the label `spark-app-selector`, and the job is submitted
+to YuniKorn and treated as one application. The job is scheduled and runs once there are sufficient resources in the cluster.
+YuniKorn allocates the driver pod to a node, binds the pod and starts all the containers.
+Once the driver pod starts, it requests a set of executor pods to run its tasks. Those pods are created in the same namespace and are likewise scheduled by YuniKorn.
diff --git a/i18n/zh-cn/docusaurus-plugin-content-docs/version-1.1.0/user_guide/workloads/run_tensorflow.md b/i18n/zh-cn/docusaurus-plugin-content-docs/version-1.1.0/user_guide/workloads/run_tensorflow.md
new file mode 100644
index 000000000..c5d708c5e
--- /dev/null
+++ b/i18n/zh-cn/docusaurus-plugin-content-docs/version-1.1.0/user_guide/workloads/run_tensorflow.md
@@ -0,0 +1,93 @@
+---
+id: run_tf
+title: Run TensorFlow Jobs
+description: How to run TensorFlow jobs with YuniKorn
+keywords:
+ - tensorflow
+---
+
+<!--
+Licensed to the Apache Software Foundation (ASF) under one
+or more contributor license agreements.  See the NOTICE file
+distributed with this work for additional information
+regarding copyright ownership.  The ASF licenses this file
+to you under the Apache License, Version 2.0 (the
+"License"); you may not use this file except in compliance
+with the License.  You may obtain a copy of the License at
+
+  http://www.apache.org/licenses/LICENSE-2.0
+
+Unless required by applicable law or agreed to in writing,
+software distributed under the License is distributed on an
+"AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY
+KIND, either express or implied.  See the License for the
+specific language governing permissions and limitations
+under the License.
+-->
+
+This guide gives an overview of how to set up the [training-operator](https://github.com/kubeflow/training-operator) and how to run a TensorFlow job with the YuniKorn scheduler.
+The training-operator is a unified training operator maintained by Kubeflow. It not only supports TensorFlow but also PyTorch, XGBoost, and more.
+
+## Install the training-operator
+You can use the following command to install the training-operator; by default it is installed into the kubeflow namespace. If you run into problems with the installation,
+please refer to [this document](https://github.com/kubeflow/training-operator#installation) for details.
+```
+kubectl apply -k "github.com/kubeflow/training-operator/manifests/overlays/standalone?ref=v1.3.0"
+```
+
+## Prepare the docker image
+Before you start running a TensorFlow job on Kubernetes, you'll need to build the docker image.
+1. Download the files from [deployment/examples/tfjob](https://github.com/apache/yunikorn-k8shim/tree/master/deployments/examples/tfjob)
+2. Build the docker image with the following command
+
+```
+docker build -f Dockerfile -t kubeflow/tf-dist-mnist-test:1.0 .
+```
+
+## Run a TensorFlow job
+Here is a TFJob yaml for the MNIST [example](https://github.com/apache/yunikorn-k8shim/blob/master/deployments/examples/tfjob/tf-job-mnist.yaml).
+
+```yaml
+apiVersion: kubeflow.org/v1
+kind: TFJob
+metadata:
+  name: dist-mnist-for-e2e-test
+  namespace: kubeflow
+spec:
+  tfReplicaSpecs:
+    PS:
+      replicas: 2
+      restartPolicy: Never
+      template:
+        metadata:
+          labels:
+            applicationId: "tf_job_20200521_001"
+            queue: root.sandbox
+        spec:
+          schedulerName: yunikorn
+          containers:
+            - name: tensorflow
+              image: kubeflow/tf-dist-mnist-test:1.0
+    Worker:
+      replicas: 4
+      restartPolicy: Never
+      template:
+        metadata:
+          labels:
+            applicationId: "tf_job_20200521_001"
+            queue: root.sandbox
+        spec:
+          schedulerName: yunikorn
+          containers:
+            - name: tensorflow
+              image: kubeflow/tf-dist-mnist-test:1.0
+```
+Create the TFJob
+```
+kubectl create -f deployments/examples/tfjob/tf-job-mnist.yaml
+```
+
+You can view the job info from the YuniKorn UI. If you do not know how to access the YuniKorn UI,
+please read this [document](../../get_started/get_started.md#访问-web-ui).
+
+![tf-job-on-ui](../../assets/tf-job-on-ui.png)
diff --git a/package.json b/package.json
index a5ac5ba74..ba0000903 100644
--- a/package.json
+++ b/package.json
@@ -15,6 +15,7 @@
     "@docusaurus/theme-search-algolia": "^2.0.1",
     "@mdx-js/react": "^1.5.8",
     "clsx": "^1.1.1",
+    "node": "^18.8.0",
     "react": "17.0.2",
     "react-dom": "17.0.2"
   },
diff --git a/src/pages/community/download.md b/src/pages/community/download.md
index eb9f0bee6..4a375a60e 100644
--- a/src/pages/community/download.md
+++ b/src/pages/community/download.md
@@ -31,13 +31,13 @@ All release artifacts should be checked for tampering using GPG or SHA-512.
 
 We publish prebuilt docker images for everyone's convenience.
 
-The latest release of Apache YuniKorn is v1.0.0.
+The latest release of Apache YuniKorn is v1.1.0.
 
 | Version | Release date | Source download                                                                                                                                                                                                                                                                                                                                                              | Docker images                                                                                      [...]
 |---------|--------------|------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------|--------------------------------------------------------------------------------------------------- [...]
-| v1.0.0  | 2022-05-06   | [Download](https://www.apache.org/dyn/closer.lua/yunikorn/1.0.0/apache-yunikorn-1.0.0-src.tar.gz) <br />[Checksum](https://downloads.apache.org/yunikorn/1.0.0/apache-yunikorn-1.0.0-src.tar.gz.sha512) & [Signature](https://downloads.apache.org/yunikorn/1.0.0/apache-yunikorn-1.0.0-src.tar.gz.asc)                                                                      | [scheduler](https://hub.docker.com/layers/apache/yunikorn/scheduler-1.0.0/images/sha256-a38ef73733 [...]
+| v1.1.0  | 2022-09-08   | [Download](https://www.apache.org/dyn/closer.lua/yunikorn/1.1.0/apache-yunikorn-1.1.0-src.tar.gz) <br />[Checksum](https://downloads.apache.org/yunikorn/1.1.0/apache-yunikorn-1.1.0-src.tar.gz.sha512) & [Signature](https://downloads.apache.org/yunikorn/1.1.0/apache-yunikorn-1.1.0-src.tar.gz.asc)                                                                      | [scheduler](https://hub.docker.com/layers/apache/yunikorn/scheduler-1.1.0/images/sha256-5a45cede35 [...]
+| v1.0.0  | 2022-05-06   | [Download](https://archive.apache.org/dist/yunikorn/1.0.0/apache-yunikorn-1.0.0-src.tar.gz) <br />[Checksum](https://archive.apache.org/dist/yunikorn/1.0.0/apache-yunikorn-1.0.0-src.tar.gz.sha512) & [Signature](https://archive.apache.org/dist/yunikorn/1.0.0/apache-yunikorn-1.0.0-src.tar.gz.asc)                                                                      | [scheduler](https://hub.docker.com/layers/apache/yunikorn/scheduler-1.0.0/images/sha256-a38ef73 [...]
 | v0.12.2 | 2022-02-03   | [Download](https://archive.apache.org/dist/incubator/yunikorn/0.12.2/apache-yunikorn-0.12.2-incubating-src.tar.gz) <br />[Checksum](https://archive.apache.org/dist/incubator/yunikorn/0.12.2/apache-yunikorn-0.12.2-incubating-src.tar.gz.sha512) & [Signature](https://archive.apache.org/dist/incubator/yunikorn/0.12.2/apache-yunikorn-0.12.2-incubating-src.tar.gz.asc) | [scheduler](https://hub.docker.com/layers/apache/yunikorn/scheduler-0.12.2/images/sha256-aa2de246f [...]
-| v0.11.0 | 2021-08-18   | [Download](https://archive.apache.org/dist/incubator/yunikorn/0.11.0/apache-yunikorn-0.11.0-incubating-src.tar.gz) <br />[Checksum](https://archive.apache.org/dist/incubator/yunikorn/0.11.0/apache-yunikorn-0.11.0-incubating-src.tar.gz.sha512) & [Signature](https://archive.apache.org/dist/incubator/yunikorn/0.11.0/apache-yunikorn-0.11.0-incubating-src.tar.gz.asc) | [scheduler](https://hub.docker.com/layers/apache/yunikorn/scheduler-0.11.0/images/sha256-7d156e4df [...]
 
 ## Verifying the signature
 
diff --git a/src/pages/release-announce/1.1.0.md b/src/pages/release-announce/1.1.0.md
new file mode 100644
index 000000000..7ac0a18ee
--- /dev/null
+++ b/src/pages/release-announce/1.1.0.md
@@ -0,0 +1,58 @@
+---
+id: rn-1.1.0
+title: Release Announcement v1.1.0
+---
+
+<!--
+Licensed to the Apache Software Foundation (ASF) under one
+or more contributor license agreements.  See the NOTICE file
+distributed with this work for additional information
+regarding copyright ownership.  The ASF licenses this file
+to you under the Apache License, Version 2.0 (the
+"License"); you may not use this file except in compliance
+with the License.  You may obtain a copy of the License at
+
+  http://www.apache.org/licenses/LICENSE-2.0
+
+Unless required by applicable law or agreed to in writing,
+software distributed under the License is distributed on an
+"AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY
+KIND, either express or implied.  See the License for the
+specific language governing permissions and limitations
+under the License.
+-->
+
+# Release Announcement v1.1.0
+We are pleased to announce that the Apache YuniKorn community has voted to release 1.1.0. Apache YuniKorn is a standalone resource scheduler, designed for managing and scheduling Big Data workloads on container orchestration frameworks like Kubernetes for on-prem and on-cloud use cases.
+
+## Overview
+The Apache YuniKorn community has fixed 87 [JIRAs](https://issues.apache.org/jira/issues/?filter=12352202) in this release. 
+
+Release manager: Peter Bacsko
+
+Release date: 2022-09-08
+
+## Highlights
+
+### REST API documentation and enhancements
+The REST API can now return the details of a [specific application](https://issues.apache.org/jira/browse/YUNIKORN-1217) and list the [pending allocations](https://issues.apache.org/jira/browse/YUNIKORN-1263) of an application.
+The documentation of the REST API has also been enhanced.
+
+### Multi-architecture build
+With the ARM architecture becoming more popular, we now [build](https://issues.apache.org/jira/browse/YUNIKORN-1215) binaries and Docker images for both `amd64` and `arm64` targets.
+
+
+### Recovery stabilization
+Several issues were identified during YuniKorn's recovery phase; these mostly affected gang scheduling (e.g. running placeholders getting replaced [immediately](https://issues.apache.org/jira/browse/YUNIKORN-1197)), but also Spark [workloads](https://issues.apache.org/jira/browse/YUNIKORN-1217).
+
+### DaemonSet scheduling
+Scheduling of DaemonSet pods was problematic before this release: if a node was full, those pods might not have been scheduled. However, it is usually important to start DaemonSet pods, as they often perform tasks that are necessary on all nodes, like log collection and aggregation, resource monitoring, storage management, etc., meaning they have priority over regular application pods.
+
+The implementation of [YUNIKORN-1085](https://issues.apache.org/jira/browse/YUNIKORN-1085) ensures that we have a predictable preemption mechanism which terminates running pods if necessary to make room for DaemonSet pods.
+
+### e2e testing improvements
+Additional end-to-end tests have been [written](https://issues.apache.org/jira/browse/YUNIKORN-751) to increase the coverage of YuniKorn as we support more K8s versions.
+
+## Community
+The Apache YuniKorn community is pleased to welcome new PMC members Peter Bacsko and Manikandan Ramaraj, and new committer Ted Lin.
+
diff --git a/versioned_docs/version-1.1.0/api/cluster.md b/versioned_docs/version-1.1.0/api/cluster.md
new file mode 100644
index 000000000..524bfb069
--- /dev/null
+++ b/versioned_docs/version-1.1.0/api/cluster.md
@@ -0,0 +1,85 @@
+---
+id: cluster
+title: Cluster
+---
+
+<!--
+Licensed to the Apache Software Foundation (ASF) under one
+or more contributor license agreements.  See the NOTICE file
+distributed with this work for additional information
+regarding copyright ownership.  The ASF licenses this file
+to you under the Apache License, Version 2.0 (the
+"License"); you may not use this file except in compliance
+with the License.  You may obtain a copy of the License at
+
+  http://www.apache.org/licenses/LICENSE-2.0
+
+Unless required by applicable law or agreed to in writing,
+software distributed under the License is distributed on an
+"AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY
+KIND, either express or implied.  See the License for the
+specific language governing permissions and limitations
+under the License.
+-->
+
+## Clusters
+
+Returns general information about the clusters managed by the YuniKorn Scheduler. The information includes the number of applications and containers (total, failed, pending, running, completed) and the build information of the resource managers.  
+
+**URL** : `/ws/v1/clusters`
+
+**Method** : `GET`
+
+**Auth required** : NO
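+
+The endpoint can be queried with any HTTP client. A minimal sketch, assuming the scheduler REST service is exposed as
+`yunikorn-service` on port 9080 in the `yunikorn` namespace (adjust to your deployment):
+
+```shell script
+# in one terminal: forward the REST port of the scheduler service
+kubectl port-forward svc/yunikorn-service 9080:9080 -n yunikorn
+
+# in another terminal: query the endpoint
+curl -s http://localhost:9080/ws/v1/clusters
+```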
+
+### Success response
+
+**Code** : `200 OK`
+
+**Content examples**
+
+As an example, here is a response from a 2-node cluster with 3 applications, 4 running containers and 1 resource manager.
+
+```json
+[
+    {
+        "startTime": 1649167576110754000,
+        "rmBuildInformation": [
+            {
+                "buildDate": "2022-02-21T19:09:16+0800",
+                "buildVersion": "latest",
+                "isPluginVersion": "false",
+                "rmId": "rm-123"
+            }
+        ],
+        "partition": "default",
+        "clusterName": "kubernetes",
+        "totalApplications": "3",
+        "failedApplications": "1",
+        "pendingApplications": "",
+        "runningApplications": "3",
+        "completedApplications": "",
+        "totalContainers": "4",
+        "failedContainers": "",
+        "pendingContainers": "",
+        "runningContainers": "4",
+        "activeNodes": "2",
+        "totalNodes": "2",
+        "failedNodes": ""
+    }
+]
+```
+
+### Error response
+
+**Code** : `500 Internal Server Error`
+
+**Content examples**
+
+```json
+{
+    "status_code": 500,
+    "message": "system error message. for example, json: invalid UTF-8 in string: ..",
+    "description": "system error message. for example, json: invalid UTF-8 in string: .."
+}
+```
diff --git a/versioned_docs/version-1.1.0/api/scheduler.md b/versioned_docs/version-1.1.0/api/scheduler.md
new file mode 100644
index 000000000..ef292d1d7
--- /dev/null
+++ b/versioned_docs/version-1.1.0/api/scheduler.md
@@ -0,0 +1,1479 @@
+---
+id: scheduler
+title: Scheduler
+---
+
+<!--
+Licensed to the Apache Software Foundation (ASF) under one
+or more contributor license agreements.  See the NOTICE file
+distributed with this work for additional information
+regarding copyright ownership.  The ASF licenses this file
+to you under the Apache License, Version 2.0 (the
+"License"); you may not use this file except in compliance
+with the License.  You may obtain a copy of the License at
+
+  http://www.apache.org/licenses/LICENSE-2.0
+
+Unless required by applicable law or agreed to in writing,
+software distributed under the License is distributed on an
+"AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY
+KIND, either express or implied.  See the License for the
+specific language governing permissions and limitations
+under the License.
+-->
+
+# Overview
+
+The scheduler REST API returns information about various objects used by the YuniKorn Scheduler.
+
+Many of these APIs return collections of resources. Internally, all resources are represented as raw
+64-bit signed integer types. When interpreting responses from the REST API, resources of type `memory`
+are returned in units of bytes while resources of type `vcore` are returned in units of millicores
+(thousandths of a core). All other resource types have no specific unit assigned.
+
+Under the `allocations` field in the response content for the app/node-related calls in the following spec, `placeholderUsed` refers to whether or not the allocation is a replacement for a placeholder. If true, `requestTime` is the creation time of its placeholder allocation, otherwise it's that of the allocation's ask. `allocationTime` is the creation time of the allocation, and `allocationDelay` is simply the difference between `allocationTime` and `requestTime`.
+
+## Partitions
+
+Displays general information about the partition like name, state, capacity, used capacity, utilization, and node sorting policy.
+
+**URL** : `/ws/v1/partitions`
+
+**Method** : `GET`
+
+**Auth required** : NO
+
+### Success response
+
+**Code** : `200 OK`
+
+**Content examples**
+
+```json
+[
+    {
+        "clusterId": "mycluster",
+        "name": "default",
+        "state": "Active",
+        "lastStateTransitionTime": 1649167576110754000,
+        "capacity": {
+            "capacity": {
+                "ephemeral-storage": 188176871424,
+                "hugepages-1Gi": 0,
+                "hugepages-2Mi": 0,
+                "memory": 1000000000,
+                "pods": 330,
+                "vcore": 1000
+            },
+            "usedCapacity": {
+                "memory": 800000000,
+                "vcore": 500
+            },
+            "utilization": {
+                "memory": 80,
+                "vcore": 50
+            }
+        },
+        "nodeSortingPolicy": {
+            "type": "fair",
+            "resourceWeights": {
+                "memory": 1.5,
+                "vcore": 1.3
+            }
+        },
+        "applications": {
+            "New": 5,
+            "Pending": 5,
+            "total": 10
+        }
+    },
+    {
+        "clusterId": "mycluster",
+        "name": "gpu",
+        "state": "Active",
+        "lastStateTransitionTime": 1649167576111236000,
+        "capacity": {
+            "capacity": {
+                "memory": 2000000000,
+                "vcore": 2000
+            },
+            "usedCapacity": {
+                "memory": 500000000,
+                "vcore": 300
+            },
+            "utilization": {
+                "memory": 25,
+                "vcore": 15
+            }
+        },
+        "nodeSortingPolicy": {
+            "type": "binpacking",
+            "resourceWeights": {
+                "memory": 0,
+                "vcore": 4.11
+            }
+        },
+        "applications": {
+            "New": 5,
+            "Running": 10,
+            "Pending": 5,
+            "total": 20
+        }
+    }
+]
+```
+
+### Error response
+
+**Code** : `500 Internal Server Error`
+
+**Content examples**
+
+```json
+{
+    "status_code": 500,
+    "message": "system error message. for example, json: invalid UTF-8 in string: ..",
+    "description": "system error message. for example, json: invalid UTF-8 in string: .."
+}
+```
+
+## Queues
+
+### Partition queues
+
+Fetches all queues associated with the given partition and displays general information about the queues, such as name, status, capacities and properties. 
+The queue hierarchy is preserved in the response JSON.  
+
+**URL** : `/ws/v1/partition/{partitionName}/queues`
+
+**Method** : `GET`
+
+**Auth required** : NO
+
+### Success response
+
+**Code** : `200 OK`
+
+**Content examples**
+
+For the default queue hierarchy (only `root.default` leaf queue exists) a similar response to the following is sent back to the client:
+
+```json
+[
+    {
+        "queuename": "root",
+        "status": "Active",
+        "maxResource": {
+            "ephemeral-storage": 188176871424,
+            "hugepages-1Gi": 0,
+            "hugepages-2Mi": 0,
+            "memory": 8000000000,
+            "pods": 330,
+            "vcore": 8000
+        },
+        "guaranteedResource": {
+            "memory": 54000000,
+            "vcore": 80
+        },
+        "allocatedResource": {
+            "memory": 54000000,
+            "vcore": 80
+        },
+        "isLeaf": "false",
+        "isManaged": "false",
+        "properties": {
+            "application.sort.policy": "stateaware"
+        },
+        "parent": "",
+        "template": {
+            "maxResource": {
+                "memory": 8000000000,
+                "vcore": 8000
+            },
+            "guaranteedResource": {
+                "memory": 54000000,
+                "vcore": 80
+            },
+            "properties": {
+                "application.sort.policy": "stateaware"
+            }
+        },
+        "partition": "default",
+        "children": [
+            {
+                "queuename": "root.default",
+                "status": "Active",
+                "maxResource": {
+                    "memory": 8000000000,
+                    "vcore": 8000
+                },
+                "guaranteedResource": {
+                    "memory": 54000000,
+                    "vcore": 80
+                },
+                "allocatedResource": {
+                    "memory": 54000000,
+                    "vcore": 80
+                },
+                "isLeaf": "true",
+                "isManaged": "false",
+                "properties": {
+                    "application.sort.policy": "stateaware"
+                },
+                "parent": "root",
+                "template": null,
+                "children": [],
+                "absUsedCapacity": {
+                    "memory": 1,
+                    "vcore": 0
+                }
+            }
+        ],
+        "absUsedCapacity": {
+            "memory": 1,
+            "vcore": 0
+        }
+    } 
+]
+```
+
+### Error response
+
+**Code** : `500 Internal Server Error`
+
+**Content examples**
+
+```json
+{
+    "status_code": 500,
+    "message": "system error message. for example, json: invalid UTF-8 in string: ..",
+    "description": "system error message. for example, json: invalid UTF-8 in string: .."
+}
+```
+
+## Applications
+
+### Queue applications
+
+Fetches all applications for the given partition/queue combination and displays general information about the applications, such as used resources, queue name, submission time and allocations.
+
+**URL** : `/ws/v1/partition/{partitionName}/queue/{queueName}/applications`
+
+**Method** : `GET`
+
+**Auth required** : NO
+
+### Success response
+
+**Code** : `200 OK`
+
+**Content examples**
+
+In the example below there are three allocations belonging to two applications, one with a pending request.
+
+```json
+[
+    {
+        "applicationID": "application-0001",
+        "usedResource": {
+            "memory": 4000000000,
+            "vcore": 4000
+        },
+        "maxUsedResource": {
+            "memory": 4000000000,
+            "vcore": 4000
+        },
+        "partition": "default",
+        "queueName": "root.default",
+        "submissionTime": 1648754032076020293,
+        "requests": [
+            {
+                "allocationKey": "f137fab6-3cfa-4536-93f7-bfff92689382",
+                "allocationTags": {
+                    "kubernetes.io/label/app": "sleep",
+                    "kubernetes.io/label/applicationId": "application-0001",
+                    "kubernetes.io/label/queue": "root.default",
+                    "kubernetes.io/meta/namespace": "default",
+                    "kubernetes.io/meta/podName": "task2"
+                },
+                "requestTime": 16487540320812345678,
+                "resource": {
+                    "memory": 4000000000,
+                    "vcore": 4000
+                },
+                "pendingCount": 1,
+                "priority": "0",
+                "requiredNodeId": "",
+                "applicationId": "application-0001",
+                "partition": "default",
+                "placeholder": false,
+                "placeholderTimeout": 0,
+                "taskGroupName": "",
+                "allocationLog": [
+                    {
+                        "message": "node(s) didn't match Pod's node affinity, node(s) had taint {node-role.kubernetes.io/master: }, that the pod didn't tolerate",
+                        "lastOccurrence": 16487540320812346001,
+                        "count": 81
+                    },
+                    {
+                        "message": "node(s) had taint {node-role.kubernetes.io/master: }, that the pod didn't tolerate, node(s) didn't match Pod's node affinity",
+                        "lastOccurrence": 16487540320812346002,
+                        "count": 504
+                    },
+                    {
+                        "message": "node(s) didn't match Pod's node affinity",
+                        "lastOccurrence": 16487540320812346003,
+                        "count": 1170
+                    }
+                ]
+            }
+        ],
+        "allocations": [
+            {
+                "allocationKey": "deb12221-6b56-4fe9-87db-ebfadce9aa20",
+                "allocationTags": {
+                    "kubernetes.io/label/app": "sleep",
+                    "kubernetes.io/label/applicationId": "application-0001",
+                    "kubernetes.io/label/queue": "root.default",
+                    "kubernetes.io/meta/namespace": "default",
+                    "kubernetes.io/meta/podName": "task0"
+                },
+                "requestTime": 1648754034098912461,
+                "allocationTime": 1648754035973982920,
+                "allocationDelay": 1875070459,
+                "uuid": "9af35d44-2d6f-40d1-b51d-758859e6b8a8",
+                "resource": {
+                    "memory": 4000000000,
+                    "vcore": 4000
+                },
+                "priority": "0",
+                "nodeId": "node-0001",
+                "applicationId": "application-0001",
+                "partition": "default",
+                "placeholder": false,
+                "placeholderUsed": true
+            }
+        ],
+        "applicationState": "Running",
+        "user": "nobody",
+        "rejectedMessage": "",
+        "stateLog": [
+            {
+                "time": 1648741409145224000,
+                "applicationState": "Accepted"
+            },
+            {
+                "time": 1648741409145509400,
+                "applicationState": "Starting"
+            },
+            {
+                "time": 1648741409147432100,
+                "applicationState": "Running"
+            }
+        ],
+        "placeholderData": [
+            {
+                "taskGroupName": "task-group-example",
+                "count": 2,
+                "minResource": {
+                    "memory": 1000000000,
+                    "vcore": 100
+                },
+                "replaced": 1,
+                "timedout": 1
+            }
+        ]
+    },
+    {
+        "applicationID": "application-0002",
+        "usedResource": {
+            "memory": 4000000000,
+            "vcore": 4000
+        },
+        "maxUsedResource": {
+            "memory": 4000000000,
+            "vcore": 4000
+        },
+        "partition": "default",
+        "queueName": "root.default",
+        "submissionTime": 1648754032076020293,
+        "requests": [],
+        "allocations": [
+            {
+                "allocationKey": "54e5d77b-f4c3-4607-8038-03c9499dd99d",
+                "allocationTags": {
+                    "kubernetes.io/label/app": "sleep",
+                    "kubernetes.io/label/applicationId": "application-0002",
+                    "kubernetes.io/label/queue": "root.default",
+                    "kubernetes.io/meta/namespace": "default",
+                    "kubernetes.io/meta/podName": "task0"
+                },
+                "requestTime": 1648754034098912461,
+                "allocationTime": 1648754035973982920,
+                "allocationDelay": 1875070459,
+                "uuid": "08033f9a-4699-403c-9204-6333856b41bd",
+                "resource": {
+                    "memory": 2000000000,
+                    "vcore": 2000
+                },
+                "priority": "0",
+                "nodeId": "node-0001",
+                "applicationId": "application-0002",
+                "partition": "default",
+                "placeholder": false,
+                "placeholderUsed": false
+            },
+            {
+                "allocationKey": "af3bd2f3-31c5-42dd-8f3f-c2298ebdec81",
+                "allocationTags": {
+                    "kubernetes.io/label/app": "sleep",
+                    "kubernetes.io/label/applicationId": "application-0002",
+                    "kubernetes.io/label/queue": "root.default",
+                    "kubernetes.io/meta/namespace": "default",
+                    "kubernetes.io/meta/podName": "task1"
+                },
+                "requestTime": 1648754034098912461,
+                "allocationTime": 1648754035973982920,
+                "allocationDelay": 1875070459,
+                "uuid": "96beeb45-5ed2-4c19-9a83-2ac807637b3b",
+                "resource": {
+                    "memory": 2000000000,
+                    "vcore": 2000
+                },
+                "priority": "0",
+                "nodeId": "node-0002",
+                "applicationId": "application-0002",
+                "partition": "default",
+                "placeholder": false,
+                "placeholderUsed": false
+            }
+        ],
+        "applicationState": "Running",
+        "user": "nobody",
+        "rejectedMessage": "",
+        "stateLog": [
+            {
+                "time": 1648741409145224000,
+                "applicationState": "Accepted"
+            },
+            {
+                "time": 1648741409145509400,
+                "applicationState": "Starting"
+            },
+            {
+                "time": 1648741409147432100,
+                "applicationState": "Running"
+            }
+        ],
+        "placeholderData": []
+    }
+]
+```
+
+### Error response
+
+**Code** : `500 Internal Server Error`
+
+**Content examples**
+
+```json
+{
+    "status_code": 500,
+    "message": "system error message. for example, json: invalid UTF-8 in string: ..",
+    "description": "system error message. for example, json: invalid UTF-8 in string: .."
+}
+```
+
+## Application
+
+### Queue application
+
+Fetches an application for the given partition, queue and application ID and displays general information about the application, such as used resources, queue name, submission time and allocations.
+
+**URL** : `/ws/v1/partition/{partitionName}/queue/{queueName}/application/{appId}`
+
+**Method** : `GET`
+
+**Auth required** : NO
+
+### Success response
+
+**Code** : `200 OK`
+
+**Content example**
+
+```json
+{
+    "applicationID": "application-0001",
+    "usedResource": {
+        "memory": 4000000000,
+        "vcore": 4000
+    },
+    "maxUsedResource": {
+        "memory": 4000000000,
+        "vcore": 4000
+    },
+    "partition": "default",
+    "queueName": "root.default",
+    "submissionTime": 1648754032076020293,
+    "requests": [
+        {
+            "allocationKey": "f137fab6-3cfa-4536-93f7-bfff92689382",
+            "allocationTags": {
+                "kubernetes.io/label/app": "sleep",
+                "kubernetes.io/label/applicationId": "application-0001",
+                "kubernetes.io/label/queue": "root.default",
+                "kubernetes.io/meta/namespace": "default",
+                "kubernetes.io/meta/podName": "task2"
+            },
+            "requestTime": 16487540320812345678,
+            "resource": {
+                "memory": 4000000000,
+                "vcore": 4000
+            },
+            "pendingCount": 1,
+            "priority": "0",
+            "requiredNodeId": "",
+            "applicationId": "application-0001",
+            "partition": "default",
+            "placeholder": false,
+            "placeholderTimeout": 0,
+            "taskGroupName": "",
+            "allocationLog": [
+                {
+                    "message": "node(s) didn't match Pod's node affinity, node(s) had taint {node-role.kubernetes.io/master: }, that the pod didn't tolerate",
+                    "lastOccurrence": 16487540320812346001,
+                    "count": 81
+                },
+                {
+                    "message": "node(s) had taint {node-role.kubernetes.io/master: }, that the pod didn't tolerate, node(s) didn't match Pod's node affinity",
+                    "lastOccurrence": 16487540320812346002,
+                    "count": 504
+                },
+                {
+                    "message": "node(s) didn't match Pod's node affinity",
+                    "lastOccurrence": 16487540320812346003,
+                    "count": 1170
+                }
+            ]
+        }
+    ],
+    "allocations": [
+        {
+            "allocationKey": "deb12221-6b56-4fe9-87db-ebfadce9aa20",
+            "allocationTags": {
+                "kubernetes.io/label/app": "sleep",
+                "kubernetes.io/label/applicationId": "application-0001",
+                "kubernetes.io/label/queue": "root.default",
+                "kubernetes.io/meta/namespace": "default",
+                "kubernetes.io/meta/podName": "task0"
+            },
+            "requestTime": 1648754034098912461,
+            "allocationTime": 1648754035973982920,
+            "allocationDelay": 1875070459,
+            "uuid": "9af35d44-2d6f-40d1-b51d-758859e6b8a8",
+            "resource": {
+                "memory": 4000000000,
+                "vcore": 4000
+            },
+            "priority": "0",
+            "nodeId": "node-0001",
+            "applicationId": "application-0001",
+            "partition": "default",
+            "placeholder": false,
+            "placeholderUsed": true
+        }
+    ],
+    "applicationState": "Running",
+    "user": "nobody",
+    "rejectedMessage": "",
+    "stateLog": [
+        {
+            "time": 1648741409145224000,
+            "applicationState": "Accepted"
+        },
+        {
+            "time": 1648741409145509400,
+            "applicationState": "Starting"
+        },
+        {
+            "time": 1648741409147432100,
+            "applicationState": "Running"
+        }
+    ],
+    "placeholderData": [
+        {
+            "taskGroupName": "task-group-example",
+            "count": 2,
+            "minResource": {
+                "memory": 1000000000,
+                "vcore": 100
+            },
+            "replaced": 1,
+            "timedout": 1
+        }
+    ]
+}
+```
+
+### Error response
+
+**Code** : `500 Internal Server Error`
+
+**Content examples**
+
+```json
+{
+    "status_code": 500,
+    "message": "system error message. for example, json: invalid UTF-8 in string: ..",
+    "description": "system error message. for example, json: invalid UTF-8 in string: .."
+}
+```
+
+## Nodes
+
+### Partition nodes
+
+Fetches all nodes associated with the given partition and displays general information about the nodes managed by YuniKorn. 
+Node details include host and rack name, capacity, resources, utilization, and allocations.
+
+**URL** : `/ws/v1/partition/{partitionName}/nodes`
+
+**Method** : `GET`
+
+**Auth required** : NO
+
+### Success response
+
+**Code** : `200 OK`
+
+**Content examples**
+
+Here is an example response from a 2-node cluster with 3 allocations.
+
+```json
+[
+    {
+        "nodeID": "node-0001",
+        "hostName": "",
+        "rackName": "",
+        "capacity": {
+            "ephemeral-storage": 75850798569,
+            "hugepages-1Gi": 0,
+            "hugepages-2Mi": 0,
+            "memory": 14577000000,
+            "pods": 110,
+            "vcore": 10000
+        },
+        "allocated": {
+            "memory": 6000000000,
+            "vcore": 6000
+        },
+        "occupied": {
+            "memory": 154000000,
+            "vcore" :750
+        },
+        "available": {
+            "ephemeral-storage": 75850798569,
+            "hugepages-1Gi": 0,
+            "hugepages-2Mi": 0,
+            "memory": 6423000000,
+            "pods": 110,
+            "vcore": 1250
+        },
+        "utilized": {
+            "memory": 3,
+            "vcore": 13
+        },
+        "allocations": [
+            {
+                "allocationKey": "54e5d77b-f4c3-4607-8038-03c9499dd99d",
+                "allocationTags": {
+                    "kubernetes.io/label/app": "sleep",
+                    "kubernetes.io/label/applicationId": "application-0001",
+                    "kubernetes.io/label/queue": "root.default",
+                    "kubernetes.io/meta/namespace": "default",
+                    "kubernetes.io/meta/podName": "task0"
+                },
+                "requestTime": 1648754034098912461,
+                "allocationTime": 1648754035973982920,
+                "allocationDelay": 1875070459,
+                "uuid": "08033f9a-4699-403c-9204-6333856b41bd",
+                "resource": {
+                    "memory": 2000000000,
+                    "vcore": 2000
+                },
+                "priority": "0",
+                "nodeId": "node-0001",
+                "applicationId": "application-0001",
+                "partition": "default",
+                "placeholder": false,
+                "placeholderUsed": false
+            },
+            {
+                "allocationKey": "deb12221-6b56-4fe9-87db-ebfadce9aa20",
+                "allocationTags": {
+                    "kubernetes.io/label/app": "sleep",
+                    "kubernetes.io/label/applicationId": "application-0002",
+                    "kubernetes.io/label/queue": "root.default",
+                    "kubernetes.io/meta/namespace": "default",
+                    "kubernetes.io/meta/podName": "task0"
+                },
+                "requestTime": 1648754034098912461,
+                "allocationTime": 1648754035973982920,
+                "allocationDelay": 1875070459,
+                "uuid": "9af35d44-2d6f-40d1-b51d-758859e6b8a8",
+                "resource": {
+                    "memory": 4000000000,
+                    "vcore": 4000
+                },
+                "priority": "0",
+                "nodeId": "node-0001",
+                "applicationId": "application-0002",
+                "partition": "default",
+                "placeholder": false,
+                "placeholderUsed": false
+            }
+        ],
+        "schedulable": true
+    },
+    {
+        "nodeID": "node-0002",
+        "hostName": "",
+        "rackName": "",
+        "capacity": {
+            "ephemeral-storage": 75850798569,
+            "hugepages-1Gi": 0,
+            "hugepages-2Mi": 0,
+            "memory": 14577000000,
+            "pods": 110,
+            "vcore": 10000
+        },
+        "allocated": {
+            "memory": 2000000000,
+            "vcore": 2000
+        },
+        "occupied": {
+            "memory": 154000000,
+            "vcore" :750
+        },
+        "available": {
+            "ephemeral-storage": 75850798569,
+            "hugepages-1Gi": 0,
+            "hugepages-2Mi": 0,
+            "memory": 6423000000,
+            "pods": 110,
+            "vcore": 1250
+        },
+        "utilized": {
+            "memory": 8,
+            "vcore": 38
+        },
+        "allocations": [
+            {
+                "allocationKey": "af3bd2f3-31c5-42dd-8f3f-c2298ebdec81",
+                "allocationTags": {
+                    "kubernetes.io/label/app": "sleep",
+                    "kubernetes.io/label/applicationId": "application-0001",
+                    "kubernetes.io/label/queue": "root.default",
+                    "kubernetes.io/meta/namespace": "default",
+                    "kubernetes.io/meta/podName": "task1"
+                },
+                "requestTime": 1648754034098912461,
+                "allocationTime": 1648754035973982920,
+                "allocationDelay": 1875070459,
+                "uuid": "96beeb45-5ed2-4c19-9a83-2ac807637b3b",
+                "resource": {
+                    "memory": 2000000000,
+                    "vcore": 2000
+                },
+                "priority": "0",
+                "nodeId": "node-0002",
+                "applicationId": "application-0001",
+                "partition": "default",
+                "placeholder": false,
+                "placeholderUsed": false
+            }
+        ],
+        "schedulable": true
+    }
+]
+```
+
+### Error response
+
+**Code** : `500 Internal Server Error`
+
+**Content examples**
+
+```json
+{
+    "status_code": 500,
+    "message": "system error message. for example, json: invalid UTF-8 in string: ..",
+    "description": "system error message. for example, json: invalid UTF-8 in string: .."
+}
+```
+
+## Goroutines info
+
+Dumps the stack traces of the currently running goroutines.
+
+**URL** : `/ws/v1/stack`
+
+**Method** : `GET`
+
+**Auth required** : NO
+
+### Success response
+
+**Code** : `200 OK`
+
+**Content examples**
+
+```text
+goroutine 356 [running
+]:
+github.com/apache/yunikorn-core/pkg/webservice.getStackInfo.func1(0x30a0060,
+0xc003e900e0,
+0x2)
+	/yunikorn/go/pkg/mod/github.com/apache/yunikorn-core@v0.0.0-20200717041747-f3e1c760c714/pkg/webservice/handlers.go: 41 +0xab
+github.com/apache/yunikorn-core/pkg/webservice.getStackInfo(0x30a0060,
+0xc003e900e0,
+0xc00029ba00)
+	/yunikorn/go/pkg/mod/github.com/apache/yunikorn-core@v0.0.0-20200717041747-f3e1c760c714/pkg/webservice/handlers.go: 48 +0x71
+net/http.HandlerFunc.ServeHTTP(0x2df0e10,
+0x30a0060,
+0xc003e900e0,
+0xc00029ba00)
+	/usr/local/go/src/net/http/server.go: 1995 +0x52
+github.com/apache/yunikorn-core/pkg/webservice.Logger.func1(0x30a0060,
+0xc003e900e0,
+0xc00029ba00)
+	/yunikorn/go/pkg/mod/github.com/apache/yunikorn-core@v0.0.0-20200717041747-f3e1c760c714/pkg/webservice/webservice.go: 65 +0xd4
+net/http.HandlerFunc.ServeHTTP(0xc00003a570,
+0x30a0060,
+0xc003e900e0,
+0xc00029ba00)
+	/usr/local/go/src/net/http/server.go: 1995 +0x52
+github.com/gorilla/mux.(*Router).ServeHTTP(0xc00029cb40,
+0x30a0060,
+0xc003e900e0,
+0xc0063fee00)
+	/yunikorn/go/pkg/mod/github.com/gorilla/mux@v1.7.3/mux.go: 212 +0x140
+net/http.serverHandler.ServeHTTP(0xc0000df520,
+0x30a0060,
+0xc003e900e0,
+0xc0063fee00)
+	/usr/local/go/src/net/http/server.go: 2774 +0xcf
+net/http.(*conn).serve(0xc0000eab40,
+0x30a61a0,
+0xc003b74000)
+	/usr/local/go/src/net/http/server.go: 1878 +0x812
+created by net/http.(*Server).Serve
+	/usr/local/go/src/net/http/server.go: 2884 +0x4c5
+
+goroutine 1 [chan receive,
+	26 minutes
+]:
+main.main()
+	/yunikorn/pkg/shim/main.go: 52 +0x67a
+
+goroutine 19 [syscall,
+	26 minutes
+]:
+os/signal.signal_recv(0x1096f91)
+	/usr/local/go/src/runtime/sigqueue.go: 139 +0x9f
+os/signal.loop()
+	/usr/local/go/src/os/signal/signal_unix.go: 23 +0x30
+created by os/signal.init.0
+	/usr/local/go/src/os/signal/signal_unix.go: 29 +0x4f
+
+...
+```
+
+### Error response
+
+**Code** : `500 Internal Server Error`
+
+**Content examples**
+
+```json
+{
+    "status_code": 500,
+    "message": "system error message. for example, json: invalid UTF-8 in string: ..",
+    "description": "system error message. for example, json: invalid UTF-8 in string: .."
+}
+```
+
+## Metrics
+
+Endpoint to retrieve the scheduler metrics in Prometheus format.
+The metrics are exposed with help messages and type information.
+
+**URL** : `/ws/v1/metrics`
+
+**Method** : `GET`
+
+**Auth required** : NO
+
+### Success response
+
+**Code** : `200 OK`
+
+**Content examples**
+
+```text
+# HELP go_gc_duration_seconds A summary of the pause duration of garbage collection cycles.
+# TYPE go_gc_duration_seconds summary
+go_gc_duration_seconds{quantile="0"} 2.567e-05
+go_gc_duration_seconds{quantile="0.25"} 3.5727e-05
+go_gc_duration_seconds{quantile="0.5"} 4.5144e-05
+go_gc_duration_seconds{quantile="0.75"} 6.0024e-05
+go_gc_duration_seconds{quantile="1"} 0.00022528
+go_gc_duration_seconds_sum 0.021561648
+go_gc_duration_seconds_count 436
+# HELP go_goroutines Number of goroutines that currently exist.
+# TYPE go_goroutines gauge
+go_goroutines 82
+# HELP go_info Information about the Go environment.
+# TYPE go_info gauge
+go_info{version="go1.12.17"} 1
+# HELP go_memstats_alloc_bytes Number of bytes allocated and still in use.
+# TYPE go_memstats_alloc_bytes gauge
+go_memstats_alloc_bytes 9.6866248e+07
+
+...
+
+# HELP yunikorn_scheduler_vcore_nodes_usage Nodes resource usage, by resource name.
+# TYPE yunikorn_scheduler_vcore_nodes_usage gauge
+yunikorn_scheduler_vcore_nodes_usage{range="(10%, 20%]"} 0
+yunikorn_scheduler_vcore_nodes_usage{range="(20%,30%]"} 0
+yunikorn_scheduler_vcore_nodes_usage{range="(30%,40%]"} 0
+yunikorn_scheduler_vcore_nodes_usage{range="(40%,50%]"} 0
+yunikorn_scheduler_vcore_nodes_usage{range="(50%,60%]"} 0
+yunikorn_scheduler_vcore_nodes_usage{range="(60%,70%]"} 0
+yunikorn_scheduler_vcore_nodes_usage{range="(70%,80%]"} 1
+yunikorn_scheduler_vcore_nodes_usage{range="(80%,90%]"} 0
+yunikorn_scheduler_vcore_nodes_usage{range="(90%,100%]"} 0
+yunikorn_scheduler_vcore_nodes_usage{range="[0,10%]"} 0
+```
+
+## Configuration validation
+
+**URL** : `/ws/v1/validate-conf`
+
+**Method** : `POST`
+
+**Auth required** : NO
+
+### Success response
+
+As long as the server could process the request, it returns a 200 HTTP status code, regardless of whether the configuration is allowed or not.
+
+**Code** : `200 OK`
+
+#### Allowed configuration
+
+Sending the following simple configuration results in an accepted response:
+
+```yaml
+partitions:
+  - name: default
+    queues:
+      - name: root
+        queues:
+          - name: test
+```
+
+Response
+
+```json
+{
+    "allowed": true,
+    "reason": ""
+}
+```
+
+#### Disallowed configuration
+
+The following configuration is not allowed because of the "wrong_text" entry added to the YAML file.
+
+```yaml
+partitions:
+  - name: default
+    queues:
+      - name: root
+        queues:
+          - name: test
+  - wrong_text
+```
+
+Response
+
+```json
+{
+    "allowed": false,
+    "reason": "yaml: unmarshal errors:\n  line 7: cannot unmarshal !!str `wrong_text` into configs.PartitionConfig"
+}
+```
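+
+As an illustration, a minimal Go sketch of calling this endpoint is shown below. It assumes the scheduler web service is reachable at `localhost:9080` (adjust the address for your deployment); the content type header is only an assumption, the request body is the plain YAML configuration.
+
+```go
+package main
+
+import (
+	"bytes"
+	"encoding/json"
+	"fmt"
+	"net/http"
+)
+
+// validationResult mirrors the response body shown above.
+type validationResult struct {
+	Allowed bool   `json:"allowed"`
+	Reason  string `json:"reason"`
+}
+
+func main() {
+	conf := []byte(`partitions:
+  - name: default
+    queues:
+      - name: root
+        queues:
+          - name: test
+`)
+	// POST the YAML configuration to the validation endpoint.
+	resp, err := http.Post("http://localhost:9080/ws/v1/validate-conf", "application/x-yaml", bytes.NewReader(conf))
+	if err != nil {
+		panic(err)
+	}
+	defer resp.Body.Close()
+
+	var result validationResult
+	if err := json.NewDecoder(resp.Body).Decode(&result); err != nil {
+		panic(err)
+	}
+	fmt.Printf("allowed: %v, reason: %q\n", result.Allowed, result.Reason)
+}
+```
+
+The same request body can also be sent to `POST /ws/v1/config?dry_run=1` described below, which is currently limited to validation as well.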
+
+## Configuration Create
+
+Endpoint to create the scheduler configuration. It is currently limited to configuration validation only.
+
+**URL** : `/ws/v1/config`
+
+**Method** : `POST`
+
+**Query Params** : 
+
+1. dry_run
+
+Mandatory parameter. Only `dry_run=1` is allowed; it can be used for configuration validation only, not for the actual config creation.
+
+**Auth required** : NO
+
+### Success response
+
+As long as the server could process the request, it returns a 200 HTTP status code, regardless of whether the configuration is allowed or not.
+
+**Code** : `200 OK`
+
+#### Allowed configuration
+
+Sending the following simple configuration results in an accepted response:
+
+```yaml
+partitions:
+  - name: default
+    queues:
+      - name: root
+        queues:
+          - name: test
+```
+
+Response
+
+```json
+{
+    "allowed": true,
+    "reason": ""
+}
+```
+
+#### Disallowed configuration
+
+The following configuration is not allowed because of the "wrong_text" entry added to the YAML file.
+
+```yaml
+partitions:
+  - name: default
+    queues:
+      - name: root
+        queues:
+          - name: test
+  - wrong_text
+```
+
+Response
+
+```json
+{
+    "allowed": false,
+    "reason": "yaml: unmarshal errors:\n  line 7: cannot unmarshal !!str `wrong_text` into configs.PartitionConfig"
+}
+```
+
+### Error response
+
+**Code** : `400 Bad Request`
+
+**Content examples**
+
+```json
+{
+    "status_code": 400,
+    "message": "Dry run param is missing. Please check the usage documentation",
+    "description": "Dry run param is missing. Please check the usage documentation"
+}
+```
+
+**Code** : `500 Internal Server Error`
+
+**Content examples**
+
+```json
+{
+    "status_code": 500,
+    "message": "system error message. for example, json: invalid UTF-8 in string: ..",
+    "description": "system error message. for example, json: invalid UTF-8 in string: .."
+}
+```
+
+## Configuration
+
+Endpoint to retrieve the current scheduler configuration
+
+**URL** : `/ws/v1/config`
+
+**Method** : `GET`
+
+**Auth required** : NO
+
+### Success response
+
+**Code** : `200 OK`
+
+**Content example**
+
+```yaml
+partitions:
+- name: default
+  queues:
+  - name: root
+    parent: true
+    submitacl: '*'
+  placementrules:
+  - name: tag
+    create: true
+    value: namespace
+checksum: D75996C07D5167F41B33E27CCFAEF1D5C55BE3C00EE6526A7ABDF8435DB4078E
+```
+
+## Configuration update
+
+Endpoint to override scheduler configuration. 
+
+**URL** : `/ws/v1/config`
+
+**Method** : `PUT`
+
+**Auth required** : NO
+
+### Success response
+
+**Code** : `200 OK`
+
+**Content example**
+
+```yaml
+partitions:
+  -
+    name: default
+    placementrules:
+      - name: tag
+        value: namespace
+        create: true
+    queues:
+      - name: root
+        submitacl: '*'
+        properties:
+          application.sort.policy: stateaware
+checksum: BAB3D76402827EABE62FA7E4C6BCF4D8DD9552834561B6B660EF37FED9299791
+```
+**Note:** Updates must use a current running configuration as the base. 
+The base configuration is the configuration version that was retrieved earlier via a GET request and updated by the user.
+The update request must contain the checksum of the _base_ configuration. 
+If the checksum provided in the update request differs from the checksum of the currently running configuration, the update will be rejected.
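+
+The note above implies a get-modify-put cycle. A rough Go sketch of that flow is shown below; it assumes the scheduler web service is reachable at `localhost:9080` and leaves the actual YAML editing as a placeholder. The checksum line of the retrieved base configuration must be kept in the body that is sent back.
+
+```go
+package main
+
+import (
+	"fmt"
+	"io"
+	"net/http"
+	"strings"
+)
+
+func main() {
+	const configURL = "http://localhost:9080/ws/v1/config"
+
+	// Step 1: retrieve the currently running configuration, including its checksum.
+	resp, err := http.Get(configURL)
+	if err != nil {
+		panic(err)
+	}
+	current, err := io.ReadAll(resp.Body)
+	resp.Body.Close()
+	if err != nil {
+		panic(err)
+	}
+
+	// Step 2: edit the YAML as required, keeping the checksum of the base
+	// configuration intact (the edit itself is left out of this sketch).
+	updatedConf := string(current)
+
+	// Step 3: PUT the updated configuration back; a mismatching checksum is rejected.
+	req, err := http.NewRequest(http.MethodPut, configURL, strings.NewReader(updatedConf))
+	if err != nil {
+		panic(err)
+	}
+	putResp, err := http.DefaultClient.Do(req)
+	if err != nil {
+		panic(err)
+	}
+	defer putResp.Body.Close()
+	fmt.Println("update status:", putResp.Status)
+}
+```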
+
+### Failure response
+
+The configuration update can fail for different reasons, such as:
+- invalid configuration,
+- incorrect base checksum.
+
+In each case the transaction is rejected and the appropriate
+error message is returned in the response.
+
+**Code** : `409 Conflict`
+
+**Message example** :  root queue must not have resource limits set
+
+**Content example**
+
+```yaml
+partitions:
+  -
+    name: default
+    placementrules:
+      - name: tag
+        value: namespace
+        create: true
+    queues:
+      - name: root
+        submitacl: '*'
+        resources:
+          guaranteed:
+            memory: "512M"
+            vcore: "1"
+        properties:
+          application.sort.policy: stateaware
+checksum: BAB3D76402827EABE62FA7E4C6BCF4D8DD9552834561B6B660EF37FED9299791
+```
+
+### Error response
+
+**Code** : `500 Internal Server Error`
+
+**Content examples**
+
+```json
+{
+    "status_code": 500,
+    "message": "system error message. for example, json: invalid UTF-8 in string: ..",
+    "description": "system error message. for example, json: invalid UTF-8 in string: .."
+}
+```
+
+## Application history
+
+Endpoint to retrieve historical data about the number of total applications by timestamp.
+
+**URL** : `/ws/v1/history/apps`
+
+**Method** : `GET`
+
+**Auth required** : NO
+
+### Success response
+
+**Code** : `200 OK`
+
+**Content examples**
+
+```json
+[
+    {
+        "timestamp": 1595939966153460000,
+        "totalApplications": "1"
+    },
+    {
+        "timestamp": 1595940026152892000,
+        "totalApplications": "1"
+    },
+    {
+        "timestamp": 1595940086153799000,
+        "totalApplications": "2"
+    },
+    {
+        "timestamp": 1595940146154497000,
+        "totalApplications": "2"
+    },
+    {
+        "timestamp": 1595940206155187000,
+        "totalApplications": "2"
+    }
+]
+```
+
+### Error response
+
+**Code** : `500 Internal Server Error`
+
+**Content examples**
+
+```json
+{
+    "status_code": 500,
+    "message": "system error message. for example, json: invalid UTF-8 in string: ..",
+    "description": "system error message. for example, json: invalid UTF-8 in string: .."
+}
+```
+
+## Container history
+
+Endpoint to retrieve historical data about the number of total containers by timestamp.
+
+**URL** : `/ws/v1/history/containers`
+
+**Method** : `GET`
+
+**Auth required** : NO
+
+### Success response
+
+**Code** : `200 OK`
+
+**Content examples**
+
+```json
+[
+    {
+        "timestamp": 1595939966153460000,
+        "totalContainers": "1"
+    },
+    {
+        "timestamp": 1595940026152892000,
+        "totalContainers": "1"
+    },
+    {
+        "timestamp": 1595940086153799000,
+        "totalContainers": "3"
+    },
+    {
+        "timestamp": 1595940146154497000,
+        "totalContainers": "3"
+    },
+    {
+        "timestamp": 1595940206155187000,
+        "totalContainers": "3"
+    }
+]
+```
+
+### Error response
+
+**Code** : `500 Internal Server Error`
+
+**Content examples**
+
+```json
+{
+    "status_code": 500,
+    "message": "system error message. for example, json: invalid UTF-8 in string: ..",
+    "description": "system error message. for example, json: invalid UTF-8 in string: .."
+}
+```
+
+
+## Endpoint healthcheck
+
+Endpoint to retrieve the results of the scheduler health checks, such as the presence of critical log entries and negative resources on a node, the cluster or an application.
+
+**URL** : `/ws/v1/scheduler/healthcheck`
+
+**Method** : `GET`
+
+**Auth required** : NO
+
+### Success response
+
+**Code** : `200 OK`
+
+**Content examples**
+
+```json
+{
+    "Healthy": true,
+    "HealthChecks": [
+        {
+            "Name": "Scheduling errors",
+            "Succeeded": true,
+            "Description": "Check for scheduling error entries in metrics",
+            "DiagnosisMessage": "There were 0 scheduling errors logged in the metrics"
+        },
+        {
+            "Name": "Failed nodes",
+            "Succeeded": true,
+            "Description": "Check for failed nodes entries in metrics",
+            "DiagnosisMessage": "There were 0 failed nodes logged in the metrics"
+        },
+        {
+            "Name": "Negative resources",
+            "Succeeded": true,
+            "Description": "Check for negative resources in the partitions",
+            "DiagnosisMessage": "Partitions with negative resources: []"
+        },
+        {
+            "Name": "Negative resources",
+            "Succeeded": true,
+            "Description": "Check for negative resources in the nodes",
+            "DiagnosisMessage": "Nodes with negative resources: []"
+        },
+        {
+            "Name": "Consistency of data",
+            "Succeeded": true,
+            "Description": "Check if a node's allocated resource <= total resource of the node",
+            "DiagnosisMessage": "Nodes with inconsistent data: []"
+        },
+        {
+            "Name": "Consistency of data",
+            "Succeeded": true,
+            "Description": "Check if total partition resource == sum of the node resources from the partition",
+            "DiagnosisMessage": "Partitions with inconsistent data: []"
+        },
+        {
+            "Name": "Consistency of data",
+            "Succeeded": true,
+            "Description": "Check if node total resource = allocated resource + occupied resource + available resource",
+            "DiagnosisMessage": "Nodes with inconsistent data: []"
+        },
+        {
+            "Name": "Consistency of data",
+            "Succeeded": true,
+            "Description": "Check if node capacity >= allocated resources on the node",
+            "DiagnosisMessage": "Nodes with inconsistent data: []"
+        },
+        {
+            "Name": "Reservation check",
+            "Succeeded": true,
+            "Description": "Check the reservation nr compared to the number of nodes",
+            "DiagnosisMessage": "Reservation/node nr ratio: [0.000000]"
+        }
+    ]
+}
+```
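+
+A monitoring script or probe could consume this endpoint as in the Go sketch below. It is only an outline: it assumes the scheduler web service is reachable at `localhost:9080` and decodes just the fields it needs from the response shown above.
+
+```go
+package main
+
+import (
+	"encoding/json"
+	"fmt"
+	"net/http"
+	"os"
+)
+
+// healthInfo decodes the subset of the healthcheck response used here.
+type healthInfo struct {
+	Healthy      bool
+	HealthChecks []struct {
+		Name             string
+		Succeeded        bool
+		DiagnosisMessage string
+	}
+}
+
+func main() {
+	resp, err := http.Get("http://localhost:9080/ws/v1/scheduler/healthcheck")
+	if err != nil {
+		panic(err)
+	}
+	defer resp.Body.Close()
+
+	var info healthInfo
+	if err := json.NewDecoder(resp.Body).Decode(&info); err != nil {
+		panic(err)
+	}
+	// Print the diagnosis of every failed check and exit non-zero when unhealthy.
+	for _, check := range info.HealthChecks {
+		if !check.Succeeded {
+			fmt.Printf("failed check %q: %s\n", check.Name, check.DiagnosisMessage)
+		}
+	}
+	if !info.Healthy {
+		os.Exit(1)
+	}
+}
+```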
+
+## Retrieve full state dump
+
+Endpoint to retrieve the following information in a single response:
+
+* List of partitions
+* List of applications (running and completed)
+* Application history
+* Nodes
+* Utilization of nodes
+* Generic cluster information
+* Cluster utilization
+* Container history
+* Queues
+
+**URL** : `/ws/v1/fullstatedump`
+
+**Method** : `GET`
+
+**Auth required** : NO
+
+### Success response
+
+**Code** : `200 OK`
+
+**Content examples**
+
+The output of this REST query can be rather large; it is a combination of the responses that have already been demonstrated above.
+
+### Failure response
+
+**Code**: `500 Internal Server Error`
+
+## Enable or disable periodic state dump
+
+Endpoint to enable a state dump to be written periodically; the default interval is 60 seconds. The output goes to a file called `yunikorn-state.txt`. In the current version, the file is located in the current working directory of YuniKorn and its location is not configurable.
+
+Trying to enable or disable this feature more than once in a row results in an error.
+
+**URL** : `/ws/v1/periodicstatedump/{switch}/{periodSeconds}`
+
+**Method** : `PUT`
+
+**Auth required** : NO
+
+The value `{switch}` can be either `disable` or `enable`. The `{periodSeconds}` defines how often state snapshots should be taken. It is expected to be a positive integer and only interpreted in case of `enable`.
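+
+As an illustration, the Go sketch below enables a dump every 120 seconds; the address `localhost:9080` and the interval are assumptions for the example.
+
+```go
+package main
+
+import (
+	"fmt"
+	"net/http"
+)
+
+func main() {
+	// Enable a periodic state dump every 120 seconds; the same endpoint
+	// with "disable" turns the feature off again.
+	url := "http://localhost:9080/ws/v1/periodicstatedump/enable/120"
+	req, err := http.NewRequest(http.MethodPut, url, nil)
+	if err != nil {
+		panic(err)
+	}
+	resp, err := http.DefaultClient.Do(req)
+	if err != nil {
+		panic(err)
+	}
+	defer resp.Body.Close()
+	fmt.Println("status:", resp.Status)
+}
+```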
+
+### Success response
+
+**Code** : `200 OK`
+
+### Error response
+
+**Code**: `400 Bad Request`
+
+**Content examples**
+
+```json
+{
+    "status_code": 400,
+    "message": "required parameter enabled/disabled is missing",
+    "description": "required parameter enabled/disabled is missing"
+}
+```
diff --git a/versioned_docs/version-1.1.0/api/system.md b/versioned_docs/version-1.1.0/api/system.md
new file mode 100644
index 000000000..1d685ffa5
--- /dev/null
+++ b/versioned_docs/version-1.1.0/api/system.md
@@ -0,0 +1,225 @@
+---
+id: system
+title: System
+---
+
+<!--
+Licensed to the Apache Software Foundation (ASF) under one
+or more contributor license agreements.  See the NOTICE file
+distributed with this work for additional information
+regarding copyright ownership.  The ASF licenses this file
+to you under the Apache License, Version 2.0 (the
+"License"); you may not use this file except in compliance
+with the License.  You may obtain a copy of the License at
+
+  http://www.apache.org/licenses/LICENSE-2.0
+
+Unless required by applicable law or agreed to in writing,
+software distributed under the License is distributed on an
+"AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY
+KIND, either express or implied.  See the License for the
+specific language governing permissions and limitations
+under the License.
+-->
+
+These endpoints are for the [pprof](https://github.com/google/pprof) profiling tool.
+
+## pprof
+
+**URL** : `/debug/pprof/`
+
+**Method** : `GET`
+
+### Success response
+
+**Code** : `200 OK`
+
+**Content examples**
+
+```text
+/debug/pprof/
+
+Types of profiles available:
+Count	Profile
+273	allocs
+0	block
+0	cmdline
+78	goroutine
+273	heap
+0	mutex
+0	profile
+29	threadcreate
+0	trace
+full goroutine stack dump
+Profile Descriptions:
+
+allocs: A sampling of all past memory allocations
+block: Stack traces that led to blocking on synchronization primitives
+cmdline: The command line invocation of the current program
+goroutine: Stack traces of all current goroutines
+heap: A sampling of memory allocations of live objects. You can specify the gc GET parameter to run GC before taking the heap sample.
+mutex: Stack traces of holders of contended mutexes
+profile: CPU profile. You can specify the duration in the seconds GET parameter. After you get the profile file, use the go tool pprof command to investigate the profile.
+threadcreate: Stack traces that led to the creation of new OS threads
+trace: A trace of execution of the current program. You can specify the duration in the seconds GET parameter. After you get the trace file, use the go tool trace command to investigate the trace.
+```
+
+## Heap
+
+**URL** : `/debug/pprof/heap`
+
+**Method** : `GET`
+
+### Success response
+
+**Code** : `200 OK`
+
+**Content examples**
+
+```proto
+// binary data from proto
+```
+
+## Thread create
+
+**URL** : `/debug/pprof/threadcreate`
+
+**Method** : `GET`
+
+### Success response
+
+**Code** : `200 OK`
+
+**Content examples**
+
+```proto
+// binary data from proto
+```
+
+## Goroutine
+
+**URL** : `/debug/pprof/goroutine`
+
+**Method** : `GET`
+
+### Success response
+
+**Code** : `200 OK`
+
+**Content examples**
+
+```proto
+// binary data from proto
+```
+
+## Allocations
+
+**URL** : `/debug/pprof/allocs`
+
+**Method** : `GET`
+
+### Success response
+
+**Code** : `200 OK`
+
+**Content examples**
+
+```proto
+// binary data from proto
+```
+
+## Block
+
+**URL** : `/debug/pprof/block`
+
+**Method** : `GET`
+
+### Success response
+
+**Code** : `200 OK`
+
+**Content examples**
+
+```proto
+// binary data from proto
+```
+
+## Mutex
+
+**URL** : `/debug/pprof/mutex`
+
+**Method** : `GET`
+
+### Success response
+
+**Code** : `200 OK`
+
+**Content examples**
+
+```proto
+// binary data from proto
+```
+
+## Cmdline
+
+**URL** : `/debug/pprof/cmdline`
+
+**Method** : `GET`
+
+### Success response
+
+**Code** : `200 OK`
+
+**Content examples**
+
+```proto
+// binary data from proto
+```
+
+## Profile
+
+**URL** : `/debug/pprof/profile`
+
+**Method** : `GET`
+
+### Success response
+
+**Code** : `200 OK`
+
+**Content examples**
+
+```proto
+// binary data from proto
+```
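+
+The profile endpoint returns binary pprof data; a small Go helper like the sketch below can save a CPU profile for offline analysis with the `go tool pprof` command mentioned in the profile description above. The address `localhost:9080` and the 30 second duration are assumptions.
+
+```go
+package main
+
+import (
+	"fmt"
+	"io"
+	"net/http"
+	"os"
+)
+
+func main() {
+	// Collect a 30 second CPU profile from the scheduler.
+	resp, err := http.Get("http://localhost:9080/debug/pprof/profile?seconds=30")
+	if err != nil {
+		panic(err)
+	}
+	defer resp.Body.Close()
+
+	out, err := os.Create("yunikorn-cpu.pprof")
+	if err != nil {
+		panic(err)
+	}
+	defer out.Close()
+
+	// Stream the binary profile straight to disk.
+	if _, err := io.Copy(out, resp.Body); err != nil {
+		panic(err)
+	}
+	fmt.Println("profile written to yunikorn-cpu.pprof")
+}
+```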
+
+## Symbol
+
+**URL** : `/debug/pprof/symbol`
+
+**Method** : `GET`
+
+### Success response
+
+**Code** : `200 OK`
+
+**Content examples**
+
+```proto
+// binary data from proto
+```
+
+## Trace		
+
+**URL** : `/debug/pprof/trace`
+
+**Method** : `GET`
+
+### Success response
+
+**Code** : `200 OK`
+
+**Content examples**
+
+```proto
+// binary data from proto
+```
diff --git a/versioned_docs/version-1.1.0/assets/allocation_4k.png b/versioned_docs/version-1.1.0/assets/allocation_4k.png
new file mode 100644
index 000000000..03346f52d
Binary files /dev/null and b/versioned_docs/version-1.1.0/assets/allocation_4k.png differ
diff --git a/versioned_docs/version-1.1.0/assets/application-state.png b/versioned_docs/version-1.1.0/assets/application-state.png
new file mode 100644
index 000000000..6c7f27e09
Binary files /dev/null and b/versioned_docs/version-1.1.0/assets/application-state.png differ
diff --git a/versioned_docs/version-1.1.0/assets/architecture.png b/versioned_docs/version-1.1.0/assets/architecture.png
new file mode 100644
index 000000000..a19dcaa42
Binary files /dev/null and b/versioned_docs/version-1.1.0/assets/architecture.png differ
diff --git a/versioned_docs/version-1.1.0/assets/cpu_profile.jpg b/versioned_docs/version-1.1.0/assets/cpu_profile.jpg
new file mode 100644
index 000000000..7e99f6230
Binary files /dev/null and b/versioned_docs/version-1.1.0/assets/cpu_profile.jpg differ
diff --git a/versioned_docs/version-1.1.0/assets/dashboard_secret.png b/versioned_docs/version-1.1.0/assets/dashboard_secret.png
new file mode 100644
index 000000000..60b4f976c
Binary files /dev/null and b/versioned_docs/version-1.1.0/assets/dashboard_secret.png differ
diff --git a/versioned_docs/version-1.1.0/assets/dashboard_token_select.png b/versioned_docs/version-1.1.0/assets/dashboard_token_select.png
new file mode 100644
index 000000000..59173fdab
Binary files /dev/null and b/versioned_docs/version-1.1.0/assets/dashboard_token_select.png differ
diff --git a/versioned_docs/version-1.1.0/assets/docker-dektop-minikube.png b/versioned_docs/version-1.1.0/assets/docker-dektop-minikube.png
new file mode 100644
index 000000000..48b3584f2
Binary files /dev/null and b/versioned_docs/version-1.1.0/assets/docker-dektop-minikube.png differ
diff --git a/versioned_docs/version-1.1.0/assets/docker-desktop.png b/versioned_docs/version-1.1.0/assets/docker-desktop.png
new file mode 100644
index 000000000..922436039
Binary files /dev/null and b/versioned_docs/version-1.1.0/assets/docker-desktop.png differ
diff --git a/versioned_docs/version-1.1.0/assets/fifo-state-example.png b/versioned_docs/version-1.1.0/assets/fifo-state-example.png
new file mode 100644
index 000000000..ca04c17d8
Binary files /dev/null and b/versioned_docs/version-1.1.0/assets/fifo-state-example.png differ
diff --git a/versioned_docs/version-1.1.0/assets/gang_clean_up.png b/versioned_docs/version-1.1.0/assets/gang_clean_up.png
new file mode 100644
index 000000000..baf5accb4
Binary files /dev/null and b/versioned_docs/version-1.1.0/assets/gang_clean_up.png differ
diff --git a/versioned_docs/version-1.1.0/assets/gang_generic_flow.png b/versioned_docs/version-1.1.0/assets/gang_generic_flow.png
new file mode 100644
index 000000000..381b7d00c
Binary files /dev/null and b/versioned_docs/version-1.1.0/assets/gang_generic_flow.png differ
diff --git a/versioned_docs/version-1.1.0/assets/gang_scheduling_iintro.png b/versioned_docs/version-1.1.0/assets/gang_scheduling_iintro.png
new file mode 100644
index 000000000..b3be207ea
Binary files /dev/null and b/versioned_docs/version-1.1.0/assets/gang_scheduling_iintro.png differ
diff --git a/versioned_docs/version-1.1.0/assets/gang_timeout.png b/versioned_docs/version-1.1.0/assets/gang_timeout.png
new file mode 100644
index 000000000..a8ea0daf1
Binary files /dev/null and b/versioned_docs/version-1.1.0/assets/gang_timeout.png differ
diff --git a/versioned_docs/version-1.1.0/assets/gang_total_ask.png b/versioned_docs/version-1.1.0/assets/gang_total_ask.png
new file mode 100644
index 000000000..928d7fdf5
Binary files /dev/null and b/versioned_docs/version-1.1.0/assets/gang_total_ask.png differ
diff --git a/versioned_docs/version-1.1.0/assets/goland_debug.jpg b/versioned_docs/version-1.1.0/assets/goland_debug.jpg
new file mode 100644
index 000000000..c9ab94ca0
Binary files /dev/null and b/versioned_docs/version-1.1.0/assets/goland_debug.jpg differ
diff --git a/versioned_docs/version-1.1.0/assets/k8shim-application-state.png b/versioned_docs/version-1.1.0/assets/k8shim-application-state.png
new file mode 100644
index 000000000..c28659196
Binary files /dev/null and b/versioned_docs/version-1.1.0/assets/k8shim-application-state.png differ
diff --git a/versioned_docs/version-1.1.0/assets/k8shim-node-state.png b/versioned_docs/version-1.1.0/assets/k8shim-node-state.png
new file mode 100644
index 000000000..5d5db56a8
Binary files /dev/null and b/versioned_docs/version-1.1.0/assets/k8shim-node-state.png differ
diff --git a/versioned_docs/version-1.1.0/assets/k8shim-scheduler-state.png b/versioned_docs/version-1.1.0/assets/k8shim-scheduler-state.png
new file mode 100644
index 000000000..2eb6e298e
Binary files /dev/null and b/versioned_docs/version-1.1.0/assets/k8shim-scheduler-state.png differ
diff --git a/versioned_docs/version-1.1.0/assets/k8shim-task-state.png b/versioned_docs/version-1.1.0/assets/k8shim-task-state.png
new file mode 100644
index 000000000..11c95d131
Binary files /dev/null and b/versioned_docs/version-1.1.0/assets/k8shim-task-state.png differ
diff --git a/versioned_docs/version-1.1.0/assets/namespace-mapping.png b/versioned_docs/version-1.1.0/assets/namespace-mapping.png
new file mode 100644
index 000000000..9ad07da8f
Binary files /dev/null and b/versioned_docs/version-1.1.0/assets/namespace-mapping.png differ
diff --git a/versioned_docs/version-1.1.0/assets/node-bin-packing.png b/versioned_docs/version-1.1.0/assets/node-bin-packing.png
new file mode 100644
index 000000000..9267a0359
Binary files /dev/null and b/versioned_docs/version-1.1.0/assets/node-bin-packing.png differ
diff --git a/versioned_docs/version-1.1.0/assets/node-fair.png b/versioned_docs/version-1.1.0/assets/node-fair.png
new file mode 100644
index 000000000..2e404abbc
Binary files /dev/null and b/versioned_docs/version-1.1.0/assets/node-fair.png differ
diff --git a/versioned_docs/version-1.1.0/assets/node_fairness_conf.png b/versioned_docs/version-1.1.0/assets/node_fairness_conf.png
new file mode 100644
index 000000000..80e1dd576
Binary files /dev/null and b/versioned_docs/version-1.1.0/assets/node_fairness_conf.png differ
diff --git a/versioned_docs/version-1.1.0/assets/object-state.png b/versioned_docs/version-1.1.0/assets/object-state.png
new file mode 100644
index 000000000..9baca07ce
Binary files /dev/null and b/versioned_docs/version-1.1.0/assets/object-state.png differ
diff --git a/versioned_docs/version-1.1.0/assets/perf-tutorial-build.png b/versioned_docs/version-1.1.0/assets/perf-tutorial-build.png
new file mode 100644
index 000000000..5c2f28bd0
Binary files /dev/null and b/versioned_docs/version-1.1.0/assets/perf-tutorial-build.png differ
diff --git a/versioned_docs/version-1.1.0/assets/perf-tutorial-resultDiagrams.png b/versioned_docs/version-1.1.0/assets/perf-tutorial-resultDiagrams.png
new file mode 100644
index 000000000..6d9686d48
Binary files /dev/null and b/versioned_docs/version-1.1.0/assets/perf-tutorial-resultDiagrams.png differ
diff --git a/versioned_docs/version-1.1.0/assets/perf-tutorial-resultLog.png b/versioned_docs/version-1.1.0/assets/perf-tutorial-resultLog.png
new file mode 100755
index 000000000..dad58bf14
Binary files /dev/null and b/versioned_docs/version-1.1.0/assets/perf-tutorial-resultLog.png differ
diff --git a/versioned_docs/version-1.1.0/assets/perf_e2e_test.png b/versioned_docs/version-1.1.0/assets/perf_e2e_test.png
new file mode 100644
index 000000000..79763d03e
Binary files /dev/null and b/versioned_docs/version-1.1.0/assets/perf_e2e_test.png differ
diff --git a/versioned_docs/version-1.1.0/assets/perf_e2e_test_conf.png b/versioned_docs/version-1.1.0/assets/perf_e2e_test_conf.png
new file mode 100644
index 000000000..5aeef943c
Binary files /dev/null and b/versioned_docs/version-1.1.0/assets/perf_e2e_test_conf.png differ
diff --git a/versioned_docs/version-1.1.0/assets/perf_node_fairness.png b/versioned_docs/version-1.1.0/assets/perf_node_fairness.png
new file mode 100644
index 000000000..fbacc13be
Binary files /dev/null and b/versioned_docs/version-1.1.0/assets/perf_node_fairness.png differ
diff --git a/versioned_docs/version-1.1.0/assets/perf_throughput.png b/versioned_docs/version-1.1.0/assets/perf_throughput.png
new file mode 100644
index 000000000..e89a7f388
Binary files /dev/null and b/versioned_docs/version-1.1.0/assets/perf_throughput.png differ
diff --git a/versioned_docs/version-1.1.0/assets/pluggable-app-mgmt.jpg b/versioned_docs/version-1.1.0/assets/pluggable-app-mgmt.jpg
new file mode 100644
index 000000000..443b8ad33
Binary files /dev/null and b/versioned_docs/version-1.1.0/assets/pluggable-app-mgmt.jpg differ
diff --git a/versioned_docs/version-1.1.0/assets/predicateComaparation.png b/versioned_docs/version-1.1.0/assets/predicateComaparation.png
new file mode 100755
index 000000000..d3498c8bd
Binary files /dev/null and b/versioned_docs/version-1.1.0/assets/predicateComaparation.png differ
diff --git a/versioned_docs/version-1.1.0/assets/predicate_4k.png b/versioned_docs/version-1.1.0/assets/predicate_4k.png
new file mode 100644
index 000000000..850036c26
Binary files /dev/null and b/versioned_docs/version-1.1.0/assets/predicate_4k.png differ
diff --git a/versioned_docs/version-1.1.0/assets/prometheus.png b/versioned_docs/version-1.1.0/assets/prometheus.png
new file mode 100644
index 000000000..964c79ca0
Binary files /dev/null and b/versioned_docs/version-1.1.0/assets/prometheus.png differ
diff --git a/versioned_docs/version-1.1.0/assets/queue-fairness.png b/versioned_docs/version-1.1.0/assets/queue-fairness.png
new file mode 100644
index 000000000..7c78ed70b
Binary files /dev/null and b/versioned_docs/version-1.1.0/assets/queue-fairness.png differ
diff --git a/versioned_docs/version-1.1.0/assets/queue-resource-quotas.png b/versioned_docs/version-1.1.0/assets/queue-resource-quotas.png
new file mode 100644
index 000000000..fb7213894
Binary files /dev/null and b/versioned_docs/version-1.1.0/assets/queue-resource-quotas.png differ
diff --git a/versioned_docs/version-1.1.0/assets/resilience-node-recovery.jpg b/versioned_docs/version-1.1.0/assets/resilience-node-recovery.jpg
new file mode 100644
index 000000000..384745164
Binary files /dev/null and b/versioned_docs/version-1.1.0/assets/resilience-node-recovery.jpg differ
diff --git a/versioned_docs/version-1.1.0/assets/resilience-workflow.jpg b/versioned_docs/version-1.1.0/assets/resilience-workflow.jpg
new file mode 100644
index 000000000..40ab6baf2
Binary files /dev/null and b/versioned_docs/version-1.1.0/assets/resilience-workflow.jpg differ
diff --git a/versioned_docs/version-1.1.0/assets/scheduling_no_predicate_4k.png b/versioned_docs/version-1.1.0/assets/scheduling_no_predicate_4k.png
new file mode 100644
index 000000000..0ebe41c81
Binary files /dev/null and b/versioned_docs/version-1.1.0/assets/scheduling_no_predicate_4k.png differ
diff --git a/versioned_docs/version-1.1.0/assets/scheduling_with_predicate_4k_.png b/versioned_docs/version-1.1.0/assets/scheduling_with_predicate_4k_.png
new file mode 100644
index 000000000..2cee7c014
Binary files /dev/null and b/versioned_docs/version-1.1.0/assets/scheduling_with_predicate_4k_.png differ
diff --git a/versioned_docs/version-1.1.0/assets/simple_preemptor.png b/versioned_docs/version-1.1.0/assets/simple_preemptor.png
new file mode 100644
index 000000000..c5165c341
Binary files /dev/null and b/versioned_docs/version-1.1.0/assets/simple_preemptor.png differ
diff --git a/versioned_docs/version-1.1.0/assets/spark-jobs-on-ui.png b/versioned_docs/version-1.1.0/assets/spark-jobs-on-ui.png
new file mode 100644
index 000000000..dabeb3086
Binary files /dev/null and b/versioned_docs/version-1.1.0/assets/spark-jobs-on-ui.png differ
diff --git a/versioned_docs/version-1.1.0/assets/spark-pods.png b/versioned_docs/version-1.1.0/assets/spark-pods.png
new file mode 100644
index 000000000..e1f72e0d6
Binary files /dev/null and b/versioned_docs/version-1.1.0/assets/spark-pods.png differ
diff --git a/versioned_docs/version-1.1.0/assets/tf-job-on-ui.png b/versioned_docs/version-1.1.0/assets/tf-job-on-ui.png
new file mode 100644
index 000000000..06acabec2
Binary files /dev/null and b/versioned_docs/version-1.1.0/assets/tf-job-on-ui.png differ
diff --git a/versioned_docs/version-1.1.0/assets/throughput.png b/versioned_docs/version-1.1.0/assets/throughput.png
new file mode 100644
index 000000000..8ced22c81
Binary files /dev/null and b/versioned_docs/version-1.1.0/assets/throughput.png differ
diff --git a/versioned_docs/version-1.1.0/assets/throughput_3types.png b/versioned_docs/version-1.1.0/assets/throughput_3types.png
new file mode 100644
index 000000000..a4a583b51
Binary files /dev/null and b/versioned_docs/version-1.1.0/assets/throughput_3types.png differ
diff --git a/versioned_docs/version-1.1.0/assets/throughput_conf.png b/versioned_docs/version-1.1.0/assets/throughput_conf.png
new file mode 100644
index 000000000..dd5d72063
Binary files /dev/null and b/versioned_docs/version-1.1.0/assets/throughput_conf.png differ
diff --git a/versioned_docs/version-1.1.0/assets/yk-ui-screenshots.gif b/versioned_docs/version-1.1.0/assets/yk-ui-screenshots.gif
new file mode 100644
index 000000000..77dec5635
Binary files /dev/null and b/versioned_docs/version-1.1.0/assets/yk-ui-screenshots.gif differ
diff --git a/versioned_docs/version-1.1.0/assets/yunirkonVSdefault.png b/versioned_docs/version-1.1.0/assets/yunirkonVSdefault.png
new file mode 100755
index 000000000..a123ff8db
Binary files /dev/null and b/versioned_docs/version-1.1.0/assets/yunirkonVSdefault.png differ
diff --git a/versioned_docs/version-1.1.0/design/architecture.md b/versioned_docs/version-1.1.0/design/architecture.md
new file mode 100644
index 000000000..40db1bc00
--- /dev/null
+++ b/versioned_docs/version-1.1.0/design/architecture.md
@@ -0,0 +1,62 @@
+---
+id: architecture
+title: Architecture
+---
+
+<!--
+Licensed to the Apache Software Foundation (ASF) under one
+or more contributor license agreements.  See the NOTICE file
+distributed with this work for additional information
+regarding copyright ownership.  The ASF licenses this file
+to you under the Apache License, Version 2.0 (the
+"License"); you may not use this file except in compliance
+with the License.  You may obtain a copy of the License at
+
+  http://www.apache.org/licenses/LICENSE-2.0
+
+Unless required by applicable law or agreed to in writing,
+software distributed under the License is distributed on an
+"AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY
+KIND, either express or implied.  See the License for the
+specific language governing permissions and limitations
+under the License.
+-->
+
+Apache YuniKorn is a light-weight, universal resource scheduler for container orchestrator systems.
+It was created to achieve fine-grained resource sharing for various workloads efficiently in large-scale, multi-tenant,
+and cloud-native environments. YuniKorn brings a unified, cross-platform scheduling experience for mixed workloads that
+consist of stateless batch workloads and stateful services.
+
+YuniKorn now supports K8s and can be deployed as a custom K8s scheduler. YuniKorn's architecture also allows
+adding different shim layers and adapting to different resource manager implementations, including Apache Hadoop YARN
+or any other system.
+
+## Architecture
+
+The following diagram illustrates the high-level architecture of YuniKorn.
+
+<img src={require('./../assets/architecture.png').default} />
+
+## Components
+
+### Scheduler interface
+
+The [scheduler interface](https://github.com/apache/yunikorn-scheduler-interface) is an abstraction layer
+that resource management platforms (like YARN/K8s) communicate with, via APIs such as gRPC or programming language bindings.
+
+### Scheduler core
+
+The scheduler core encapsulates all scheduling algorithms. It collects resource information from the underlying resource management
+platforms (like YARN/K8s) and is responsible for handling container allocation requests. It decides the best spot
+for each request and then sends the resulting allocations back to the resource management platform.
+The scheduler core is agnostic of the underlying platform; all communication goes through the [scheduler interface](https://github.com/apache/yunikorn-scheduler-interface).
+Please read more about the design of the scheduler core [here](scheduler_core_design.md).
+
+### Kubernetes shim
+
+The YuniKorn Kubernetes shim is responsible for talking to Kubernetes. It translates the Kubernetes
+cluster resources and resource requests via the scheduler interface and sends them to the scheduler core.
+Once a scheduling decision is made, it is responsible for binding the pod to the specific node. All the communication
+between the shim and the scheduler core goes through the [scheduler interface](https://github.com/apache/yunikorn-scheduler-interface).
+Please read more about the design of the Kubernetes shim [here](k8shim.md).
+
diff --git a/versioned_docs/version-1.1.0/design/cache_removal.md b/versioned_docs/version-1.1.0/design/cache_removal.md
new file mode 100644
index 000000000..f78ba07c4
--- /dev/null
+++ b/versioned_docs/version-1.1.0/design/cache_removal.md
@@ -0,0 +1,451 @@
+---
+id: cache_removal
+title: Scheduler cache removal design
+---
+
+<!--
+ * Licensed to the Apache Software Foundation (ASF) under one
+ * or more contributor license agreements.  See the NOTICE file
+ * distributed with this work for additional information
+ * regarding copyright ownership.  The ASF licenses this file
+ * to you under the Apache License, Version 2.0 (the
+ * "License"); you may not use this file except in compliance
+ * with the License.  You may obtain a copy of the License at
+ *
+ *     http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing, software
+ * distributed under the License is distributed on an "AS IS" BASIS,
+ * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+ * See the License for the specific language governing permissions and
+ * limitations under the License.
+ -->
+
+:::caution
+The interface message definitions described in this design doc have undergone major refactoring to reduce complexity. [YUNIKORN-337](https://issues.apache.org/jira/browse/YUNIKORN-337) was committed and simplified the message communication between the core and the shim to a great extent.
+See [Simplifying Interface Messages and Breaking Shim build dependency on Core](interface_message_simplification.md) for the updated message definitions.
+:::
+
+# Proposal to combine Cache and Scheduler's implementation in the core
+This document describes the current state of the scheduler and cache implementation.
+It describes the changes planned based on the analysis that was done of the current behaviour.
+
+## Goals
+The goal is to provide the same functionality before and after the change.
+- unit tests before and after the merge must all pass.
+- Smoke tests defined in the core should all pass without major changes <sup id="s1">[definition](#f1)</sup>.
+- End-to-end tests that are part of the shim code must all pass without changes.
+
+## Background 
+The current Scheduler Core is built up around two major components to store the data: the cache and scheduler objects.
+The cache objects form the base for most data to be tracked.
+The Scheduler objects track specific in-flight details and are built on top of a cache object.
+
+The communication between the two layers uses asynchronous events and, in some cases, direct updates.
+An asynchronous update between the scheduler and the cache means that there is a short period during which the scheduler is "out of sync" with the cache.
+This short period can have an impact on the scheduling decisions.
+One of which is logged as [YUNIKORN-169](https://issues.apache.org/jira/browse/YUNIKORN-169).
+
+A further point is the complexity that the two structures bring to the code:
+a distinct set of messages is needed to communicate between the scheduler and the cache.
+The one-on-one mapping between the scheduler and cache objects shows that the distinction is probably more artificial than required.
+
+---
+<b id="f1"></b>definition: Major changes for smoke tests are defined as changes to the tests that alter use case and thus test flows. Some changes will be needed as checks made could rely on cache objects which have been removed. [↩](#s1)
+## Structure analysis
+### Objects
+The existing objects as per the code analysis.
+The overlap between the scheduler and the cache objects is shown by showing them at the same line.
+N/A means that there is no equivalent object in either the scheduler or cache.
+
+| Cache Object                   | Scheduler Object               |
+| ------------------------------ | ------------------------------ |
+| ClusterInfo                    | ClusterSchedulingContext       |
+| PartitionInfo                  | partitionSchedulingContext     |
+| AllocationInfo                 | schedulingAllocation           |
+| N/A                            | schedulingAllocationAsk        |
+| N/A                            | reservation                    |
+| ApplicationInfo                | SchedulingApplication          |
+| applicationState               | N/A                            |
+| NodeInfo                       | SchedulingNode                 |
+| QueueInfo                      | SchedulingQueue                |
+| SchedulingObjectState          | N/A                            |
+
+The `initializer` code that is part of the cache does not define a specific object.
+It contains a mixture of code defined at the package level and code that is part of the `ClusterInfo` object.
+
+### Events
+Events defined in the core have multiple origins and destinations.
+Some events are only internal for the core between the cache and scheduler.
+These events will be removed.
+
+| Event                                     | Flow                  | Proposal |
+| ----------------------------------------- | --------------------- | -------- |
+| AllocationProposalBundleEvent             | Scheduler -> Cache    | Remove   |
+| RejectedNewApplicationEvent               | Scheduler -> Cache    | Remove   |
+| ReleaseAllocationsEvent                   | Scheduler -> Cache    | Remove   |
+| RemoveRMPartitionsEvent                   | Scheduler -> Cache    | Remove   |
+| RemovedApplicationEvent                   | Scheduler -> Cache    | Remove   |
+| SchedulerNodeEvent                        | Cache -> Scheduler    | Remove   |
+| SchedulerAllocationUpdatesEvent           | Cache -> Scheduler    | Remove   |
+| SchedulerApplicationsUpdateEvent          | Cache -> Scheduler    | Remove   |
+| SchedulerUpdatePartitionsConfigEvent      | Cache -> Scheduler    | Remove   |
+| SchedulerDeletePartitionsConfigEvent      | Cache -> Scheduler    | Remove   |
+| RMApplicationUpdateEvent (add/remove app) | Cache/Scheduler -> RM | Modify   |
+| RMRejectedAllocationAskEvent              | Cache/Scheduler -> RM | Modify   |
+| RemoveRMPartitionsEvent                   | RM -> Scheduler       |          |
+| RMUpdateRequestEvent                      | RM -> Cache           | Modify   |
+| RegisterRMEvent                           | RM -> Cache           | Modify   |
+| ConfigUpdateRMEvent                       | RM -> Cache           | Modify   |
+| RMNewAllocationsEvent                     | Cache -> RM           | Modify   |
+| RMReleaseAllocationEvent                  | Cache -> RM           | Modify   |
+| RMNodeUpdateEvent                         | Cache -> RM           | Modify   |
+|                                           |                       |          |
+
+Events that are handled by the cache will need to be handled by the core code after the removal of the cache.
+Two events are handled by the cache and the scheduler.
+
+## Detailed flow analysis
+### Object existing in both cache and scheduler
+The current design is based on the fact that the cache object is the basis for all data storage.
+Each cache object must have a corresponding scheduler object.
+The contract in the core around the cache and scheduler objects was simple.
+If the object exists in both the scheduler and the cache, the object will be added to the cache, triggering the creation of the corresponding scheduler object.
+Removing the object is always handled in reverse: first from the scheduler, which will trigger the removal from the cache.
+An example would be the creation of an application: triggered by the `RMUpdateRequestEvent`, it would be processed by the cache,
+creating a `SchedulerApplicationsUpdateEvent` to create the corresponding application in the scheduler.
+
+When the application and object state were added they were added into the cache objects.
+The cache objects were considered the data store and thus also contain the state.
+There were no corresponding state objects in the scheduler.
+Maintaining two states for the same object is not possible. 
+
+The other exceptions to that rule are two objects that were considered volatile and scheduler only.
+The `schedulingAllocationAsk` tracks outstanding requests for an application in the scheduler.
+The `reservation` tracks a temporary reservation of a node for an application and ask combination. 
+
+### Operations to add/remove app
+The RM (shim) sends a complex `UpdateRequest` as defined in the scheduler interface.
+This message is wrapped by the RM proxy and forwarded to the cache for processing.
+The RM can request an application to be added or removed.
+
+**application add or delete**
+```
+1. RMProxy sends cacheevent.RMUpdateRequestEvent to cache
+2. cluster_info.processApplicationUpdateFromRMUpdate
+   2.1: Add new apps to the partition.
+   2.2: Send removed apps to scheduler (but not remove anything from cache)
+3. scheduler.processApplicationUpdateEvent
+   3.1: Add new apps to scheduler 
+        (when fails, send RejectedNewApplicationEvent to cache)
+        No matter if failed or not, send RMApplicationUpdateEvent to RM.
+   3.2: Remove app from scheduler
+        Send RemovedApplicationEvent to cache
+```
+
+### Operations to remove allocations and add or remove asks
+The RM (shim) sends a complex `UpdateRequest` as defined in the scheduler interface.
+This message is wrapped by the RM proxy and forwarded to the cache for processing.
+The RM can request an allocation to be removed.
+The RM can request an ask to be added or removed
+
+**allocation delete**
+This describes the allocation delete initiated by the RM only
+````
+1. RMProxy sends cacheevent.RMUpdateRequestEvent to cache
+2. cluster_info.processNewAndReleaseAllocationRequests
+   2.1: (by-pass): Send to scheduler via event SchedulerAllocationUpdatesEvent
+3. scheduler.processAllocationUpdateEvent 
+   3.1: Update ReconcilePlugin
+   3.2: Send confirmation of the releases back to Cache via event ReleaseAllocationsEvent
+4. cluster_info.processAllocationReleases to process the confirmed release
+````
+
+**ask add**
+If the ask already exists this add is automatically converted into an update.
+```
+1. RMProxy sends cacheevent.RMUpdateRequestEvent to cache
+2. cluster_info.processNewAndReleaseAllocationRequests
+   2.1: Ask sanity check (such as existence of partition/app), rejections are sent back to the RM via RMRejectedAllocationAskEvent
+   2.2: pass checked asks to scheduler via SchedulerAllocationUpdatesEvent
+3. scheduler.processAllocationUpdateEvent
+   3.1: Update scheduling application with the new or updated ask. 
+   3.2: rejections are sent back to the RM via RMRejectedAllocationAskEvent
+   3.3: accepted asks are not confirmed to RM or cache
+```
+
+**ask delete**
+```
+1. RMProxy sends cacheevent.RMUpdateRequestEvent to cache
+2. cluster_info.processNewAndReleaseAllocationRequests
+   2.1: (by-pass): Send to scheduler via event SchedulerAllocationUpdatesEvent
+3. scheduler.processAllocationReleaseByAllocationKey
+   3.1: Update scheduling application and remove the ask. 
+```
+
+### Operations to add, update or remove nodes
+The RM (shim) sends a complex `UpdateRequest` as defined in the scheduler interface.
+This message is wrapped by the RM proxy and forwarded to the cache for processing.
+The RM can request a node to be added, updated or removed.
+
+**node add** 
+```
+1. RMProxy sends cacheevent.RMUpdateRequestEvent to cache
+2. cluster_info.processNewSchedulableNodes
+   2.1: node sanity check (such as existence of partition/node)
+   2.2: Add new nodes to the partition.
+   2.3: notify scheduler of new node via SchedulerNodeEvent
+3. notify RM of node additions and rejections via RMNodeUpdateEvent
+   3.1: notify the scheduler of allocations to recover via SchedulerAllocationUpdatesEvent
+4. scheduler.processAllocationUpdateEvent
+   4.1: scheduler creates a new ask based on the Allocation to recover 
+   4.2: recover the allocation on the new node using a special process
+   4.3: confirm the allocation in the scheduler, on failure update the cache with a ReleaseAllocationsEvent
+```
+
+**node update and removal**
+```
+1. RMProxy sends cacheevent.RMUpdateRequestEvent to cache
+2. cluster_info.processNodeActions
+   2.1: node sanity check (such as existence of partition/node)
+   2.2: Node info update (resource change)
+        2.2.1: update node in cache
+        2.2.2: notify scheduler of the node update via SchedulerNodeEvent
+   2.3: Node status update (not removal), update node status in cache only
+   2.4: Node removal
+        2.4.1: update node status and remove node from the cache
+        2.4.2: remove allocations and inform RM via RMReleaseAllocationEvent
+        2.4.3: notify scheduler of the node removal via SchedulerNodeEvent
+3. scheduler.processNodeEvent add/remove/update the node  
+```
+
+### Operations to add, update or remove partitions
+**Add RM**
+```
+1. RMProxy sends commonevents.RemoveRMPartitionsEvent
+   if RM is already registered
+   1.1: scheduler.removePartitionsBelongToRM
+        1.1.1: scheduler cleans up
+        1.1.2: scheduler sends commonevents.RemoveRMPartitionsEvent
+   1.2: cluster_info.processRemoveRMPartitionsEvent
+        1.2.1: cache cleans up
+2. RMProxy sends commonevents.RegisterRMEvent
+3. cluster_info.processRMRegistrationEvent
+   3.1: cache updates internal partitions/queues accordingly.
+   3.2: cache sends SchedulerUpdatePartitionsConfigEvent to the scheduler.
+4. scheduler.processUpdatePartitionConfigsEvent
+   4.1: Scheduler updates partition/queue info accordingly.
+```
+
+**Update and Remove partition**
+Triggered by a configuration file update.
+```
+1. RMProxy sends commonevents.ConfigUpdateRMEvent
+2. cluster_info.processRMConfigUpdateEvent
+   2.1: cache update internal partitions/queues accordingly.
+   2.2: cache sends to scheduler SchedulerUpdatePartitionsConfigEvent.
+   2.3: cache marks partitions for deletion (not removed yet).
+   2.4: cache sends to scheduler SchedulerDeletePartitionsConfigEvent
+3. scheduler.processUpdatePartitionConfigsEvent
+   3.1: scheduler updates internal partitions/queues accordingly.
+4. scheduler.processDeletePartitionConfigsEvent
+   4.1: Scheduler set partitionManager.stop = true.
+   4.2: PartitionManager removes queues, applications, nodes async.
+        This is the REAL CLEANUP including the cache
+```
+
+### Allocations
+Allocations are initiated by the scheduling process.
+The scheduler creates a SchedulingAllocation on the scheduler side which then gets wrapped in an AllocationProposal.
+The scheduler has already checked resources etc. and marked the allocation as in flight.
+This description picks up at the point the allocation will be confirmed and finalised.
+
+**New allocation**
+```
+1. Scheduler wraps a SchedulingAllocation in an AllocationProposalBundleEvent
+2. cluster_info.processAllocationProposalEvent
+   preemption case: release preempted allocations
+   2.1: release the allocation in the cache
+   2.2: inform the scheduler the allocation is released via SchedulerNodeEvent
+   2.3: inform the RM the allocation is released via RMReleaseAllocationEvent
+   all cases: add the new allocation
+   2.4: add the new allocation to the cache
+   2.5: rejections are sent back to the scheduler via SchedulerAllocationUpdatesEvent
+   2.6: inform the scheduler the allocation is added via SchedulerAllocationUpdatesEvent
+   2.7: inform the RM the allocation is added via RMNewAllocationsEvent
+3. scheduler.processAllocationUpdateEvent
+   3.1: confirmations are added to the scheduler and change from inflight to confirmed.
+        On failure of processing, a ReleaseAllocationsEvent is sent to the cache *again* to clean up.
+        This is part of the issue in [YUNIKORN-169]
+        cluster_info.processAllocationReleases
+   3.2: rejections remove the inflight allocation from the scheduler. 
+```
+
+## Current locking
+**Cluster Lock:**  
+A cluster contains one or more Partition objects. A partition is a sub object of Cluster.  
+Adding or Removing ANY Partition requires a write-lock of the cluster.
+Retrieving any object within the cluster will require iterating over the Partition list and thus a read-lock of the cluster
+
+**Partition Lock:**  
+The partition object contains all links to Queue, Application or Node objects.
+Adding or Removing ANY Queue, Application or Node needs a write-lock of the partition.
+Retrieving any object within the partition will require a read-lock of the partition to prevent data races
+
+Examples of operations needing a write-lock
+- Allocation processing after scheduling, as it will change application, queue and node objects.
+  The partition lock is required due to possible updates to reservations.
+- Update of the node resource.
+  It not only affects the node's available resource, it also affects the partition's total allocatable resource.
+
+Example of operations that need a read-lock:
+- Retrieving any Queue, Application or Node needs a read-lock
+  The object itself is not locked as part of the retrieval
+- Confirming an allocation after processing in the cache
+  The partition is only locked for reading to allow retrieval of the objects that will be changed.
+  The changes are made on the underlying objects.
+
+Example of operations that do not need any lock: 
+- Scheduling  
+  Locks are taken on the specific objects when needed, no direct updates to the partition until the allocation is confirmed. 
+
+**Queue lock:**  
+A queue can track either applications (leaf type) or other queues (parent type).
+Resources are tracked for both types in the same way.
+
+Adding or removing an Application (leaf type), or a direct child queue (parent type) requires a write-lock of the queue.  
+Updating tracked resources requires a write-lock.
+Changes are made recursively, never locking more than one queue at a time.
+Updating any configuration property on the queue requires a write-lock.
+Retrieving any configuration value, or tracked resource, application or queue requires a read-lock.  
+
+Examples of operation needing a write-lock
+- Adding an application to a leaf queue
+- Updating the reservations
+
+Examples of operation needing a read-lock
+- Retrieving an application from a leaf type queue
+- Retrieving the pending resources 
+
+**Application lock:**  
+An application tracks resources of different types, the allocations and outstanding requests.  
+Updating any tracked resources, allocations or requests requires a write-lock.
+Retrieving any of those values requires a read-lock.
+
+Scheduling also requires a write-lock of the application.
+During scheduling the write-lock is held for the application.
+Locks will be taken on the node or queue that need to be accessed or updated.  
+Examples of the locks taken on other objects are:
+- a read lock to access queue tracked resources
+- a write-lock to update the in progress allocations on the node 
+
+Examples of operation needing a write-lock
+- Adding a new ask
+- Trying to schedule a pending request 
+
+Examples of operation needing a read-lock
+- Retrieving the allocated resources
+- Retrieving the pending requests
+
+**Node lock:**  
+A node tracks resources of different types and allocations.
+Updating any tracked resources or allocations requires a write-lock.
+Retrieving any of those values requires a read-lock.
+
+Checks run during the allocation phases take locks as required:
+read-locks when checking, write-locks when updating (see the sketch after the examples below).
+A node is not locked for the whole allocation cycle.
+
+Examples of operation needing a write-lock
+- Adding a new allocation
+- updating the node resources
+
+Examples of operation needing a read-lock
+- Retrieving the allocated resources
+- Retrieving the reservation status
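+
+The read/write locking pattern described for the queue, application and node objects boils down to the sketch below. It is purely illustrative and not the actual YuniKorn code; the type and field names are made up.
+
+```go
+package objects
+
+import "sync"
+
+// trackedNode is a trimmed-down stand-in for a scheduler object that guards
+// its own data with a sync.RWMutex.
+type trackedNode struct {
+	sync.RWMutex
+	allocated map[string]int64 // resource name -> quantity (hypothetical)
+}
+
+func newTrackedNode() *trackedNode {
+	return &trackedNode{allocated: make(map[string]int64)}
+}
+
+// GetAllocated takes a read-lock: many readers may run concurrently.
+func (n *trackedNode) GetAllocated(resource string) int64 {
+	n.RLock()
+	defer n.RUnlock()
+	return n.allocated[resource]
+}
+
+// AddAllocation takes a write-lock: it blocks readers and other writers.
+func (n *trackedNode) AddAllocation(resource string, quantity int64) {
+	n.Lock()
+	defer n.Unlock()
+	n.allocated[resource] += quantity
+}
+```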
+
+## How to merge Cache and scheduler objects
+Since there is no longer the requirement to distinguish the objects in the cache and scheduler the `scheduling` and `info` parts of the name will be dropped.
+
+Overview of the main moves and merges:
+1. `application_info` & `scheduling_application`: **merge** to `scheduler.object.application`
+2. `allocation_info` & `scheduling_allocation`: **merge** to `scheduler.object.allocation`
+3. `node_info` & `scheduling_node`: **merge** to `scheduler.object.node`
+4. `queue_info` & `scheduling_queue`: **merge** to `scheduler.object.queue`
+5. `partition_info` & `scheduling_partition`: **merge** to `scheduler.PartitionContext`
+6. `cluster_info` & `scheduling_context`: **merge** to `scheduler.ClusterContext`
+7. `application_state`: **move** to `scheduler.object.applicationState`
+8. `object_state`: **move** to `scheduler.object.objectState`
+9. `initializer`: **merge** into `scheduler.ClusterContext`
+
+This move and merge of code includes a refactor of the objects into their own package.
+It thus affects the two scheduler-only objects, reservation and schedulingAllocationAsk, which are already defined.
+Both will be moved into the objects package.
+
+The top level scheduler package remains for the contexts and scheduling code.
+
+## Code merges
+The first change is the event processing.
+All RM events will now directly be handled in the scheduler.
+Event handling will undergo a major change, far more than a simple merge.
+Only the RM generated events will be left after the merge.
+As described in the analysis above the scheduler is, in almost all cases, notified of changes from RM events.
+
+Broadly speaking there are only three types of changes triggered by the event removal: 
+- configuration changes: new scheduler code required as the cache handling is not transferable to the scheduler
+- node, ask and application changes: merge of the cache code into the scheduler
+- allocation changes: removal of confirmation cycle and simplification of the scheduler code
+
+Part of the event handling is the processing of the configuration changes.
+All configuration changes will now update the scheduler objects directly.
+The way the scheduler works is slightly different from the cache which means the code is not transferable. 
+
+Nodes and applications are really split between the cache and scheduler.
+Anything that is tracked in the cache object that does not have an equivalent value in the scheduler object will be moved into the scheduler object.
+All references to scheduler objects will be removed.
+With the code merges existing scheduler code that calls out directly into the cache objects will return the newly tracked value in the scheduler object.
+These calls will thus become locked calls in the scheduler.
+
+The concept of an in-flight allocation will be removed.
+Allocations will be made in the same scheduling iteration without events or the creation of a proposal.
+This removes the need for tracking allocating resources on the scheduler objects.
+In-flight resource tracking was required to make sure that an allocation, while not yet confirmed by the cache, would be taken into account when making scheduling decisions.
+
+The application and object state will be an integrated part of the scheduler object.
+A state change is thus immediate and this should prevent an issue like [YUNIKORN-169](https://issues.apache.org/jira/browse/YUNIKORN-169) from occurring.
+
+## Locking after merge
+
+### Direction of lock 
+It is possible to acquire another lock while holding a lock, but we need to make sure that we never allow both of the following at the same time, as that creates a circular wait (deadlock):
+- Holding A's lock while acquiring B's lock.
+- Holding B's lock while acquiring A's lock.
+
+The current code in the scheduler takes a lock as late as possible and only for the time period needed.
+Some actions are only locked on the cache side, not on the scheduler side, as each object has its own lock.
+This means that a read of a value from the cache would not lock the scheduling object.
+
+With the integration of the cache into the scheduler the number of locks will decrease as the number of objects decreases.
+Each equivalent object, cache and scheduler, which used to have their own lock will now have just one.
+After the merge of the code is performed one lock will be left.
+Locking will occur more frequently as the number of fields in the scheduler objects has increased.
+
+Calls that did not lock the scheduler object before the merge will become locked.
+Lock contention could lead to performance degradation.
+The reduced overhead in objects and event handling can hopefully compensate for this.
+One point to keep track of is the change in locking behaviour.
+New behaviour could lead to new deadlock situations when code is simply merged without looking at the order.
+
+### Mitigations for deadlocks
+The locking inside the scheduler will be left as is.
+This means that the main scheduling logic will be taking and releasing locks as required on the objects.
+There are no long held read-locks or write-locks until the application is locked to schedule it.
+
+A major point of attention will need to be that no iterations of objects should be performed while holding on to a lock.
+For instance, during scheduling, we should not hold the queue lock while iterating over the queue's applications.
+
+Another example would be that event processing in the partition should not lock the partition unnecessarily.
+The partition should be locked while retrieving, for instance, the node that needs updating, and the lock should be released before the node itself is locked.
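+
+As an illustration of the partition example above, a minimal Go sketch with hypothetical, heavily simplified types (not the actual partition or node code):
+
+```
+package scheduler
+
+import "sync"
+
+// Node locks itself; callers never hold the partition lock while calling into it.
+type Node struct {
+	sync.Mutex
+	allocations map[string]bool
+}
+
+func (n *Node) RemoveAllocation(allocID string) {
+	n.Lock()
+	defer n.Unlock()
+	delete(n.allocations, allocID)
+}
+
+// PartitionContext is a minimal sketch holding only what the example needs.
+type PartitionContext struct {
+	sync.RWMutex
+	nodes map[string]*Node
+}
+
+// removeNodeAllocation holds the partition lock only while looking up the node and
+// releases it before the node itself is locked, keeping the lock order flat.
+func (pc *PartitionContext) removeNodeAllocation(nodeID, allocID string) {
+	pc.RLock()
+	node := pc.nodes[nodeID]
+	pc.RUnlock()
+
+	if node == nil {
+		return
+	}
+	node.RemoveAllocation(allocID)
+}
+```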
+
+This approach fits in with the current locking approach and will keep the locking changes to a minimum.
+Testing, specifically end-to-end testing, should catch these deadlocks. 
+There are no known tools that could be used to detect or describe lock order.
diff --git a/versioned_docs/version-1.1.0/design/cross_queue_preemption.md b/versioned_docs/version-1.1.0/design/cross_queue_preemption.md
new file mode 100644
index 000000000..51c803308
--- /dev/null
+++ b/versioned_docs/version-1.1.0/design/cross_queue_preemption.md
@@ -0,0 +1,126 @@
+---
+id: cross_queue_preemption
+title: Cross Queue Preemption
+---
+
+<!--
+Licensed to the Apache Software Foundation (ASF) under one
+or more contributor license agreements.  See the NOTICE file
+distributed with this work for additional information
+regarding copyright ownership.  The ASF licenses this file
+to you under the Apache License, Version 2.0 (the
+"License"); you may not use this file except in compliance
+with the License.  You may obtain a copy of the License at
+
+  http://www.apache.org/licenses/LICENSE-2.0
+
+Unless required by applicable law or agreed to in writing,
+software distributed under the License is distributed on an
+"AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY
+KIND, either express or implied.  See the License for the
+specific language governing permissions and limitations
+under the License.
+-->
+
+## Problems:
+
+The following are lessons we learned from YARN Scheduler preemption. 
+
+**Here are the top bad things:** 
+
+- Preemption is a shotgun instead of a sniper: when a preemption decision is made, nobody knows whether the preempted resources will go to the demanding queue/app/user or not.
+- Preemption logic and allocation logic are separate, so we have to implement (and mimic) what we have already done in the scheduler allocation logic. 
+
+**Here are the top good things:**
+
+- Preemption is fast (thanks to the shotgun), reclaiming thousands of containers only takes about 1 second. 
+- We understand how painful it is to handle DRF and multiple preemption policies (inter/intra-queue, shotgun/surgical preemption, etc.), and we have developed some good logic 
+to ensure better modularization and pluggability  
+
+## Answer some questions for design/implementation choices
+
+**1\. Do we really want a preemption delay? (Or do we just want to control the pace?)**
+
+In CS, we have a preemption delay, which selects victims among the preemption candidates and waits for a certain time before killing them. 
+
+The purposes of the preemption delay are: a. give apps heads-up time so 
+they can prepare for bad things to happen (unfortunately no app does anything with these heads-ups, at least as far as I know); b. control the preemption pace.   
+
+In practice, I found it causes a lot of issues; for example, when the 
+cluster state keeps changing, it is very hard to ensure accurate preemption. 
+
+**Proposal:**
+
+Remove the preemption delay but keep the logic that controls the preemption pace (such as ```yarn.resourcemanager.monitor.capacity.preemption.total_preemption_per_round```), and do allocation together with preemption.
+This does not mean containers will be stopped immediately after preemption is issued. Instead, the RM can control the delay between signalling a container and killing it, similar to the graceful 
+termination of a pod in K8s: https://kubernetes.io/docs/concepts/workloads/pods/pod/#termination-of-pods   
+
+**2\. Do we want to do preemption in every scheduling cycle, or can we do it periodically?**
+
+In CS, the preemption logic runs periodically, for example every 1 or 3 seconds. 
+
+Preemption involves some heavy computation, like calculating shares of queues/apps, and when doing accurate preemption we may need to scan nodes for preemption candidates. 
+Considering this, I propose to run preemption periodically. It is important to note that we need to reuse as much code as possible for 
+allocation-inside-preemption, otherwise there will be too much duplicated logic that is very hard to maintain in the future.
+
+**3\. Preemption cost and function**
+
+We found it helpful to add a cost to preemption, based on factors such as container lifetime, priority, and container type. It could be a cost function (which returns a numeric value) or it 
+could be a comparator (which compares two allocations for a preemption ask).
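+
+A sketch of what a comparator-based cost could look like in Go; the fields and their ordering are illustrative assumptions, not a decided policy:
+
+```
+package preemption
+
+import "time"
+
+// candidate is a hypothetical view of an allocation considered for preemption.
+type candidate struct {
+	priority      int32
+	startTime     time.Time
+	opportunistic bool // e.g. a best-effort container
+}
+
+// lessCostly returns true if a should be preempted before b: opportunistic
+// containers first, then lower priority, then the shorter lived container.
+func lessCostly(a, b candidate) bool {
+	if a.opportunistic != b.opportunistic {
+		return a.opportunistic
+	}
+	if a.priority != b.priority {
+		return a.priority < b.priority
+	}
+	return a.startTime.After(b.startTime)
+}
+```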
+
+## Pseudo code
+
+Logic of allocation (invoked every allocation cycle)
+
+```
+input:
+  - nAlloc, allocate N allocations for this allocation cycle.
+
+for partition: 
+  askCandidates := findAskCandidates(nAlloc, preemption=false)
+  
+  allocated, failed_to_allocated := tryAllocate(askCandidates);
+  
+  send-allocated-to-cache-to-commit;
+  
+  update-missed-opportunity (allocated, failed_to_allocated);
+  
+  nAlloc -= len(allocated)   
+```
+
+Logic of preemption (invoked every preemption cycle)
+
+```
+// It has to be done for every preemption-policy because calculation is different.
+for preemption-policy: 
+  preempt_results := policy.preempt()
+  for preempt_results: 
+     send-preempt-result-to-cache-to-commit;
+     updated-missed-opportunity (allocated)
+```
+
+Inside preemption policy
+
+```
+inter-queue-preempt-policy:
+  calculate-preemption-quotas;
+  
+  for partitions:
+    total_preempted := resource(0);
+    
+    while total_preempted < partition-limited:
+      // queues will be sorted by allocating - preempting
+      // And ignore any key in preemption_mask
+      askCandidates := findAskCandidates(N, preemption=true)
+      
+      preempt_results := tryAllocate(askCandidates, preemption=true);
+      
+      total_preempted += sigma(preempt_result.allocResource)
+      
+      send-allocated-to-cache-to-commit;
+      
+      update-missed-opportunity (allocated, failed_to_allocated);
+      
+      update-preemption-mask(askCandidates.allocKeys - preempt_results.allocKeys)
+```
\ No newline at end of file
diff --git a/versioned_docs/version-1.1.0/design/gang_scheduling.md b/versioned_docs/version-1.1.0/design/gang_scheduling.md
new file mode 100644
index 000000000..f475395c5
--- /dev/null
+++ b/versioned_docs/version-1.1.0/design/gang_scheduling.md
@@ -0,0 +1,605 @@
+---
+id: gang_scheduling
+title: Gang scheduling design
+---
+
+<!--
+ * Licensed to the Apache Software Foundation (ASF) under one
+ * or more contributor license agreements.  See the NOTICE file
+ * distributed with this work for additional information
+ * regarding copyright ownership.  The ASF licenses this file
+ * to you under the Apache License, Version 2.0 (the
+ * "License"); you may not use this file except in compliance
+ * with the License.  You may obtain a copy of the License at
+ *
+ *     http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing, software
+ * distributed under the License is distributed on an "AS IS" BASIS,
+ * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+ * See the License for the specific language governing permissions and
+ * limitations under the License.
+ -->
+# Gang Scheduling Implementation
+Gang scheduling is a new way of scheduling applications that takes into account the demand for resources the application expects to generate over time.
+It guarantees the expected resources for the application by reserving them.
+
+There are two parts to this implementation:
+*   Kubernetes Shim
+*   Core and scheduling
+
+This document describes the implementation on the core side.
+
+## Document goals
+This document describes the following implementation design points:
+1. Define changes required for the shim to core communication (scheduler interface)
+2. Scheduler storage object changes
+3. Scheduler logic changes
+
+## Excluded design points
+Currently, the Kubernetes shim side implementation is not covered in this design document.
+
+Generalised preemption on the core side will _not_ be discussed in this design.
+
+## Generic flow
+The flow is triggered by a pod that is submitted which triggers the application creation.
+This first pod is in the case of a Spark application, the driver pod.
+In case the flow is triggered from the creation of an application CRD there will not be a first pod.
+This is however outside the core scheduling logic. From the core side there should be no difference between the two cases.
+More details are in the chapter on the [Scheduler logic changes](#scheduler-logic-changes).
+
+The flow of an application being submitted is shown below. The numbers in the diagram correspond to the description below the diagram.
+
+![generic flow](./../assets/gang_generic_flow.png)
+
+Combined flow for the shim and core during startup of an application:
+*   An application is submitted with TaskGroup(s) defined. (1)
+*   The shim creates the application and passes the application to the core. (2)
+*   The shim creates placeholder pods for each of the members of the TaskGroup(s) (3)
+*   The pods are processed and passed to the core, as per the normal behaviour, as AllocationAsks for the application with the correct info set. (4)
+*   The placeholder AllocationAsks are scheduled by the core as if they were normal AllocationAsks. (5)
+*   All Allocations, even if they are the result of the placeholder AllocationAsks being allocated by the scheduler, are communicated back to the shim.
+*   The original real pod is passed to the core as an AllocationAsk. (6)
+*   After the real pod and all the placeholder pods are scheduled, the shim starts the real pod that triggered the application creation. (7)
+
+After the first, real, pod is started the following pods should all be handled in the same way (8):
+*   A real pod is created on k8s.
+*   The pod is processed and an AllocationAsk is created.
+*   The scheduler processes the AllocationAsk (more detail below) and replaces a placeholder with the real allocation.
+
+## Application submit handling
+### Total placeholder size
+
+If the application requests one or more TaskGroups, it should provide the total size of all the TaskGroup members it is going to request.
+The total resource size is required for the case that the application is scheduled in a queue with a resource limit set.
+
+The value is important for three cases:
+1. gang request is larger than the queue quota
+2. start of scheduling reservations
+3. resource pressure while scheduling reservations
+
+Further detail will be given below in [scheduling in queues with a quota set](#scheduling-in-queues-with-a-quota-set).
+
+The information passed on from the shim should be part of the AddApplicationRequest.
+Detailed information on the build-up of the taskGroup(s), or the number of members, is not relevant.
+The total resource requested by all taskGroup members is calculated using:
+
+![ask calculation](./../assets/gang_total_ask.png)
+
+This total placeholderAsk is added as an optional field to the AddApplicationRequest message.
+The calculation can be made by the shim based on the CRD or annotation provided in the pod description.
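+
+A minimal sketch of that calculation, assuming a simplified task group definition on the shim side (the real shim works from the pod annotation or CRD):
+
+```
+package shim
+
+// taskGroup is a simplified view of a task group definition.
+type taskGroup struct {
+	minMember int64
+	resource  map[string]int64 // resource name -> quantity per member
+}
+
+// totalPlaceholderAsk sums minMember * resource over all task groups, producing
+// the value that is passed as placeholderAsk in the AddApplicationRequest.
+func totalPlaceholderAsk(groups []taskGroup) map[string]int64 {
+	total := make(map[string]int64)
+	for _, tg := range groups {
+		for name, quantity := range tg.resource {
+			total[name] += tg.minMember * quantity
+		}
+	}
+	return total
+}
+```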
+
+If the placeholderAsk is larger than the queue quota set on the queue the application must be rejected.
+This rejection is based on the fact that we cannot in any way honor the request.
+For all other cases the application is accepted and will be scheduled as per normal.
+
+### Handling queue with a FAIR sort policy
+If an application is submitted to a queue that has a FAIR sort policy set it must be rejected.
+Queue sorting for the queue that the application runs in must be set to _FIFO_ or _StateAware_.
+
+Other queue policies cannot guarantee that there is only one _New_ application processed at a time.
+In the case of the _FAIR_ policy we could be allocating multiple _New_ applications at the same time making quota management impossible to enforce.
+The other side effect of using _FAIR_ as a policy could be that we get multiple applications with only a partially allocated guarantee.
+
+Auto-scaling can be triggered when the core cannot place the placeholders on any node.
+If the queue used _FAIR_ sorting, other applications could take the scaled-up nodes instead of the placeholders, again breaking the gang.
+
+## Scheduling in queues with a quota set
+The main case already described above is handling a total placeholder request size that is larger than the quota set on the queue.
+When the application is submitted we can already assess that we cannot satisfy that requirement and reject the request.
+
+In the case that the total placeholder ask does fit in the queue we should not start scheduling until there are enough resources available in the queue to satisfy the total request.
+However this does not stop scheduling of other applications in the queue(s).
+Applications that are already running in the queue could ask for more resources.
+From an application perspective there is no limit set on the resource it can request.
+The gang defined on the application is a guaranteed number of resources, not a maximum number of resources the application can request.
+
+This is complicated by the fact that we have a queue hierarchy.
+There is the possibility that the quota is not set directly on the queue the application is running.
+It could be set on one of the parent queues.
+This case could become complex, and we need to keep in mind that we could live-lock the scheduling.
+
+In this first phase we should focus on the case that the gang resources requested are also the maximum number of resources the application will request.
+When we look at the queues we should focus on a single queue level with quotas.
+
+These two assumptions are correct for the spark use case without dynamic allocation using a dynamic mapping from a namespace to a queue.
+
+Furthermore, we assume that the quota set on the queue can be totally allocated.
+If the cluster does not have enough resources the cluster will scale up to the size needed to provide all queues with their full quota.
+
+The follow up should add further enhancements for deeper hierarchies and dynamic allocation support.
+This could also leverage preemption in certain use cases, like preempting allocations from applications over their total gang size.
+
+Further enhancements could be added by allowing an application to specify how long it will wait for the placeholders to be allocated, or how long to wait before starting to use the held resources.
+
+## Scheduler logic changes
+The scheduler logic change needs to account for two parts of cycle:
+*   The placeholder asks and their allocation.
+*   The allocation replacing the placeholder.
+
+The basic assumption is that all pods will generate a placeholder pod request to the core.
+This includes the pod that triggered the application creation if we do not use the application CRD.
+This assumption is needed to make sure that the scheduler core can behave in the same way for both ways of submitting the application.
+The placeholder pods must be communicated to the core before the real pod.
+
+Changes for the placeholder AllocationAsks are the first step.
+As part of the creation of the application the AllocationAsks get added.
+The addition of an AllocationAsk normally triggers the application state change as per the scheduling cycle.
+It moves the Application from a _New_ state to an _Accepted_ state. This is as per the current setup, and does not change.
+
+However, in the case that the AllocationAsk has the _placeholder_ flag set, the allocation should not trigger a state change; the application stays in the _Accepted_ state.
+AllocationAsks are processed until the application has no pending resources.
+AllocationAsks that do not have a _placeholder_ flag set should be ignored as a safety precaution.
+All resulting Allocations for the placeholder pods are confirmed to the shim as per the normal steps.
+This process continues until there are no more placeholder pods to be allocated.
+
+The shim at that point should create the AllocationAsk for the real pod(s) that it has buffered.
+The core cannot and must not assume that there is only one task group per application.
+The core is also not in the position to assume that it has received all AllocationAsks that belong to the task group if a shim uses option A (one AllocationAsk per task group member) as described in the interface changes below.
+This is also why we have the assumption that every pod creates a placeholder request to the core.
+
+The second change is the replacement of the placeholder pods with the real pods.
+The shim creates an AllocationAsk with the _taskGroupName_ set but the _placeholder_ flag is not set.
+
+The process described here lines up with the process for generic pre-emption.
+An allocation is released by the core and then confirmed by the shim.
+For gang scheduling we have a simple one-new-to-one-release relation; in the case of pre-emption we can use the same flow with a one-new-to-multiple-releases relation.
+
+The scheduler processes the AllocationAsk as follows:
+1. Check if the application has an unreleased allocation for a placeholder allocation with the same _taskGroupName._ If no placeholder allocations are found a normal allocation cycle will be used to allocate the request.
+2. A placeholder allocation is selected and marked for release. A request to release the placeholder allocation is communicated to the shim. This must be an async process as the shim release process is dependent on the underlying K8s response which might not be instantaneous.  
+   NOTE: no allocations are released in the core at this point in time.
+3. The core “parks” the processing of the real AllocationAsk until the shim has responded with a confirmation that the placeholder allocation has been released.  
+   NOTE: locks are released to allow scheduling to continue
+4. After the confirmation of the release is received from the shim the “parked” AllocationAsk processing is finalised.
+5. The AllocationAsk is allocated on the same node as the placeholder used.
+   The removal of the placeholder allocation is finalised in either case. This all needs to happen as one update to the application, queue and node.
+    * On success: a new Allocation is created.
+    * On Failure: try to allocate on a different node, if that fails the AllocationAsk becomes unschedulable triggering scale up. 
+6. Communicate the allocation back to the shim (if applicable, based on step 5)
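+
+A sketch of the placeholder selection in steps 1 and 2, using simplified stand-ins for the core objects; the real code involves the application, queue and node updates and locking described earlier:
+
+```
+package objects
+
+// allocation is a simplified sketch of the core allocation object.
+type allocation struct {
+	uuid          string
+	taskGroupName string
+	placeholder   bool
+	released      bool // set once a release has been sent to the shim
+	nodeID        string
+}
+
+// findPlaceholderToReplace picks an unreleased placeholder allocation for the task
+// group of the real ask (step 1). The caller marks it released and asks the shim to
+// remove the placeholder pod (step 2); nothing is removed in the core at that point.
+func findPlaceholderToReplace(allocations []*allocation, taskGroupName string) *allocation {
+	for _, alloc := range allocations {
+		if alloc.placeholder && !alloc.released && alloc.taskGroupName == taskGroupName {
+			return alloc
+		}
+	}
+	return nil // no placeholder left: fall back to a normal allocation cycle
+}
+```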
+
+## Application completion
+Application completion has been a long standing issue.
+Currently, applications do not transition to a _completed_ state when done.
+The current states for the application are [documented here](./scheduler_object_states.md).
+However, at this point in time an application will not reach the _completed_ state and will be stuck in _waiting_.
+
+This causes a number of issues, specifically around memory usage and cleanup of queues in long-running deployments.
+
+### Definition
+Since we cannot rely on the application, running as pods on Kubernetes, to show that it has finished we need to define when we consider an application _completed_.
+At this point we are defining that an application is _completed_ when it has been in the _waiting_ state for a defined time period.
+An application enters the waiting state at the time that it has no active allocations (allocated resources equal zero) and no pending allocation asks (pending resources equal zero).
+
+The transition to a _waiting_ state is already implemented.
+The time out of the _waiting_ state is new functionality.
+
+Placeholders are not considered active allocations.
+Placeholder asks are considered pending resource asks.
+These cases will be handled in the [Cleanup](#cleanup) below.
+
+### Cleanup
+When we look at gang scheduling there is a further issue around unused placeholders, placeholder asks and their cleanup.
+Placeholders could be converted into real allocations at any time there are pending allocation asks or active allocations.
+
+Placeholder asks will all be converted into placeholder allocations before the real allocations are processed.
+
+Entry into the _waiting_ state is already handled.
+If new allocation asks are added to the application it will transition back to a _running_ state.
+At the time we entered the waiting state, there were no pending requests or allocated resources.
+There could be allocated placeholders.
+
+For the entry into the _waiting_ state the application must be clean.
+However, we can not guarantee that all placeholders will be used by the application during the time the application runs.
+Transitioning out of the _waiting_ state into the _completed_ state requires no (placeholder) allocations or asks at all.
+The second case that impacts transitions is that not all placeholder asks are allocated, and the application thus never requests any real allocations.
+These two cases could prevent an application from transitioning out of the _accepted_, or the _waiting_ state.
+
+Processing in the core thus needs to consider two cases that will impact the transition out of specific states:
+1. Placeholder asks pending (exit from _accepted_)
+2. Placeholders allocated (exit from _waiting_)
+
+Placeholder asks pending:  
+Pending placeholder asks are handled via a timeout.
+An application must only spend a limited time waiting for all placeholders to be allocated.
+This timeout is needed because an application’s partial placeholder allocation may occupy cluster resources without really using them.
+
+An application could be queued for an unknown time, waiting for placeholder allocation to start.
+The timeout for placeholder asks can thus not be linked to the creation of the application or the asks.
+The timeout must start at the time the first placeholder ask is allocated.
+
+The application cannot request real allocations until all placeholder asks are allocated.
+A placeholder ask is also tracked by the shim as it represents a pod.
+Releasing an ask in the core requires a message to flow between the core and shim to release that ask.
+However, in this case the timeout for allocating placeholder asks triggers an application failure.
+When the timeout is triggered and placeholder asks are pending the application will transition from the state it is in, which can only be _accepted_, to _killed_.
+
+The application state for this case can be summarised as:
+*   Application status is _accepted_
+*   Placeholder allocated resource is larger than zero, and less than the _placeholderAsk_ from the _AddApplicationRequest_
+*   Pending resource asks is larger than zero
+
+Entering into the _killed_ state must move the application out of the queue automatically.
+
+The state change and placeholder allocation releases can be handled in a single UpdateResponse message. The message will have the following content:
+*   _UpdatedApplication_ for the state change of the application
+*   one or more _AllocationRelease_ messages, one for each placeholder, with the  _TerminationType_ set to TIMEOUT
+*   one or more AllocationAskRelease messages with the _TerminationType_ set to TIMEOUT
+
+The shim processes the AllocationAskRelease messages first, followed by the _AllocationResponse_ messages, and finally the _UpdatedApplication_ message. The application state change to the _killed_ state on the core side is only dependent on the removal of all placeholder pods, not on a response to the _UpdatedApplication_ message.
+
+![placeholder timeout](./../assets/gang_timeout.png)
+
+Combined flow for the shim and core during timeout of placeholder:
+*   The core times out the placeholder allocation. (1)
+*   The placeholder Allocations removal is passed to the shim. (2)
+*   All placeholder Allocations are released by the shim, and communicated back to the core.
+*   The placeholder AllocationAsks removal is passed to the shim. (3)
+*   All placeholder AllocationAsks are released by the shim, and communicated back to the core.
+*   After the placeholder Allocations and Asks are released the core moves the application to the killed state removing it from the queue (4).
+*   The state change is finalised in the core and shim. (5)
+
+Allocated placeholders:  
+Leftover placeholders need to be released by the core.
+The shim needs to be informed to remove them. This must be triggered on entry of the _completed_ state.
+After the placeholder release is requested by the core the state transition of the application can proceed.
+The core will process the _AllocationRelease_ messages for placeholder allocations that come back from the shim with the _TerminationType_ set to TIMEOUT as normal without triggering a state change.
+
+The state change and placeholder allocation releases can be handled in a single UpdateResponse message.
+The message will have the following content:
+*   _UpdatedApplication_ for the state change of the application
+*   zero or more _AllocationRelease_ messages, one for each placeholder, with the  _TerminationType_ set to TIMEOUT
+
+The shim processes the _AllocationResponse_ messages first followed by the _UpdatedApplication_ message.
+The application state change to the _completed_ state on the core side is only dependent on the removal of all placeholder pods, not on a response to the _UpdatedApplication_ message.
+
+Entering into the _completed_ state will move the application out of the queue automatically.
+This should also handle the case we discussed earlier around a possible delayed processing of requests from the shim as we can move back from _waiting_ to _running_ if needed.
+A _completed_ application should also not prevent the case that was discussed around cron like submissions using the same application ID for each invocation.
+A _completed_ application with the same application ID must not prevent the submission of a new application with the same ID.
+
+![application cleanup flow](./../assets/gang_clean_up.png)
+
+Combined flow for the shim and core during cleanup of an application:
+*   A pod is released at the Kubernetes layer. (1)
+*   The shim passes the release of the allocation on to the core. (2)
+*   The core transitions the application to a waiting state if no pending or active allocations. (3)
+*   The waiting state times out and triggers the cleanup. (4)
+*   The placeholder Allocations removal is passed to the shim. (5)
+*   All placeholder Allocations are released by the shim, and communicated back to the core.
+*   After all placeholders are released the core moves the application to the completed state removing it from the queue (6).
+*   The state change is finalised in the core and shim. (7)
+
+## Application recovery
+During application recovery the placeholder pods are recovered as any other pod on a node.
+These pods are communicated to the core by the shim as part of the node as an existing allocation.
+Existing allocations do not have a corresponding _AllocationAsk_ in the core. The core generates an _AllocationAsk_ based on the recovered information.
+
+For gang scheduling the _AllocationAsk_ contains the _taskGroupName_ and _placeholder_ flag.
+During recovery that same information must be part of the _Allocation_ message.
+Because the same message is used in two directions, from the RM to the scheduler and vice versa, we need to update the message and its processing.
+
+If the information is missing from the _Allocation_ message the recovered allocation will not be correctly tagged in the core.
+The recovered allocation will be seen as a regular allocation.
+This means it is skipped as part of the normal allocation cycle that replaces the placeholders.
+
+The logic change only requires that the recovery of existing allocations copies the fields from the interface message into the allocation object in the core.
+
+## Interface changes
+Multiple changes are needed to the communication between the shim and the core to support the gang information needed.
+
+An application must provide the total size of the placeholder requests to prevent accepting an application that can never run.
+
+The current object that is sent from the shim to the core for allocation requests is defined in the AllocationAsk.
+The Allocation, as the result message passed back from the scheduler core, does not change. However, for recovery, which uses the same Allocation message from the shim to the core, it must contain the gang related fields.
+Gang related fields must be added to both messages.
+
+The allocation release request and response messages need to support bidirectional traffic and will need to undergo major changes.
+
+### AddApplication
+The AddApplicationRequest message requires a new field to communicate the total placeholder resource request that will be requested.
+The field is used to reject the application if it is impossible to satisfy the request.
+It can also be used to stop the core from scheduling any real pods for that application until all placeholder pods are processed.
+
+In patched message form that would look like:
+```
+message AddApplicationRequest {
+...
+  // The total amount of resources gang placeholders will request
+  Resource placeholderAsk = 7;
+...
+}
+```
+
+### AllocationAsk
+The first part of the change is the base information for the task group.
+This will require an additional optional attribute to be added.
+The content of this optional attribute is a name, a string, which will be mapped to the name of the task group.
+The field can be present on a real allocation and on a placeholder.
+
+Proposed name for the new field is: _taskGroupName_
+
+To distinguish normal AllocationAsks and placeholder AllocationAsks a flag must be added.
+The flag will never have more than two values and thus maps to a boolean. As the default value for a boolean is _false_, the field is set to _true_ when the AllocationAsk represents a placeholder.
+
+Proposed name for the field is: _placeholder_
+
+In patched message form that would look like:
+```
+message AllocationAsk {
+...
+  // The name of the TaskGroup this ask belongs to
+  string taskGroupName = 10;
+  // Is this a placeholder ask (true) or a real ask (false), defaults to false
+  // ignored if the taskGroupName is not set
+  bool placeholder = 11;
+...
+}
+```
+
+The last part of the task group information that needs to be communicated is the size of the task group.
+This does not require a change in the interface as the current AllocationAsk object can support both possible options.
+
+Requests can be handled in two ways:
+1. Option A: each member of the task group is passed to the core as a separate AllocationAsk with a maxAllocations, or the ask repeat, of 1
+2. Option B: the task group is considered one AllocationAsk with a maxAllocations set to the same value as minMember of the task group information.
+
+With option A the shim will need to generate multiple AllocationAsk objects and pass each to the core for scheduling.
+Each AllocationAsk is linked to one pod.
+Option B will only generate one AllocationAsk for all placeholder pods.
+Option B requires less code and has less overhead on the core side.
+However the logic on the shim side might be more complex as the returned allocation needs to be linked to just one pod.
+
+Proposal is to use option: A
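+
+A sketch of option A on the shim side, with illustrative names and a simplified stand-in for the AllocationAsk message; each task group member becomes its own ask with a repeat of one:
+
+```
+package shim
+
+import "fmt"
+
+// placeholderAsk is a simplified stand-in for the AllocationAsk message.
+type placeholderAsk struct {
+	allocationKey  string
+	applicationID  string
+	taskGroupName  string
+	placeholder    bool
+	maxAllocations int32
+	resource       map[string]int64
+}
+
+// asksForTaskGroup implements option A: one ask per task group member, each with
+// an ask repeat (maxAllocations) of 1 and each linked to one placeholder pod.
+func asksForTaskGroup(appID, groupName string, minMember int64, resource map[string]int64) []placeholderAsk {
+	asks := make([]placeholderAsk, 0, minMember)
+	for i := int64(0); i < minMember; i++ {
+		asks = append(asks, placeholderAsk{
+			allocationKey:  fmt.Sprintf("tg-%s-%s-%d", appID, groupName, i),
+			applicationID:  appID,
+			taskGroupName:  groupName,
+			placeholder:    true,
+			maxAllocations: 1,
+			resource:       resource,
+		})
+	}
+	return asks
+}
+```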
+
+### Allocation
+Similar to the change for the _AllocationAsk_ the _Allocation_ requires additional optional attributes to be added.
+The new fields distinguish a normal Allocation and placeholder Allocations on recovery.
+The same rules apply to these fields as the ones added to the _AllocationAsk_.
+
+The content of this optional attribute is a name, a string, which will be mapped to the name of the task group.
+The field can be present on a real allocation and on a placeholder.
+
+Proposed name for the new field is: _taskGroupName_
+
+The flag will never have more than two values and thus maps to a boolean.
+As the default value for a boolean is _false_, the field is set to _true_ when the Allocation represents a placeholder.
+
+Proposed name for the field is: _placeholder_
+
+In patched message form that would look like:
+```
+message Allocation {
+...
+  // The name of the TaskGroup this allocation belongs to
+  string taskGroupName = 11;
+  // Is this a placeholder allocation (true) or a real allocation (false), defaults to false
+  // ignored if the taskGroupName is not set
+  bool placeholder = 12;
+...
+}
+```
+
+### AllocationRelease Response and Request
+The names for the messages are based on the fact that the release is always triggered by the shim.
+In the case of preemption and/or gang scheduling the release is not triggered from the shim but from the core.
+That means the message names do not cover the usage: a response message might not have an associated request message.
+The name could be used to indicate direction, but that is confusing in this case.
+
+When a release is triggered from the core, for preemption or the placeholder allocation, a response is expected from the shim to confirm that the release has been processed.
+This response must be distinguished from a request to release the allocation initiated by the shim.
+A release initiated by the shim must be followed by a confirmation from the core to the shim that the message is processed.
+For releases initiated by the core no such confirmation message can or must be sent.
+In the current request message there is no way to indicate that it is a confirmation message.
+
+To fix the possible confusing naming the proposal is to merge the two messages into one message: _AllocationRelease_.
+
+The _AllocationReleaseRequest_ is indirectly part of the _UpdateRequest_ message as it is contained in the _AllocationReleasesRequest_.
+The _AllocationReleaseResponse_ is part of the _UpdateResponse_ message.
+The flow-on effect of the rename and merge of the two messages is a change in the two messages that contain them.
+The message changes for _UpdateResponse_ and _AllocationReleasesRequest_ are limited to type changes of the existing fields.
+
+| Message                   | Field ID | Old type                  | New type          |
+| ------------------------- | -------- | ------------------------- | ----------------- |
+| UpdateResponse            | 3        | AllocationReleaseResponse | AllocationRelease |
+| AllocationReleasesRequest | 1        | AllocationReleaseRequest  | AllocationRelease |
+
+In patched message form that would look like:
+```
+message UpdateResponse {
+...
+  // Released allocation(s), allocations can be released by either the RM or scheduler.
+  // The TerminationType defines which side needs to act and process the message. 
+  repeated AllocationRelease releasedAllocations = 3;
+...
+}
+
+message AllocationReleasesRequest {
+  // Released allocation(s), allocations can be released by either the RM or scheduler.
+  // The TerminationType defines which side needs to act and process the message. 
+  repeated AllocationRelease releasedAllocations = 1;
+...
+}
+```
+
+The merged message _AllocationRelease_ will consist of:
+
+| Field name      | Content type      | Required |
+| --------------- | ----------------- | -------- |
+| partitionName   | string            | yes      |
+| applicationID   | string            | no       |
+| UUID            | string            | no       |
+| terminationType | _TerminationType_ | yes      |
+| message         | string            | no       |
+
+Confirmation behaviour of the action should be triggered on the type of termination received.
+The core will confirm the release to the shim of all types that originate in the shim and vice versa.
+
+A confirmation or response uses the same _TerminationType_ as was set in the original message.
+An example of this is a pod that is removed from K8s: this will trigger an _AllocationRelease_ message to be sent from the shim to the core with the TerminationType STOPPED_BY_RM. The core processes the request, removing the allocation from the internal structures, and when all processing is done it responds to the shim with a message using the same _TerminationType_.
+The shim can ignore that or make follow up changes if needed.
+
+A similar process happens for a release that originates in the core.
+Example of the core sending an _AllocationRelease_ message to the shim using the _TerminationType_ PREEMPTED_BY_SCHEDULER.
+The shim handles that by releasing the pod identified and responds to the core that it has released the pod.
+On receiving the confirmation that the pod has been released the core can progress with the allocation and preemption processing.
+
+In patched message form that would look like:
+```
+message AllocationRelease {
+  enum TerminationType {
+    STOPPED_BY_RM = 0;
+    TIMEOUT = 1; 
+    PREEMPTED_BY_SCHEDULER = 2;
+    PLACEHOLDER_REPLACED = 3;
+  }
+
+  // The name of the partition the allocation belongs to
+  string partitionName = 1;
+  // The application the allocation belongs to
+  string applicationID = 2;
+  // The UUID of the allocation to release, if not set all allocations are released for
+  // the applicationID
+  string UUID = 3;
+  // The termination type as described above 
+  TerminationType terminationType = 4;
+  // human-readable message
+  string message = 5;
+}
+```
+### TerminationType
+The currently defined _TerminationType_ values and specification of the side that generates (Sender), and the side that actions and confirms processing (Receiver):
+
+| Value                    | Sender | Receiver |
+| ------------------------ | ------ | -------- |
+| STOPPED_BY_RM            | shim   | core     |
+| TIMEOUT *                | core   | shim     |
+| PREEMPTED_BY_SCHEDULER * | core   | shim     |
+
+\* currently not handled by the shim, the core, or both
+
+When the placeholder allocation gets released the _AllocationReleaseResponse_ is used to communicate the release back from the core to the shim.
+The response contains an enumeration called _TerminationType_, and a human-readable message.
+For tracking and tracing purposes we should add a new _TerminationType_ specifically for the placeholder replacement. The shim must take action based on the type and confirm the allocation release to the core.
+
+It should provide enough detail so we do not have to re-use an already existing type, or the human-readable message.
+The human-readable format can still be used to provide further detail on which new allocation replaced the placeholder.
+
+Proposal is to add: _PLACEHOLDER_REPLACED_
+
+| Value                | Sender | Receiver |
+| ---------------------| ------ | -------- |
+| PLACEHOLDER_REPLACED | core   | shim     |
+
+As part of the Scheduler Interface cleanup ([YUNIKORN-486](https://issues.apache.org/jira/browse/YUNIKORN-486)) the _TerminationType_ should be extracted from the _AllocationRelease_ and _AllocationAskRelease_ messages.
+It is an enumeration that can be shared between multiple objects.
+[YUNIKORN-547](https://issues.apache.org/jira/browse/YUNIKORN-547) has been logged to handle this as it has an impact on the code outside of the scope of gang scheduling.
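+
+A sketch of how a receiver could act on the sender/receiver split above, using an illustrative enum rather than the generated code; a release that arrives with a type originating on the receiver's own side is a confirmation, anything else must be actioned and confirmed back:
+
+```
+package release
+
+// terminationType mirrors the proposed enum for illustration only.
+type terminationType int
+
+const (
+	stoppedByRM terminationType = iota
+	timeout
+	preemptedByScheduler
+	placeholderReplaced
+)
+
+// originatesInCore reports whether the core generates releases of this type,
+// matching the sender column of the tables above.
+func originatesInCore(t terminationType) bool {
+	switch t {
+	case timeout, preemptedByScheduler, placeholderReplaced:
+		return true
+	default:
+		return false
+	}
+}
+
+// isConfirmationForCore is true when the core receives a release of a type it
+// generated itself: the shim has actioned it and is confirming, so the core
+// must not send another confirmation back.
+func isConfirmationForCore(t terminationType) bool {
+	return originatesInCore(t)
+}
+```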
+
+
+### AllocationAskRelease Response and Request
+The allocation ask release right now can only be triggered by the shim.
+In order for the core to perform the cleanup when the placeholder allocation times out, we need to make this a bidirectional message.
+Similarly to the Allocation we would rename the _AllocationAskReleaseRequest_ to _AllocationAskRelease_, so we can use this message in both directions:
+```
+message AllocationReleasesRequest {
+...
+  // Released allocationask(s), allocationasks can be released by either the RM or
+  // scheduler. The TerminationType defines which side needs to act and process the
+  // message. 
+  repeated AllocationAskRelease allocationAsksToRelease = 2;
+}
+```
+
+Similar processing logic based on the _TerminationType_ which is used for allocations should be used for ask releases.
+In patched message form that would look like:
+```
+message AllocationAskRelease {
+  enum TerminationType {
+    STOPPED_BY_RM = 0;
+    TIMEOUT = 1; 
+    PREEMPTED_BY_SCHEDULER = 2;
+    PLACEHOLDER_REPLACED = 3;
+  }
+...
+  // The termination type as described above 
+  TerminationType terminationType = 4;
+...
+}
+```
+
+Confirmation behaviour of the action should be triggered on the type of termination received.
+The core will confirm the release to the shim of all types that originate in the shim and vice versa.
+
+A confirmation or response uses the same _TerminationType_ as was set in the original message.
+
+## Scheduler storage object changes
+### AllocationAsk
+In line with the changes for the communication the objects in the scheduler also need to be modified to persist some of the detail communicated.
+The AllocationAsk that is used in the communication has an equivalent object inside the scheduler with the same name.
+This object needs to be able to store the new fields proposed above.
+
+Proposed new fields: _taskGroupName_ and _placeholder_.
+
+In the current interface specification a field called _executionTimeoutMilliSeconds_ is defined.
+This is currently not mapped to the object inside the scheduler and should be added.
+Time or Duration are stored as native go objects and do not include a size specifier.
+
+Proposed new field: _execTimeout_
+
+### Allocation
+After the allocation is made an Allocation object is created in the core to track the real allocation. This Allocation object is directly linked to the application and should show that the allocation is a placeholder and for which task group. This detail is needed to also enable the correct display of the resources used in the web UI.
+
+The propagation of the placeholder information could be achieved indirectly as the allocation object references an AllocationAsk. This would require a lookup of the AllocationAsk to assess the type of allocation. We could also opt to propagate the data into the Allocation object itself. This would remove the lookup and allow us to directly filter allocations based on the type and or task group information.
+
+From a scheduling and scheduler logic perspective the indirect reference is not really desirable due to the overhead of the lookups required. This means that the same fields added in the AllocationAsk are also added to the Allocation object.
+
+Proposed new fields: _taskGroupName_ and _placeholder_.
+
+To support the release of the allocation being triggered from the core tracking of the release action is required. The release is not final until the shim has confirmed that release. However during that time period the allocation may not be released again.
+
+Proposed new field: _released_
+
+At the point that we replace the placeholder with a real allocation we need to release an existing placeholder.
+The Allocation object allows us to specify a list of Allocations to release.
+This field was added earlier to support preemption.
+This same field will be reused for the placeholder release.
+
+### Application
+The AddApplicationRequest has a new field added that needs to be persisted in the object inside the scheduler.
+
+Proposed new field: _placeholderAsk_
+
+In the current interface specification a field called _executionTimeoutMilliSeconds_ is defined. This is currently not mapped to the object inside the scheduler and should be added. Time or Duration are stored as native go objects and do not include a size specifier.
+
+Proposed new field: _execTimeout_
+
+The application object should be able to track the placeholder allocations separately from the real allocations. The split of the allocation types on the application will allow us to show the proper state in the web UI.
+
+Proposed new field: _allocatedPlaceholder_
+
+
+### Queue & Node
+No changes at this point.
+The placeholder allocations should be counted as “real” allocations on the Queue and Node.
+By counting the placeholder as normal the quota for the queue is enforced as expected.
+The Node object needs to also show normal usage to prevent interactions with the autoscaler.
diff --git a/versioned_docs/version-1.1.0/design/generic_resource.md b/versioned_docs/version-1.1.0/design/generic_resource.md
new file mode 100644
index 000000000..6bc354b24
--- /dev/null
+++ b/versioned_docs/version-1.1.0/design/generic_resource.md
@@ -0,0 +1,75 @@
+---
+id: generic_resource
+title: Generic Resource Types in Namespace Quota
+---
+
+<!--
+ * Licensed to the Apache Software Foundation (ASF) under one
+ * or more contributor license agreements.  See the NOTICE file
+ * distributed with this work for additional information
+ * regarding copyright ownership.  The ASF licenses this file
+ * to you under the Apache License, Version 2.0 (the
+ * "License"); you may not use this file except in compliance
+ * with the License.  You may obtain a copy of the License at
+ *
+ *     http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing, software
+ * distributed under the License is distributed on an "AS IS" BASIS,
+ * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+ * See the License for the specific language governing permissions and
+ * limitations under the License.
+ -->
+# Generic Resource Types in Namespace Quota
+Tracking jira: [YUNIKORN-1279](https://issues.apache.org/jira/browse/YUNIKORN-1279)
+
+## Functional gap
+The queue configuration allows all resource types to be set in a quota. Namespace annotations do not. Support for the same resource types in the annotations on namespaces should be possible.
+
+## Current solution
+In the current setup YuniKorn supports annotations on a namespace to specify a resource quota for that namespace. This is used in combination with placement rules to create a quota limited queue automatically based on the namespace in Kubernetes.
+
+The annotations that are supported limit the possible resource types that are supported on these auto created queues. Each resource type uses its own annotation. Current annotations supported as per the quota management documentation:
+```
+yunikorn.apache.org/namespace.max.cpu
+yunikorn.apache.org/namespace.max.memory
+```
+The queue configuration itself, as part of the yaml file, supports all Kubernetes resources including extended resources.
+## Proposed solution
+The current solution uses a specific annotation for each type that is supported. This means that each new resource would require a new annotation to be defined. Reading a new annotation requires a code change in the k8shim.
+
+In comparison, when we look at the gang scheduling setup with the task group specification, we are far more flexible. In that case we allow a map of resources to be specified. The map uses the resource name as the key and allows a value as per the Kubernetes resource specification. This solution allows any resource type to be set as a request for a task group.
+
+An equivalent solution should be allowed for the quota annotation on the namespace. This would provide a more flexible solution that does not require code changes for every new resource type that must be supported as part of the namespace quota.
+
+### Annotation name
+The new name for the annotation should not interfere with the existing annotations that are used for the memory and cpu resource quota. Besides that rule, we are free to use any name that complies with the naming conventions for annotation names.
+
+The proposal is to use:
+```
+yunikorn.apache.org/namespace.quota
+```
+### Annotation content
+The content of the annotation must be a simple string. There are no length limits for a specific annotation. All annotations together on one object do have a size limit; however, that is not a restriction we have to plan around.
+
+Since the content must be a simple string, we should use a simple JSON representation for the quota that contains a list of resources. Representing the quota:
+```
+yunikorn.apache.org/namespace.quota: '{"cpu": "100m", "memory": "1G", "nvidia.com/gpu": "1"}'
+```
+
+As with other resources we allow in annotations, any string is allowed as the key content.
+The value content should be interpreted as a Kubernetes formatted resource quantity. Parsing will handle that enforcement. If any of the values do not comply with the formatting no quota will be set.
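+
+A sketch of that parsing rule, assuming the JSON form of the annotation shown above and using the standard Kubernetes quantity parser; if any value fails to parse, the whole quota is discarded:
+
+```
+package annotations
+
+import (
+	"encoding/json"
+
+	"k8s.io/apimachinery/pkg/api/resource"
+)
+
+// parseQuotaAnnotation turns the proposed yunikorn.apache.org/namespace.quota
+// annotation value into a resource map. One invalid value invalidates the quota.
+func parseQuotaAnnotation(value string) (map[string]resource.Quantity, bool) {
+	raw := map[string]string{}
+	if err := json.Unmarshal([]byte(value), &raw); err != nil {
+		return nil, false
+	}
+	quota := make(map[string]resource.Quantity, len(raw))
+	for name, val := range raw {
+		quantity, err := resource.ParseQuantity(val)
+		if err != nil {
+			return nil, false // no quota will be set
+		}
+		quota[name] = quantity
+	}
+	return quota, true
+}
+```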
+
+### Propagation to the core
+No changes are proposed or required. The quota is passed from the k8shim into the core via the application tags. The content of the tag is a Resource object as defined in the scheduler interface. The scheduler interface Resource object supports arbitrary resources already. The content passed from the k8shim to the core will not change. There will also be no changes in the way the quota will be processed in the core as that processing is not linked to resource types.
+
+### Backwards compatibility
+The current annotations will remain supported for the 1.x minor releases. Deprecation will be announced with the first release that supports the new annotation. Messages mentioning the processing of the old annotation will also be logged at a WARN level in the logs.
+
+Removing the existing annotation processing is a breaking change that could cause a large change in behaviour. Removal of processing for the old annotations should be part of the next major release. The next major release is 2.0.0. This is based on the fact that we do not have a deprecation policy defined as yet.
+
+Preference in processing will be with the new annotations. In the case that both the old and new annotations are present on the namespace the new annotation will be used. Using both old and new annotations, i.e. merging of the two sets, will not be supported.
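+
+Building on the parsing sketch above (same hypothetical package and imports), the precedence rule could look like this:
+
+```
+const (
+	newQuotaAnnotation = "yunikorn.apache.org/namespace.quota"
+	oldCPUAnnotation   = "yunikorn.apache.org/namespace.max.cpu"
+	oldMemAnnotation   = "yunikorn.apache.org/namespace.max.memory"
+)
+
+// quotaFromNamespace applies the proposed precedence: the new annotation wins
+// outright when present; the old annotations are only read when it is absent,
+// and the two sets are never merged.
+func quotaFromNamespace(nsAnnotations map[string]string) (map[string]resource.Quantity, bool) {
+	if value, ok := nsAnnotations[newQuotaAnnotation]; ok {
+		return parseQuotaAnnotation(value)
+	}
+	legacy := map[string]string{}
+	if cpu, ok := nsAnnotations[oldCPUAnnotation]; ok {
+		legacy["cpu"] = cpu
+	}
+	if mem, ok := nsAnnotations[oldMemAnnotation]; ok {
+		legacy["memory"] = mem
+	}
+	if len(legacy) == 0 {
+		return nil, false
+	}
+	quota := make(map[string]resource.Quantity, len(legacy))
+	for name, val := range legacy {
+		quantity, err := resource.ParseQuantity(val)
+		if err != nil {
+			return nil, false
+		}
+		quota[name] = quantity
+	}
+	return quota, true
+}
+```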
diff --git a/versioned_docs/version-1.1.0/design/interface_message_simplification.md b/versioned_docs/version-1.1.0/design/interface_message_simplification.md
new file mode 100644
index 000000000..b6766ced4
--- /dev/null
+++ b/versioned_docs/version-1.1.0/design/interface_message_simplification.md
@@ -0,0 +1,309 @@
+---
+id: interface_message_simplification
+title: Simplifying Interface Messages
+---
+
+<!--
+ * Licensed to the Apache Software Foundation (ASF) under one
+ * or more contributor license agreements.  See the NOTICE file
+ * distributed with this work for additional information
+ * regarding copyright ownership.  The ASF licenses this file
+ * to you under the Apache License, Version 2.0 (the
+ * "License"); you may not use this file except in compliance
+ * with the License.  You may obtain a copy of the License at
+ *
+ *     http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing, software
+ * distributed under the License is distributed on an "AS IS" BASIS,
+ * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+ * See the License for the specific language governing permissions and
+ * limitations under the License.
+ -->
+
+# Simplifying Interface Messages and Breaking Shim build dependency on Core
+
+# Proposal
+This document describes a) the complexity hidden behind the existing interface messages and 
+explains the newly defined SI messages and their dependent changes on the core and the shim, and 
+b) breaking the shim build dependency on the core.
+
+## Goals
+The goal is to provide the same functionality before and after the change.
+- unit tests before and after the merge must all pass.
+- Smoke tests defined in the core should all pass without major changes to their definition.
+- End-to-end tests that are part of the shim code must all pass without changes.
+## Background
+The current interface only allows us to send a single message type between a shim and the core. This provides us with a really simple interaction definition.
+
+The complexity is however hidden in the message itself. Every message serves multiple purposes, and when the message is received the core and shim need to unpack it and process each part separately, and for certain parts in a very specific order.
+Because the message serves a number of purposes it has a large overhead. This might not show up in the code directly, as the heavy lifting is done in the generated code. It does show up in the amount of data: a message, even if it does not have all fields set, still needs to be encoded in a way that it unpacks correctly on the other side.
+
+## Simplifying Interface Messages
+
+Proposal is to split the one large message into 3 separate messages - one for each entity:
+
+- Allocations
+- Applications
+- Nodes
+
+### API Interface Changes
+
+```
+package api
+
+import "github.com/apache/incubator-yunikorn-scheduler-interface/lib/go/si"
+
+type SchedulerAPI interface {
+    // Register a new RM, if it is a reconnect from previous RM, cleanup 
+    // all in-memory data and resync with RM. 
+    RegisterResourceManager(request *si.RegisterResourceManagerRequest, callback ResourceManagerCallback) (*si.RegisterResourceManagerResponse, error)
+    
+    // Update Allocation status
+    UpdateAllocation(request *si.AllocationRequest) error
+    
+    // Update Application status
+    UpdateApplication(request *si.ApplicationRequest) error
+    
+    // Update Node status
+    UpdateNode(request *si.NodeRequest) error
+    
+    // Notify scheduler to reload configuration and hot-refresh in-memory state based on configuration changes 
+    UpdateConfiguration(clusterID string) error
+}
+
+// RM side needs to implement this API
+type ResourceManagerCallback interface {
+	
+    // Receive Allocation Update Response
+    UpdateAllocation(response *si.AllocationResponse) error
+
+    // Receive Application Update Response
+    UpdateApplication(response *si.ApplicationResponse) error
+
+    // Receive Node Update Response
+    UpdateNode(response *si.NodeResponse) error
+    
+    // Run a certain set of predicate functions to determine if a proposed allocation
+    // can be allocated onto a node.
+    Predicates(args *si.PredicatesArgs) error
+    
+    // RM side implements this API when it can provide plugin for reconciling
+    // Re-sync scheduler cache can sync some in-cache (yunikorn-core side) state changes
+    // to scheduler cache (shim-side), such as assumed allocations.
+    ReSyncSchedulerCache(args *si.ReSyncSchedulerCacheArgs) error
+    
+    // This plugin is responsible for transmitting events to the shim side.
+    // Events can be further exposed from the shim.
+    SendEvent(events []*si.EventRecord)
+    
+    // Scheduler core can update the container scheduling state to the RM;
+    // the shim side can determine what to do based on the scheduling state.
+    // This might be called even if the container scheduling state is unchanged:
+    // the shim side cannot assume it only receives updates on state changes.
+    // The shim side implementation must be thread safe.
+    UpdateContainerSchedulingState(request *si.UpdateContainerSchedulingStateRequest)
+    
+    // Update configuration
+    UpdateConfiguration(args *si.UpdateConfigurationRequest) *si.UpdateConfigurationResponse
+}
+```
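+
+As a usage illustration only, the sketch below shows how a shim could push a single allocation ask through the split API. The helper function and the generated Go field names (`Asks`, `RmID`) are assumptions based on the interface above and the usual protoc-gen-go conventions, not the actual shim code.
+
+```
+package api
+
+import "github.com/apache/incubator-yunikorn-scheduler-interface/lib/go/si"
+
+// SendAsk is an illustrative sketch: only allocation data travels in this
+// message, applications and nodes are updated through their own requests.
+func SendAsk(scheduler SchedulerAPI, ask *si.AllocationAsk, rmID string) error {
+    return scheduler.UpdateAllocation(&si.AllocationRequest{
+        Asks: []*si.AllocationAsk{ask},
+        RmID: rmID,
+    })
+}
+```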
+
+### Interface Changes to replace UpdateRequest
+
+UpdateRequest would be divided into the messages below:
+
+#### AllocationRequest
+```
+message AllocationRequest {
+  repeated AllocationAsk asks = 1;
+  AllocationReleasesRequest releases = 2;
+  string rmID = 3;
+}
+```
+#### ApplicationRequest
+```
+message ApplicationRequest {
+  repeated AddApplicationRequest new = 1;
+  repeated RemoveApplicationRequest remove = 2;
+  string rmID = 3;
+}
+```
+#### NodeRequest
+```
+message NodeRequest {
+  repeated NodeInfo nodes = 1;
+  string rmID = 2;
+}
+```
+### Merging Create and Update NodeInfo into a Single NodeInfo
+```
+message NodeInfo {
+  enum ActionFromRM {
+    CREATE = 0;
+    UPDATE = 1;
+    DRAIN = 2;
+    SCHEDULABLE = 3;
+    DECOMISSION = 4;
+  }
+
+  string nodeID = 1;
+  ActionFromRM action = 2;
+  map<string, string> attributes = 3;
+  Resource schedulableResource = 4;
+  Resource occupiedResource = 5;
+  repeated Allocation existingAllocations = 6;
+}
+```
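+
+The sketch below illustrates how the merged message could be used by a shim: the same NodeInfo is sent for registration and later for draining, only the action changes. The function is illustrative only, and the generated Go names (`NodeID`, `SchedulableResource`, `si.NodeInfo_CREATE`, `si.NodeInfo_DRAIN`) are assumptions following standard protoc-gen-go conventions.
+
+```
+package api
+
+import "github.com/apache/incubator-yunikorn-scheduler-interface/lib/go/si"
+
+// RegisterAndDrainNode is an illustrative sketch: create and drain reuse
+// the same NodeInfo message, distinguished only by the action field.
+func RegisterAndDrainNode(scheduler SchedulerAPI, nodeID, rmID string, total *si.Resource) error {
+    create := &si.NodeRequest{
+        Nodes: []*si.NodeInfo{{
+            NodeID:              nodeID,
+            Action:              si.NodeInfo_CREATE,
+            SchedulableResource: total,
+        }},
+        RmID: rmID,
+    }
+    if err := scheduler.UpdateNode(create); err != nil {
+        return err
+    }
+    // later: stop scheduling onto the node, no separate message type needed
+    drain := &si.NodeRequest{
+        Nodes: []*si.NodeInfo{{NodeID: nodeID, Action: si.NodeInfo_DRAIN}},
+        RmID:  rmID,
+    }
+    return scheduler.UpdateNode(drain)
+}
+```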
+
+### Event Changes to replace UpdateRequest
+
+RMUpdateRequestEvent would be replaced by the following events:
+
+- RMUpdateAllocationEvent
+- RMUpdateApplicationEvent
+- RMUpdateNodeEvent
+
+### Interface Changes to replace UpdateResponse
+
+UpdateResponse would be divided into the messages below:
+
+#### AllocationResponse
+```
+message AllocationResponse {
+  repeated Allocation new = 1;
+  repeated AllocationRelease released = 2;
+  repeated AllocationAskRelease releasedAsks = 3;
+  repeated RejectedAllocationAsk rejected = 4;
+}
+```
+#### ApplicationResponse
+```
+message ApplicationResponse {
+  repeated RejectedApplication rejected = 1;
+  repeated AcceptedApplication accepted = 2;
+  repeated UpdatedApplication updated = 3;
+}
+```
+#### NodeResponse
+```
+message NodeResponse {
+  repeated RejectedNode rejected = 1;
+  repeated AcceptedNode accepted = 2;
+}
+```
+
+### Event Changes for UpdateResponse
+
+Scheduler/Context.go in the core already triggers an event for each entity separately; rmproxy.go is the component that handles all these events, packs them into a single *si.UpdateResponse and eventually sends it to the shim through scheduler_callback#RecvUpdateResponse. With the above API interface change, rmproxy.go would use the appropriate callback method to send each response to the shim. With this separate callback approach, each entity response is handled separately in the shim.
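+
+A hedged sketch of the shim side of this change is shown below: each entity response arrives through its own callback, so there is no combined UpdateResponse left to unpack. The `shimContext` type is hypothetical and used for illustration only; it is not the actual k8s-shim code, and the remaining ResourceManagerCallback methods are omitted.
+
+```
+package shim
+
+import "github.com/apache/incubator-yunikorn-scheduler-interface/lib/go/si"
+
+// shimContext is a hypothetical receiver used for illustration only.
+type shimContext struct{}
+
+func (c *shimContext) UpdateAllocation(response *si.AllocationResponse) error {
+    // handle new, released and rejected allocations only
+    return nil
+}
+
+func (c *shimContext) UpdateApplication(response *si.ApplicationResponse) error {
+    // handle accepted, rejected and updated applications only
+    return nil
+}
+
+func (c *shimContext) UpdateNode(response *si.NodeResponse) error {
+    // handle accepted and rejected nodes only
+    return nil
+}
+```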
+
+## Detailed Flow Analysis
+
+### Add/Delete Allocations
+
+The RM (shim) sends a simplified AllocationRequest as described above. This message is wrapped by the RM proxy and forwarded to the cache for processing. The RM can request an allocation to be added or removed.
+
+```
+1. Shim sends a simplified AllocationRequest to core through SchedulerAPI.UpdateAllocation
+2. RMProxy sends rmevent.RMUpdateAllocationEvent to scheduler 
+3. On receiving the above event, scheduler calls context.handleRMUpdateAllocationEvent to do the 
+   following:
+   3.1: processAsks
+        3.1.1: Processes each request.Asks ask of the AllocationRequest request and adds it to the application
+        3.1.2: In case of rejection, triggers RMRejectedAllocationAskEvent with
+        all asks that have been rejected
+        3.1.3: On receiving RMRejectedAllocationAskEvent, RMProxy.processUpdatePartitionConfigsEvent
+        processes the event, creates an AllocationResponse using the RMRejectedAllocationAskEvent
+        attributes and sends it to the shim through the UpdateAllocation callback method
+   3.2: processAskReleases
+        3.2.1: Processes each request.Releases.AllocationAsksToRelease ask release of the AllocationRequest request
+        and removes it from the application
+   3.3: processAllocationReleases
+        3.3.1: Processes each request.Releases.AllocationRelease allocation release of the AllocationRequest
+        request and removes it from the application
+        3.3.2: Collects all exactly released allocations from above and triggers RMReleaseAllocationEvent with all allocations that need to be released
+        3.3.3: On receiving RMReleaseAllocationEvent, RMProxy.processRMReleaseAllocationEvent
+        processes the event, creates an AllocationResponse using the RMReleaseAllocationEvent
+        attributes and sends it to the shim through the UpdateAllocation callback method
+        3.3.4: Collects all confirmed (placeholder swap & preemption) allocations from above
+        and sends them to the shim in two ways:
+            a). Wraps confirmed allocations as AssumedAllocation
+            and sends them to the shim through the ReSyncSchedulerCache callback plugin
+            b). Wraps confirmed allocations as Allocation and triggers
+            RMNewAllocationsEvent. On receiving RMNewAllocationsEvent,
+            RMProxy.processAllocationUpdateEvent processes the event, creates an
+            AllocationResponse using the RMNewAllocationsEvent attributes and sends it to the shim
+            through the UpdateAllocation callback method
+```
+
+### Add/Delete Applications
+
+The RM (shim) sends a simplified ApplicationRequest as described above. This message is wrapped by the RM proxy and forwarded to the cache for processing. The RM can request an application to be added or removed.
+
+```
+1. Shim sends a simplified ApplicationRequest to core through SchedulerAPI.UpdateApplication
+2. RMProxy sends rmevent.RMUpdateApplicationEvent to scheduler
+3. On receiving the above event, scheduler calls context.handleRMUpdateApplicationEvent to do the 
+   following:
+   3.1: Add new apps to the partition.
+        3.1.1: Wraps AcceptedApps and RejectedApps (if any) as part of RMApplicationUpdateEvent
+        and fires that event
+        3.1.2: On receiving RMApplicationUpdateEvent, RMProxy.processApplicationUpdateEvent
+        processes the event, creates an ApplicationResponse using the RMApplicationUpdateEvent
+        attributes and sends it to the shim through the UpdateApplication callback method
+   3.2: Remove apps from the partition.
+        3.2.1: Collects all allocations belonging to the removed app and triggers
+        RMReleaseAllocationEvent with all allocations that need to be released
+        3.2.2: On receiving RMReleaseAllocationEvent, RMProxy.processRMReleaseAllocationEvent
+        processes the event, creates an AllocationResponse using the RMReleaseAllocationEvent
+        attributes and sends it to the shim through the UpdateAllocation callback method
+```
+
+### Add/Delete Nodes
+
+The RM (shim) sends a simplified NodeRequest as described above. This message is wrapped by the RM proxy and forwarded to the cache for processing. The RM can request a node to be added or removed.
+
+```
+1. Shim sends a simplified NodeRequest to core through SchedulerAPI.UpdateNode
+2. RMProxy sends rmevent.RMUpdateNodeEvent to scheduler
+3. On receiving the above event, scheduler calls context.handleRMUpdateNodeEvent to do the 
+   following:
+   3.1: Add a new node to the partition.
+        3.1.1: Wraps AcceptedNodes and RejectedNodes (if any) as part of RMNodeUpdateEvent
+        and fires that event
+        3.1.2: On receiving RMNodeUpdateEvent, RMProxy.processRMNodeUpdateEvent
+        processes the event, creates a NodeResponse using the RMNodeUpdateEvent
+        attributes and sends it to the shim through the UpdateNode callback method
+   3.2: Update node
+        3.2.1: Updates the partition resource
+   3.3: Drain node
+        3.3.1: Ensures the node is not schedulable
+   3.4: Decommission (remove) the node from the partition.
+        3.4.1: Ensures the node is not schedulable
+        3.4.2: Collects all exactly released allocations from that node and triggers
+        RMReleaseAllocationEvent with all allocations that need to be released
+        3.4.3: On receiving RMReleaseAllocationEvent,
+        RMProxy.processRMReleaseAllocationEvent processes the event, creates an
+        AllocationResponse using the RMReleaseAllocationEvent attributes and
+        sends it to the shim through the UpdateAllocation callback method
+        3.4.4: Collects all confirmed (placeholder swap & preemption) allocations from that node
+        and sends them to the shim in two ways:
+            a). Wraps confirmed allocations as AssumedAllocation and sends them to the shim
+            through the ReSyncSchedulerCache callback plugin
+            b). Wraps confirmed allocations as Allocation and triggers RMNewAllocationsEvent.
+            On receiving RMNewAllocationsEvent, RMProxy.processAllocationUpdateEvent
+            processes the event, creates an AllocationResponse using the RMNewAllocationsEvent
+            attributes and sends it to the shim through the UpdateAllocation callback method
+```
+
+## Breaking the Shim build dependency on Core 
+
+This work is planned in different phases.
+
+### Phase 1
+Moved all plugins from the core to the appropriate place in the SI under ResourceManagerCallback,
+a single common interface.
+
+### Phase 2
+Please refer to https://issues.apache.org/jira/browse/YUNIKORN-930 for more details.
diff --git a/versioned_docs/version-1.1.0/design/k8shim.md b/versioned_docs/version-1.1.0/design/k8shim.md
new file mode 100644
index 000000000..6d19a2a34
--- /dev/null
+++ b/versioned_docs/version-1.1.0/design/k8shim.md
@@ -0,0 +1,74 @@
+---
+id: k8shim
+title: Kubernetes Shim Design
+---
+
+<!--
+Licensed to the Apache Software Foundation (ASF) under one
+or more contributor license agreements.  See the NOTICE file
+distributed with this work for additional information
+regarding copyright ownership.  The ASF licenses this file
+to you under the Apache License, Version 2.0 (the
+"License"); you may not use this file except in compliance
+with the License.  You may obtain a copy of the License at
+
+  http://www.apache.org/licenses/LICENSE-2.0
+
+Unless required by applicable law or agreed to in writing,
+software distributed under the License is distributed on an
+"AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY
+KIND, either express or implied.  See the License for the
+specific language governing permissions and limitations
+under the License.
+-->
+
+Github repo: https://github.com/apache/yunikorn-k8shim
+
+Please read the [architecture](architecture.md) doc before reading this one; you will need to understand
+the three-layer design of YuniKorn before you can understand what the Kubernetes shim is.
+
+## The Kubernetes shim
+
+The YuniKorn Kubernetes shim is responsible for talking to Kubernetes. It translates the Kubernetes
+cluster resources and resource requests via the scheduler interface and sends them to the scheduler core.
+When a scheduling decision is made, it is responsible for binding the pod to the specific node. All communication
+between the shim and the scheduler core goes through the scheduler-interface.
+
+## The admission controller
+
+The admission controller runs in a separate pod. It runs a
+[mutation webhook](https://kubernetes.io/docs/reference/access-authn-authz/admission-controllers/#mutatingadmissionwebhook)
+and a [validation webhook](https://kubernetes.io/docs/reference/access-authn-authz/admission-controllers/#validatingadmissionwebhook), where:
+
+1. The `mutation webhook` mutates the pod spec (a sketch of this logic is shown after this list) by:
+   - Adding `schedulerName: yunikorn`
+     - By explicitly specifying the scheduler name, the pod will be scheduled by YuniKorn scheduler.
+   - Adding `applicationId` label
+     - When a label `applicationId` exists, reuse the given applicationId.
+     - When a label `spark-app-selector` exists, reuse the given spark app ID.
+     - Otherwise, assign a generated application ID for this pod, using convention: `yunikorn-<namespace>-autogen`. This is unique per namespace.
+   - Adding `queue` label
+     - When a label `queue` exists, reuse the given queue name. Note: if a placement rule is enabled, the value set in the label is ignored.
+     - Otherwise, adds `queue: root.default`
+   - Adding `disableStateAware` label
+     - If the pod was assigned a generated applicationId by the admission controller, also set `disableStateAware: true`. This causes the generated application
+       to immediately transition from the `Starting` to `Running` state so that it will not block other applications.
+2. The `validation webhook` validates the configuration set in the configmap
+   - This is used to prevent writing malformed configuration into the configmap.
+   - The validation webhook calls scheduler [validation REST API](api/scheduler.md#configuration-validation) to validate configmap updates.
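+
+The sketch below illustrates the mutation rules from item 1 above in Go. It is a simplified illustration of the listed rules, not the actual admission controller implementation; the function name and structure are assumptions.
+
+```
+package admission
+
+import (
+    "fmt"
+
+    corev1 "k8s.io/api/core/v1"
+)
+
+// mutatePod is an illustrative sketch of the mutation rules listed above.
+func mutatePod(pod *corev1.Pod, namespace string) {
+    // make sure the pod is scheduled by YuniKorn
+    pod.Spec.SchedulerName = "yunikorn"
+    if pod.Labels == nil {
+        pod.Labels = map[string]string{}
+    }
+    // reuse an existing applicationId or spark app ID, otherwise generate one
+    if _, ok := pod.Labels["applicationId"]; !ok {
+        if sparkID, ok := pod.Labels["spark-app-selector"]; ok {
+            pod.Labels["applicationId"] = sparkID
+        } else {
+            pod.Labels["applicationId"] = fmt.Sprintf("yunikorn-%s-autogen", namespace)
+            // generated applications skip the Starting state
+            pod.Labels["disableStateAware"] = "true"
+        }
+    }
+    // default the queue when none was set (ignored when placement rules are enabled)
+    if _, ok := pod.Labels["queue"]; !ok {
+        pod.Labels["queue"] = "root.default"
+    }
+}
+```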
+
+### Admission controller deployment
+
+By default, the admission controller is deployed as part of the YuniKorn Helm chart installation. This can be disabled if necessary (though not recommended) by setting the Helm parameter `embedAdmissionController` to `false`.
+
+On startup, the admission controller performs a series of tasks to ensure that it is properly registered with Kubernetes:
+1. Loads a Kubernetes secret called `admission-controller-secrets`. This secret stores a pair of CA certificates which are used to sign the TLS server certificate used by the admission controller.
+2. If the secret cannot be found or either CA certificate is within 90 days of expiration, generates new certificate(s). If a certificate is expiring, a new one is generated with an expiration of 12 months in the future. If both certificates are missing or expiring, the second certificate is generated with an expiration of 6 months in the future. This ensures that both certificates do not expire at the same time, and that there is an overlap of trusted certificates.
+3. If the CA certificates were created or updated, writes the secrets back to Kubernetes.
+4. Generates an ephemeral TLS server certificate signed by the CA certificate with the latest expiration date.
+5. Validates, and if necessary, creates or updates the Kubernetes webhook configurations named `yunikorn-admission-controller-validations` and `yunikorn-admission-controller-mutations`. If the CA certificates have changed, the webhooks will also be updated. These webhooks allow the Kubernetes API server to connect to the admission controller service to perform configmap validations and pod mutations. 
+6. Starts up the admission controller HTTPS server.
+
+Additionally, the admission controller also starts a background task to wait for CA certificates to expire. Once either certificate is expiring within the next 30 days, new CA and server certificates are generated, the webhook configurations are updated, and the HTTPS server is quickly restarted. This ensures that certificates rotate properly without downtime.
+
+In production clusters, it is recommended to deploy the admission controller with two replicas by setting the Helm parameter `admissionController.replicaCount` to `2`. This will ensure that at least one admission controller webhook is reachable by the Kubernetes API server at all times. In this configuration, the CA certificates and webhook configurations are shared between the instances.
diff --git a/versioned_docs/version-1.1.0/design/namespace_resource_quota.md b/versioned_docs/version-1.1.0/design/namespace_resource_quota.md
new file mode 100644
index 000000000..90830b6a1
--- /dev/null
+++ b/versioned_docs/version-1.1.0/design/namespace_resource_quota.md
@@ -0,0 +1,183 @@
+---
+id: namespace_resource_quota
+title: Namespace Resource Quota
+---
+
+<!--
+Licensed to the Apache Software Foundation (ASF) under one
+or more contributor license agreements.  See the NOTICE file
+distributed with this work for additional information
+regarding copyright ownership.  The ASF licenses this file
+to you under the Apache License, Version 2.0 (the
+"License"); you may not use this file except in compliance
+with the License.  You may obtain a copy of the License at
+
+  http://www.apache.org/licenses/LICENSE-2.0
+
+Unless required by applicable law or agreed to in writing,
+software distributed under the License is distributed on an
+"AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY
+KIND, either express or implied.  See the License for the
+specific language governing permissions and limitations
+under the License.
+-->
+
+In K8s, a user can set up a namespace with [resource quotas](https://kubernetes.io/docs/concepts/policy/resource-quotas/) to limit aggregated resource consumption in that namespace. The validation of namespace resource quotas is handled by the api-server directly, therefore YuniKorn simply honors the quotas like the default scheduler.
+
+## Best practice
+
+It is not mandatory to set up YuniKorn queues with respect to namespaces.
+However, in practice, it makes more sense to do so.
+A namespace is often used to set a cap on resource consumption per user-group/team,
+and a YuniKorn queue is also meant to divide cluster resources into multiple groups.
+Let's go through an example.
+
+### 1. Setup namespace
+
+Namespace: `advertisement`:
+```
+apiVersion: v1
+kind: ResourceQuota
+metadata:
+  name: advertisement
+spec:
+  hard:
+    requests.cpu: "200m"
+    requests.memory: 2000Mi
+    limits.cpu: "200m"
+    limits.memory: 4000Mi
+```
+Create the namespace
+```
+kubectl create namespace advertisement
+kubectl create -f ./advertisement.yaml --namespace=advertisement
+kubectl get quota --namespace=advertisement
+kubectl describe quota advertisement --namespace=advertisement
+
+// output
+Name:            advertisement
+Namespace:       advertisement
+Resource         Used  Hard
+--------         ----  ----
+limits.cpu       0     200m
+limits.memory    0     4000Mi
+requests.cpu     0     200m
+requests.memory  0     2000Mi
+```
+
+### 2. Setup YuniKorn queues
+
+Queue: `advertisement`:
+```
+name: advertisement
+resources:
+  guaranteed:
+    vcore: 100
+    memory: 1000
+  max:
+    vcore: 200
+    memory: 2000
+```
+
+Ensure that `QueueMaxResource <= NamespaceResourceQuotaRequests`.
+
+### 3. Mapping applications to queues & namespace
+
+In a pod spec
+
+```
+apiVersion: v1
+kind: Pod
+metadata:
+  namespace: advertisement
+  labels:
+    app: sleep
+    applicationId: "application_2019_01_22_00001"
+    queue: "root.advertisement"
+  name: task0
+spec:
+  schedulerName: yunikorn
+  containers:
+    - name: sleep-5s
+      image: "alpine:latest"
+      command: ["/bin/ash", "-ec", "while :; do echo '.'; sleep 5 ; done"]
+      resources:
+        requests:
+          cpu: "50m"
+          memory: "800M"
+        limits:
+          cpu: "100m"
+          memory: "1000M"
+```
+
+Check Quota
+
+```
+kubectl describe quota advertisement --namespace=advertisement
+
+Name:            advertisement
+Namespace:       advertisement
+Resource         Used  Hard
+--------         ----  ----
+limits.cpu       100m  200m
+limits.memory    1G    4000Mi
+requests.cpu     50m   200m
+requests.memory  800M  2000Mi
+```
+
+Now submit another application,
+
+```
+apiVersion: v1
+kind: Pod
+metadata:
+  namespace: advertisement
+  labels:
+    app: sleep
+    applicationId: "application_2019_01_22_00002"
+    queue: "root.advertisement"
+  name: task1
+spec:
+  schedulerName: yunikorn
+  containers:
+    - name: sleep-5s
+      image: "alpine:latest"
+      command: ["/bin/ash", "-ec", "while :; do echo '.'; sleep 5 ; done"]
+      resources:
+        requests:
+          cpu: "200m"
+          memory: "800M"
+        limits:
+          cpu: "200m"
+          memory: "1000M"
+```
+
+The pod cannot be submitted to the api-server, because the requested cpu `200m` + used cpu `100m` = `300m`, which exceeds the resource quota.
+
+```
+kubectl create -f pod_ns_adv_task1.yaml
+Error from server (Forbidden): error when creating "pod_ns_adv_task1.yaml": pods "task1" is forbidden: exceeded quota: advertisement, requested: limits.cpu=200m,requests.cpu=200m, used: limits.cpu=100m,requests.cpu=50m, limited: limits.cpu=200m,requests.cpu=200m
+```
+
+## Future Work
+
+For compatibility, we should respect namespaces and resource quotas.
+Resource quotas overlap with the queue configuration in many ways,
+for example the `requests` quota is just like a queue's max resource. However,
+there are still a few things a resource quota can do that a queue cannot, such as:
+
+1. Resource `limits`. The aggregated resource from all pods in a namespace cannot exceed this limit.
+2. Storage Resource Quota, e.g storage size, PVC number, etc.
+3. Object Count Quotas, e.g count of PVCs, services, configmaps, etc.
+4. Resource Quota can map to priority class.
+
+Probably we can build something similar to cover (3) in this list.
+But it would be hard to completely support all these cases.
+
+Currently, mapping applications to both a queue and a corresponding namespace is overly complex.
+Some future improvements might be:
+
+1. Automatically detect namespaces in the k8s-shim and map them to queues. Behind the scenes, we automatically generate a queue configuration based on the namespace definition. Generated queues are attached under the root queue.
+2. When a namespace is added/updated/removed, similar to (1), we automatically update the queues.
+3. Users can add more configuration to the queues, e.g. add a queue ACL, or add child queues under the generated queues.
+4. Applications submitted to namespaces are transparently submitted to the corresponding queues.
\ No newline at end of file
diff --git a/versioned_docs/version-1.1.0/design/pluggable_app_management.md b/versioned_docs/version-1.1.0/design/pluggable_app_management.md
new file mode 100644
index 000000000..d297adac3
--- /dev/null
+++ b/versioned_docs/version-1.1.0/design/pluggable_app_management.md
@@ -0,0 +1,75 @@
+---
+id: pluggable_app_management
+title: Pluggable App Management
+---
+
+<!--
+Licensed to the Apache Software Foundation (ASF) under one
+or more contributor license agreements.  See the NOTICE file
+distributed with this work for additional information
+regarding copyright ownership.  The ASF licenses this file
+to you under the Apache License, Version 2.0 (the
+"License"); you may not use this file except in compliance
+with the License.  You may obtain a copy of the License at
+
+  http://www.apache.org/licenses/LICENSE-2.0
+
+Unless required by applicable law or agreed to in writing,
+software distributed under the License is distributed on an
+"AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY
+KIND, either express or implied.  See the License for the
+specific language governing permissions and limitations
+under the License.
+-->
+
+## The Problem
+
+Currently, we schedule and group an application based on a label on the pod.
+This generic way works for any type of workload. It does, however, give us only limited information on the lifecycle
+of the application. On the K8s side, operators have been introduced to provide more detail on the application
+and help scheduling. We cannot use them currently and want to add that functionality.
+
+## K8s Operator Pattern
+
+[K8s operator](https://kubernetes.io/docs/concepts/extend-kubernetes/operator/)
+is a pattern in K8s to manage applications; it's a handy way to manage an application's lifecycle out of the box on K8s.
+You define several CRDs and some controllers to monitor and mutate the state of the application based on the CRD definition.
+
+For example in [spark-k8s-operator](https://github.com/GoogleCloudPlatform/spark-on-k8s-operator),
+it defines a CRD called `SparkApplication`; the controller watches the add/update/delete events of this CRD
+and triggers corresponding actions on event notifications. The `SparkApplication` looks like
+[this example](https://github.com/GoogleCloudPlatform/spark-on-k8s-operator/blob/master/examples/spark-pi.yaml). There
+are a lot more popular operators, such as [flink-k8s-operator](https://github.com/GoogleCloudPlatform/flink-on-k8s-operator),
+ [tf-operator](https://github.com/kubeflow/tf-operator), [pytorch-operator](https://github.com/kubeflow/pytorch-operator), etc. 
+
+Take Spark as an example. YuniKorn is able to schedule resources for all pods in K8s, which seamlessly supports Spark. It
+works with [native Spark on K8s](https://spark.apache.org/docs/latest/running-on-kubernetes.html), or
+[Spark on K8s with operator](https://github.com/GoogleCloudPlatform/spark-on-k8s-operator/blob/master/docs/design.md#architecture);
+you'll find the difference in the design architecture chart at the given link. To support native Spark on K8s,
+YuniKorn reads the pods' spec and groups Spark pods by a label-selector, based on `spark-app-selector`.
+The operator approach gives us more context about the Spark job, such as a better understanding of the job state.
+But all this info requires us to look at the `SparkApplication` CRD, and currently there is no neat way to
+add such functionality. That's why we need to design a flexible approach to support 3rd party operators
+(retrieving info from their CRDs), so we can easily integrate with other operators with small effort.
+
+## Design
+
+The key issue here is that we need an app-management interface that can be easily extended.
+It needs to be decoupled from the existing scheduling logic. For each operator, we create a service to manage this app type's lifecycle,
+and communicate with the scheduling cache independently. The high-level design looks like below:
+
+![Pluggable App Management](./../assets/pluggable-app-mgmt.jpg)
+
+Where
+- `AppManagementService` is a composite set of services that can be managed together.
+- `AppManager` is a specific app management service for a particular type of application. Each service has
+   access to K8s clients, such as informers and listers, in order to monitor CRD events. It collects the necessary info
+   and talks with the scheduler cache through `AMProtocol`.
+- `APIProvider` encapsulates a set of useful APIs that can be shared, such as the kube-client and pod/node/storage informers.
+   Each of these informers can be shared with multiple app managers, to avoid the overhead.
+- `AMProtocol` defines the basic interaction contract between an app manager and the scheduler cache, which helps the cache
+   to perform app lifecycle management without understanding what type of application it is (a hypothetical sketch follows this list).
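+
+A hypothetical sketch of what such contracts could look like is shown below. The names and signatures are illustrative only and do not reflect the actual k8s-shim code.
+
+```
+package appmgmt
+
+// AppManager is a hypothetical sketch of a pluggable app management service
+// for one application type, e.g. "general" or "spark-k8s-operator".
+type AppManager interface {
+    // Name identifies the type of applications this manager handles.
+    Name() string
+    // Start begins watching pod or CRD events for this app type and reports
+    // lifecycle changes back through the AMProtocol.
+    Start() error
+    // Stop halts all watches.
+    Stop()
+}
+
+// AMProtocol is a hypothetical sketch of the contract towards the scheduler
+// cache: the cache manages the app lifecycle without knowing the app type.
+type AMProtocol interface {
+    AddApplication(appID, queue, user string)
+    RemoveApplication(appID string) error
+    AddTask(appID, taskID string)
+    RemoveTask(appID, taskID string)
+}
+```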
+
+In the above chart, the AppManagementService has 2 services: the _general_ one manages normal applications and
+recognizes applications by pod labels; the _spark-k8s-operator_ one watches the `SparkApplication` CRD and manages the
+lifecycle of jobs defined by this CRD.
\ No newline at end of file
diff --git a/versioned_docs/version-1.1.0/design/predicates.md b/versioned_docs/version-1.1.0/design/predicates.md
new file mode 100644
index 000000000..9233a25f3
--- /dev/null
+++ b/versioned_docs/version-1.1.0/design/predicates.md
@@ -0,0 +1,80 @@
+---
+id: predicates
+title: Support K8s Predicates
+---
+
+<!--
+* Licensed to the Apache Software Foundation (ASF) under one
+* or more contributor license agreements.  See the NOTICE file
+* distributed with this work for additional information
+* regarding copyright ownership.  The ASF licenses this file
+* to you under the Apache License, Version 2.0 (the
+* "License"); you may not use this file except in compliance
+* with the License.  You may obtain a copy of the License at
+*
+*      http://www.apache.org/licenses/LICENSE-2.0
+*
+* Unless required by applicable law or agreed to in writing, software
+* distributed under the License is distributed on an "AS IS" BASIS,
+* WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+* See the License for the specific language governing permissions and
+* limitations under the License.
+-->
+
+## Design
+
+Predicates are a set of pre-registered functions in K8s; the scheduler invokes these functions to check if a pod
+is eligible to be allocated onto a node. Common predicates are: node-selector, pod affinity/anti-affinity, etc. To support
+these predicates in YuniKorn, we don't intend to re-implement everything on our own, but to re-use the core predicates
+code as much as possible.
+
+YuniKorn-core is agnostic about the underlying RMs, so the predicate functions are implemented in the K8s-shim as a `SchedulerPlugin`.
+A SchedulerPlugin is a way to plug in/extend scheduler capabilities. A shim can implement such a plugin and register itself with
+yunikorn-core, so the plugged-in function can be invoked in the scheduler core. Find all supported plugins in
+[types](https://github.com/apache/yunikorn-core/blob/master/pkg/plugins/types.go).
+
+## Workflow
+
+First, the RM needs to register itself with yunikorn-core and advertise what scheduler plugin interfaces it supports.
+E.g. an RM could implement the `PredicatePlugin` interface and register itself with yunikorn-core. Then yunikorn-core will
+call the PredicatePlugin API to run predicates before making allocation decisions.
+
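+A minimal sketch of what a shim-side predicate plugin could look like is shown below. It assumes the single `Predicates(args *si.PredicatesArgs) error` callback shown in the interface message design; the field names and helper functions are placeholders for illustration, and the real implementation is the PredicateManager referenced at the end of this page.
+
+```
+package shim
+
+import (
+    "fmt"
+
+    "github.com/apache/incubator-yunikorn-scheduler-interface/lib/go/si"
+)
+
+// predicatePlugin is an illustrative sketch of a shim-side predicate plugin.
+type predicatePlugin struct{}
+
+// Predicates answers whether the proposed allocation fits onto the node.
+func (p *predicatePlugin) Predicates(args *si.PredicatesArgs) error {
+    if !checkNodeSelector(args) || !checkAffinity(args) {
+        return fmt.Errorf("allocation %s does not fit on node %s", args.AllocationKey, args.NodeID)
+    }
+    return nil
+}
+
+// checkNodeSelector and checkAffinity stand in for the real white-listed
+// predicate functions; they are placeholders for this sketch only.
+func checkNodeSelector(args *si.PredicatesArgs) bool { return true }
+func checkAffinity(args *si.PredicatesArgs) bool     { return true }
+```
+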
+
+The following workflow demonstrates what an allocation looks like when predicates are involved.
+
+```
+pending pods: A, B
+shim sends requests to core, including A, B
+core starts to schedule A, B
+  partition -> queue -> app -> request
+    schedule A (1)
+      run predicates (3)
+        generate predicates metadata (4)
+        run predicate functions one by one with the metadata
+        success
+        proposal: A->N
+    schedule B (2)
+      run predicates (calling shim API)
+        generate predicates metadata
+        run predicate functions one by one with the metadata
+        success
+        proposal: B->N
+commit the allocation proposal for A and notify k8s-shim
+commit the allocation proposal for B and notify k8s-shim
+shim binds pod A to N
+shim binds pod B to N
+```
+
+(1) and (2) are running in parallel.
+
+(3) yunikorn-core calls a `schedulerPlugin` API to run predicates; this API is implemented on the k8s-shim side.
+
+(4) The K8s-shim generates metadata based on the current scheduler cache; the metadata includes some intermediate state about nodes and pods.
+
+## Predicates White-list
+
+Intentionally, we only support a white-list of predicates. This is mainly due to 2 reasons:
+* Predicate functions are time-consuming and have a negative impact on scheduler performance. Supporting only the necessary predicates minimizes that impact. This will be configurable via CLI options;
+* The implementation depends heavily on the K8s default scheduler code; though we reused some unit tests, the coverage is still a problem. We'll continue to improve the coverage when adding new predicates.
+
+The white-list currently is defined in [PredicateManager](https://github.com/apache/yunikorn-k8shim/blob/master/pkg/plugin/predicates/predicate_manager.go).
diff --git a/versioned_docs/version-1.1.0/design/resilience.md b/versioned_docs/version-1.1.0/design/resilience.md
new file mode 100644
index 000000000..ac0c93bac
--- /dev/null
+++ b/versioned_docs/version-1.1.0/design/resilience.md
@@ -0,0 +1,144 @@
+---
+id: resilience
+title: Resilience
+---
+
+<!--
+Licensed to the Apache Software Foundation (ASF) under one
+or more contributor license agreements.  See the NOTICE file
+distributed with this work for additional information
+regarding copyright ownership.  The ASF licenses this file
+to you under the Apache License, Version 2.0 (the
+"License"); you may not use this file except in compliance
+with the License.  You may obtain a copy of the License at
+
+  http://www.apache.org/licenses/LICENSE-2.0
+
+Unless required by applicable law or agreed to in writing,
+software distributed under the License is distributed on an
+"AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY
+KIND, either express or implied.  See the License for the
+specific language governing permissions and limitations
+under the License.
+-->
+
+This is not an HA (high-availability) design; HA implies that a service can
+survive a fatal software/hardware failure. That requires one or more
+standby instances providing the same services to take over from the active instance on failures.
+Resilience here means that we can restart YuniKorn without losing its state.
+
+## The problem
+
+YuniKorn is designed as a stateless service; it doesn't persist its state, e.g.
+applications/queues/allocations etc., to any persistent storage. All state is
+in memory only. This design ensures YuniKorn is able to respond to requests with
+low latency, and the deployment mode is simple. However, a restart (or recovery) will
+lose this state data. We need a decent way to reconstruct all
+previous states on a restart.
+
+## Design
+
+### Workflow
+
+The scheduler core has no notion of "state", which means it does not know whether it is recovering.
+It is too complex to maintain a series of `scheduler states` in both the core and the shim, because we must
+keep them in sync. However, if we work under a simple assumption: **the scheduler core only responds to
+requests, the correctness of requests is ensured by the shim according to its current state**,
+the design becomes much simpler. This way, the shim maintains a state machine like below. When
+it is in the `running` state, it sends new requests to the scheduler core as soon as a new one is found;
+when in the `recovering` state, it collects previous allocations and sends recovery messages to
+the scheduler core, waiting for recovery to be accomplished.
+
+Shim scheduler state machine
+
+```
+      Register                 Recover                Success
+New -----------> Registered -----------> Recovering ----------> Running
+                                             |   Fail
+                                              --------> Failed
+```
+
+The following chart illustrates how yunikorn-core and the shim work together on recovery.
+
+![Workflow](./../assets/resilience-workflow.jpg)
+
+Restart (with recovery) process
+- yunikorn-shim registers itself with yunikorn-core
+- the shim enters the "recovering" state. In the "recovering" state, the shim only scans existing nodes and allocations, no new scheduling requests will be sent.
+  - the shim scans existing nodes from the api-server and adds them to the cache
+  - the shim scans existing pods from the api-server, filters out the pods that are already assigned (scheduled to a node), and adds them to the cache (as allocations on those nodes)
+  - the shim sends an update request to yunikorn-core with the info found in the previous steps
+- yunikorn-core handles the update requests; the steps should look like a replay of the allocation process, including
+  - adding nodes
+  - adding applications
+  - adding allocations
+  - modifying queue resources
+  - updating partition info
+- when all nodes are fully recovered, the shim transitions to the "running" state
+- the shim notifies yunikorn-core that recovery is done, then yunikorn-core transitions to the "running" state.
+
+### How to determine recovery is complete?
+
+The shim queries the K8s api-server to get how many nodes are available in this cluster. It tracks the recovery status of each node.
+Once all nodes are recovered, it can claim that recovery is complete. This approach requires us to add `recovering` and `recovered`
+states to the nodes' state machine in the shim.
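+
+A small sketch of that completion check, with illustrative names only (not the actual shim code):
+
+```
+package shim
+
+// recoveryComplete reports whether every node known at start-up has reached
+// the recovered state; only then can the shim leave the recovering state.
+func recoveryComplete(nodeStates map[string]string) bool {
+    if len(nodeStates) == 0 {
+        return false
+    }
+    for _, state := range nodeStates {
+        if state != "recovered" {
+            return false
+        }
+    }
+    return true
+}
+```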
+
+### Node recovery
+
+The shim layer maintains states for each node and the pods running on that node. When node recovery starts,
+all nodes are initially considered to be `recovering`. Only when all pods running on a node are fully recovered
+can the node be considered `recovered`.
+
+![node-recovery](./../assets/resilience-node-recovery.jpg)
+
+As demonstrated in the above diagram,
+
+- Node0 is still recovering because pod0 is recovering.
+- Node1 is recovered (becomes schedulable) because all pods on this node have been recovered.
+- Node2 is lost: the shim lost contact with this node. If this node comes back after some time, the shim should still try to recover it.
+
+### Requests for recovery
+
+During the recovery process, the shim needs to collect all known information about applications, nodes and allocations from the underlying
+Resource Manager and use it for recovery.
+
+#### Applications
+
+Existing applications must be recovered before allocations. The shim needs to scan all existing applications
+from the nodes, and add the application info as a list of `AddApplicationRequest` in the `UpdateRequest`. This is the same
+as a fresh application submission.
+
+```
+message AddApplicationRequest {
+  string applicationID = 1;
+  string queueName = 2;
+  string partitionName = 3;
+}
+```
+
+#### Nodes and allocations
+
+Once a shim is registered with the scheduler-core, subsequent requests are sent via `UpdateRequest#NewNodeInfo`
+(see more in [si.proto](https://github.com/apache/yunikorn-scheduler-interface/blob/master/si.proto)).
+The structure of the message looks like:
+
+```
+message NewNodeInfo {
+  // nodeID
+  string nodeID = 1;
+  // optional node attributes
+  map<string, string> attributes = 2;
+  // total node resource
+  Resource schedulableResource = 3;
+  // existing allocations on this node
+  repeated Allocation existingAllocations = 4;
+}
+```
+The shim needs to scan all existing allocations on a node, wrap this info up as a `NewNodeInfo`, add that to an
+`UpdateRequest` and then send it to the scheduler-core.
+
+**Note**: the recovery of existing allocations depends on the existence of applications, which means applications must
+be recovered first. Since the scheduler-core handles each `UpdateRequest` one by one, all existing allocations
+in an `UpdateRequest` must be from known applications or from new applications embedded within the same `UpdateRequest`, which can be
+specified in the `NewApplications` field. The scheduler-core ensures `NewApplications` are always processed first.
+
diff --git a/versioned_docs/version-1.1.0/design/scheduler_configuration.md b/versioned_docs/version-1.1.0/design/scheduler_configuration.md
new file mode 100644
index 000000000..54b616a4e
--- /dev/null
+++ b/versioned_docs/version-1.1.0/design/scheduler_configuration.md
@@ -0,0 +1,246 @@
+---
+id: scheduler_configuration
+title: Scheduler Configuration
+---
+
+<!--
+Licensed to the Apache Software Foundation (ASF) under one
+or more contributor license agreements.  See the NOTICE file
+distributed with this work for additional information
+regarding copyright ownership.  The ASF licenses this file
+to you under the Apache License, Version 2.0 (the
+"License"); you may not use this file except in compliance
+with the License.  You may obtain a copy of the License at
+
+  http://www.apache.org/licenses/LICENSE-2.0
+
+Unless required by applicable law or agreed to in writing,
+software distributed under the License is distributed on an
+"AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY
+KIND, either express or implied.  See the License for the
+specific language governing permissions and limitations
+under the License.
+-->
+
+The YuniKorn core scheduler configuration has two separate areas that need to be configured: the scheduler service itself, things like web service ports etc., and the queue configuration. The split between the two types of configuration is proposed with two points in mind:
+* Separation of duty
+* Dynamic vs Static
+
+The scheduler configuration is mainly static. There is no need to change a web service port or a scheduling policy while the service is running. The queue configuration is far more dynamic and can change while the service is running.
+
+From a separation of duty perspective we can allow an operator that manages the cluster to make changes to the scheduler queues. You would not want to allow that administrator to change the scheduler configuration itself.
+
+Separated from the core scheduler configuration we have one or more shim configurations. We currently cannot anticipate the deployment model of the scheduler and its shims. A shim, like the k8s-shim, might run in the same container or node but there is no guarantee it will. We also do not know the number of shims that will be used with one core scheduler. There is also still the possibility to have multiple instances of the same shim with one core scheduler.
+
+Shim configuration must be independent of the core scheduler configuration.
+## Scheduler Configuration
+Scheduler configuration covers all the configuration needed to start the scheduler and the dependent services. The configuration consists of a simple key value pair. All configuration to start the service must be part of this configuration.
+The scheduler configuration must exclude the queue related configuration.
+
+Scheduler configuration as currently identified
+* Bind host
+* Service port
+* Web bind host
+* Web service port
+* SSL config
+* Shims Configured
+* SchedulerACL
+
+Configuration to consider:
+* Assign multiple containers in one go: the use case is bin packing, don't spread an application over a large number of nodes. This needs to become configurable.
+* Pre-emption related configuration:
+    * threshold: do not pre-empt from a queue if the cluster load is below a certain threshold.
+    * Interval: pause between pre-emption checks
+## Queue Configuration
+### Queue Definition
+On startup the scheduler will load the configuration for the queues from the provided configuration file after initialising the service. If there is no queue configuration provided the scheduler should start up with a simple default configuration which performs a well documented default behaviour.
+Based on the kubernetes definition this configuration could be a configMap <sup id="s1">[1](#f1)</sup> but not a CRD.
+
+The queue configuration is dynamic. Changing the queue configuration must not require a scheduler restart.
+Changes should be allowed by either calling the GO based API, the REST based API or by updating the configuration file. Changes made through the API must be persisted in the configuration file. Making changes through an API is not a high priority requirement and could be postponed to a later release.
+
+The queue configuration defines queues in a hierarchy: a tree. The base of the tree is the _root_ queue. The queue configuration must define a single _root_ queue. All queues that are defined in queue configuration are considered _managed_ queues.
+
+The root queue reflects the whole cluster. Resource settings on the root queue are not allowed. The resources available to the root queue are calculated based on the registered node resources in the cluster. If resources were specified on the root queue the cluster would either be artificially limited to a specific size or expect resources to be available that are not there.
+
+Queues in the hierarchy of the tree are separated by the “.” dot character (ASCII 0x2E). This indirectly means that a queue name itself cannot contain a dot as it interferes with the hierarchy separator. Any queue name in the configuration that contains a dot will cause the configuration to be considered invalid. However we must allow placement rules to create a queue from a dot-based input.
+
+Not all queues can be used to submit an application to. Applications can only be submitted to a queue which does not have a queue below it. These queues are defined as the _leaf_ queues of the tree. Queues that are not a _leaf_ and thus can contain other queues or child queues are considered _parent_ queues.
+
+Each queue must have exactly one _parent_ queue, besides the root queue. The root queue cannot have a _parent_ and will be automatically defined as a _parent_ queue type.
+A fully qualified queue name, case insensitive, must be unique in the hierarchy. A queue in the hierarchy can thus be only uniquely identified by its fully qualified path. This means that a queue with the same name is allowed at a different point in the hierarchy.
+Example:
+```
+root.companyA.development
+root.companyB.development
+root.production.companyA
+```
+In the example the queues _companyA_ and _companyB_ are _parent_ queues. Both _development_ queues are _leaf_ queues.
+The second instance of the _companyA_ queue is a _leaf_ queue which is not related to the first instance as it is defined at a different level in the hierarchy.
+
+The queue as defined in the configuration will be assigned a queue type. This can either be implicit based on how the queue is defined in the hierarchy or explicit by setting the optional _parent_ property as part of the queue definition. By default all queues will be assigned their type based on the configuration. There is only one case in which this automatic process would need to be overridden, and that is to mark a _leaf_ in the configuration as a _parent_. The use case is part [...]
+
+Access control lists provide a split between submission permission and administration permissions. Submission access to a queue allows an application to be submitted to the queue by the users or groups specified. The administration permission allows submission to the queue plus the administrative actions. Administrative actions are currently limited to killing an application and moving an application to a different queue.
+
+Access control lists are checked recursively up to the root of the tree starting at the lowest point in the tree. In other words when the access control list of a queue does not allow access the parent queue is checked. The checks are repeated all the way up to the root of the queues.
+
+On each queue, except the root queue, the following properties can be set:
+* QueueType:
+    * Parent (boolean)
+* Resource settings:
+    * Guaranteed (resource)
+    * Maximum (resource)
+* Running Application limit:
+    * Maximum (integer)
+* Queue Permissions:
+    * SubmitACL (ACL)
+    * AdminACL (ACL)
+* Pre emption setting:
+    * PreEmptionAllowed (boolean)
+* Application sort algorithm:
+    * ApplicationSortPolicy (enumeration: fair, fifo)
+
+On the root queue only the following properties can be set:
+* Running Application limit:
+    * Maximum (integer)
+* Queue Permissions:
+    * SubmitACL (ACL)
+    * AdminACL (ACL)
+* Application sort algorithm:
+    * ApplicationSortPolicy (enumeration: fair, fifo)
+
+### User definition
+Applications run by a user could run in one or more queues. The queues can have limits set on the resources that can be used. This does not limit the amount of resources that can be used by the user in the cluster.
+
+From an administrative perspective setting a limit on the resources that can be used by a specific user can be important. In this case a user is broadly defined as the identity that submits the application. This can be a service or a person; from a scheduling perspective there is no difference.
+User limits can prevent a takeover of a queue or the cluster by a misbehaving user or application. From a multi-tenancy perspective user limits also allow for sharing or subdivision of resources within the tenancy, however that is defined.
+
+Adding user based limits will allow the cluster administrators to control the cluster wide resource usage of a user:
+* Running Application limit:
+    * Maximum (integer)
+* Resource setting:
+    * Maximum (resource)
+
+### Placement Rules definition
+Schedulers can place an application in a queue dynamically. This means that an application when submitted does not have to include a queue to run in.
+
+A placement rule will use the application details to place the application in the queue. The outcome of running a placement rule will be a fully qualified queue or a `fail`, which means execute the next rule in the list. Rules will be executed in the order that they are defined.
+
+During the evaluation of the rule the result could be a queue name that contains a dot. This is especially true for user and group names which are POSIX compliant. When a rule generates a partial queue name that contains a dot it must be replaced as it is the separator in the hierarchy. The replacement text will be `_dot_`
+
+The first rule that matches, i.e. returns a fully qualified queue name, will halt the execution of the rules. If the application has not been placed by the end of the list of rules, the application will be rejected. Rules can return queues that are not defined in the configuration only if the rule allows the creation of queues.
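+
+The evaluation order can be illustrated with the following sketch. The `rule` interface and `placeApplication` function are hypothetical and only show the ordering semantics described above; they are not the scheduler's placement code.
+
+```
+package placement
+
+// rule is a hypothetical placement rule: it returns a fully qualified queue
+// name, or an empty string to signal "fail, try the next rule".
+type rule interface {
+    place(user string, groups []string, queue string) string
+}
+
+// placeApplication runs the configured rules in order; the first rule that
+// returns a fully qualified queue halts the evaluation. If no rule produces
+// a queue by the end of the list the application is rejected.
+func placeApplication(rules []rule, user string, groups []string, queue string) (string, bool) {
+    for _, r := range rules {
+        if result := r.place(user, groups, queue); result != "" {
+            return result, true
+        }
+    }
+    return "", false // rejected
+}
+```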
+
+These queues created by the placement rules are considered _unmanaged_ queues as they are not managed by the administrator in the configuration. An administrator cannot influence the _unmanaged_ queue creation or deletion. The scheduler creates the queue when it is needed and removes the queue automatically when it is no longer used.
+
+Rules provide a fully qualified queue name as the result. To allow for deeper nesting of queues the parent of the queue can be set as part of the rule evaluation. The rule definition should allow a fixed configured fully qualified parent to be specified or it can call a second rule to generate the parent queue.  By default a queue is generated as a child of the root queue.
+
+Example:
+Placing an application submitted by the user _user1_, who is a member of the groups _user1_ and _companyA_, in a queue based on UserName:
+```
+Rule name: UserName
+    Parent: root.fixedparent
+Result: root.fixedparent.user1
+
+Rule name: UserName
+    Parent: SecondaryGroup
+	Filter:
+        Type: allow
+	    Groups: company.*
+Result: root.companyA.user1
+
+Rule name: UserName
+Filter:
+    Users: user2,user3
+Result: denied placement
+```
+The default behaviour for placing an application in a queue, which would do the same as using the queue that is provided during submit, would be a rule that takes the provided queue with the create flag set to false.
+
+Access permissions will be enforced as part of the rule evaluation. For _managed_ queues this means that the ACL for the queue itself is checked. For an _unmanaged_ queue the parent queue ACL is the one that is checked. For the definition of the access control list and checks see the [Access Control Lists](#access-control-lists) chapter.
+
+Defining placement rules in the configuration requires the following information per rule:
+* Name:
+    * Name (string)
+* Parent
+    * Parent (string)
+* Create Flag:
+    * Create (boolean)
+* Filter:
+    * A regular expression or list of users/groups to apply the rule to.
+  
+The filter can be used to allow the rule to be used (default behaviour) or deny the rule to be used. Users or groups matching the filter will be either allowed or denied.
+The filter is defined as follows:
+* Type:
+    * Type (string) which can have no value (empty) or "allow" or "deny", case insensitive.
+* Users:
+    * A list of zero or more user names. If the list is exactly one long it will be interpreted as a regular expression.
+* Groups:
+    * A list of zero or more group names. If the list is exactly one long it will be interpreted as a regular expression.
+
+Proposed rules for placing applications would be:
+* Provided: returns the queue provided during the submission
+* UserName: returns the user name
+* PrimaryGroupName: returns the primary group of the user
+* SecondaryGroupName: returns the first secondary group of the user that matches
+* Fixed: returns the queue name configured in the rule
+* ApplicationType: returns the application type (if available)
+
+For _unmanaged_ queues in the current revision of the configuration you cannot provide any queue specific properties. However in the future we should consider propagating specific resource related settings from a _managed_ parent to the _unmanaged_ child, specifically:
+* Dynamic Resource settings:
+    * Guaranteed (resource)
+    * Maximum (resource)
+* Dynamic Running Application limit:
+    * Maximum (integer)
+
+### Configuration updates
+Updating the queue definition will allow updating the existing queue properties as well as adding and removing queues. A new queue definition will only become active if the configuration can be parsed. The change of the definition is an atomic change which applies all modifications in one action.
+
+Updating the queue properties will not automatically trigger further action. This means that if the maximum number of resources of a queue or its parent is changed we leave the applications in the queue running as they are. The scheduler will adhere to the new property values, which should lead to convergence over time.
+
+A _managed_ queue will only be removed if it is removed from the configuration. Before we can remove a queue it must not have running applications. This means that when a _managed_ queue is removed from the configuration it must be empty or the system needs to allow the queue to drain. Forcing a _managed_ queue to be empty before we can remove it is not possible, which means that _managed_ queues are removed in multiple steps:
+1. The queue is removed from the configuration
+1. The queue is marked as `draining`
+1. All managed queues that are `draining` and empty are removed
+
+Long running applications should be handled gracefully when removing a _managed_ queue. The scheduler should at least track and expose that a queue has been in a _draining_ state for an extended period of time. In the optimal case the application should be notified of the queue change to allow it to release resources. In all cases the queue administrators should be notified to allow them to take action. This action would currently be a manual move of the application to a different queue  [...]
+
+_Unmanaged_ queues that are not defined in the queue definition are created by the scheduler automatically based on the placement rules. _Unmanaged_ queues have a lifespan independent of the configuration. Whenever an _unmanaged_ queue is empty it will get removed. The queue will automatically be created again when a new application is requesting it via triggering the placement rule.
+
+Removing an empty _managed_ or _unmanaged_ queue is handled by the same removal code which must run independent of the configuration updates and scheduling actions.
+
+Configurations can change over time. The impact of a failover or restart must still be investigated.
+The base point to make: a changed configuration should not impact the currently running applications. Queues that no longer exist should be handled somehow.
+
+### Access Control Lists
+The scheduler ACL is independent of the queue ACLs. A scheduler administrator is not by default allowed to submit an application or administer the queues in the system.
+
+All ACL types should use the same definition pattern. We should allow at least POSIX user and group names which uses the portable filename character set <sup id="s2">[2](#f2)</sup>. However we should take into account that we could have domain specifiers based on the environment that the system runs in (@ sign as per HADOOP-12751).
+
+By default access control is enabled and access is denied. The only special case is for the core scheduler, which automatically adds the system user, the scheduler process owner, to the scheduler ACL. The scheduler process owner is added to make sure that the process owner can use the API to call any administrative actions.
+
+Access control lists give access to the users and groups that have been specified in the list. They do not provide the possibility to explicitly remove or deny access to the users and groups specified in the list.
+
+The access control list is defined as:
+```
+ACL ::= “*” |  userlist [ “ “ grouplist ]
+userlist ::= “” | user { “,” user }
+grouplist ::= “” | group { “,” group }
+```
+
+This definition specifies a wildcard of * which results in access for everyone. If the user list is empty and the group list is empty nobody will have access. This deny all ACL has two possible representations:
+* an empty access control list.
+* a single space.
+
+If no access control list is configured, access is denied by default.
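+
+To make the semantics concrete, the sketch below checks a user against an ACL following the grammar above. It is an illustration of the described behaviour only, not the scheduler's actual ACL implementation.
+
+```
+package configs
+
+import "strings"
+
+// allowAccess evaluates an ACL of the form: "*" | userlist [ " " grouplist ]
+// An empty ACL or a single space therefore denies access to everyone.
+func allowAccess(acl, user string, groups []string) bool {
+    if acl == "*" {
+        return true
+    }
+    parts := strings.SplitN(acl, " ", 2)
+    // check the user list
+    for _, u := range strings.Split(parts[0], ",") {
+        if u != "" && u == user {
+            return true
+        }
+    }
+    // check the group list when present
+    if len(parts) > 1 {
+        for _, g := range strings.Split(parts[1], ",") {
+            for _, userGroup := range groups {
+                if g != "" && g == userGroup {
+                    return true
+                }
+            }
+        }
+    }
+    return false
+}
+```
+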
+## Shim Configuration
+The shim configuration is highly dependent on the shim implementation. The k8s shim differs from the YARN shim. Currently the k8s shim is configured via command line options but we should not depend on that.
+
+### K8s shim
+The full configuration of the K8s shim is still under development.
+
+### YARN shim
+The full configuration of the YARN shim is still under development.
+
+---
+<br/><b id="f1"></b>1: https://kubernetes.io/docs/concepts/extend-kubernetes/api-extension/custom-resources/#should-i-use-a-configmap-or-a-custom-resource. [↩](#s1)
+<br/><b id="f2"></b>2: The set of characters from which portable filenames are constructed. [↩](#s2)
+<br/>`A B C D E F G H I J K L M N O P Q R S T U V W X Y Z a b c d e f g h i j k l m n o p q r s t u v w x y z 0 1 2 3 4 5 6 7 8 9 . _ -`
diff --git a/versioned_docs/version-1.1.0/design/scheduler_core_design.md b/versioned_docs/version-1.1.0/design/scheduler_core_design.md
new file mode 100644
index 000000000..917c172b2
--- /dev/null
+++ b/versioned_docs/version-1.1.0/design/scheduler_core_design.md
@@ -0,0 +1,401 @@
+---
+id: scheduler_core_design
+title: Scheduler Core Design
+---
+
+<!--
+ * Licensed to the Apache Software Foundation (ASF) under one
+ * or more contributor license agreements.  See the NOTICE file
+ * distributed with this work for additional information
+ * regarding copyright ownership.  The ASF licenses this file
+ * to you under the Apache License, Version 2.0 (the
+ * "License"); you may not use this file except in compliance
+ * with the License.  You may obtain a copy of the License at
+ *
+ *     http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing, software
+ * distributed under the License is distributed on an "AS IS" BASIS,
+ * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+ * See the License for the specific language governing permissions and
+ * limitations under the License.
+ -->
+
+:::caution
+The scheduler core design has changed. [YUNIKORN-317](https://issues.apache.org/jira/browse/YUNIKORN-317) was committed and has removed the scheduler cache.
+This document will not be maintained and is just for historical reference.
+See [scheduler cache removal design](cache_removal.md)
+:::
+
+Github repo: https://github.com/apache/yunikorn-core
+
+The scheduler core encapsulates all scheduling algorithms. It collects resources from the underlying resource management
+platforms (like YARN/K8s) and is responsible for container allocation requests. It decides the best spot for
+each request and then sends the resulting allocations back to the resource management platform.
+The scheduler core is agnostic about the underlying platforms; all communication goes through the [scheduler interface](https://github.com/apache/yunikorn-scheduler-interface).
+
+## Components:
+
+```
+
+                     +---------------+  +--------------+
+                     |K8s Shim       |  |YARN Shim     |
+                     +---------------+  +--------------+
+
+                                +--------------+   +------------+
+                Scheduler-      | GRPC Protocol|   |Go API      |
+                Interface:      +--------------+   +------------+
+
++---------------------------------------------------------------------------+
+                     +--------------------+
+                     |Scheduler API Server|
+ +-------------+     +---------+----------+
+ |AdminService |               |
+ +-------------+               |Write Ops                    +----------------+
+ +-------------+               V                            ++Scheduler       |
+ |Configurator |      +-------------------+  Allocate       ||   And          |
+ +-------------+      |Cache Event Handler+<-----------------|                |
+         +----------> +-------------------+  Preempt        ++Preemptor       |
+          Update Cfg   Handled by policies                   +----------------+
+                               +  (Stateless)
+                        +------v--------+
+                        |Scheduler Cache|
+                        +---------------+
+                +---------------------------------------------+
+                |--------+ +------+ +----------+ +----------+ |
+                ||Node   | |Queue | |Allocation| |Requests  | |
+                |--------+ +------+ +----------+ +----------+ |
+                +---------------------------------------------+
+```
+
+### Scheduler API Server (RMProxy)
+
+Responsible for the communication between the RM and the scheduler. It implements the scheduler-interface GRPC protocol,
+or plain Go APIs for intra-process communication without serialization/deserialization.
+
+### Scheduler Cache
+
+Caches all data related to the scheduler state, such as used resources of each queue, nodes, allocations,
+and the relationship between allocations and nodes. It should not include in-flight data used for resource allocation,
+for example to-be-preempted allocation candidates or the fair share resources of queues.
+
+### Scheduler Cache Event Handler
+
+Handles all events which need to update the scheduler's internal state, so all write operations are carefully handled.
+
+### Admin Service
+
+Handles requests from administrators; it can also load configurations from storage and update scheduler policies.
+
+### Scheduler and Preemptor
+
+Handles the scheduler's internal state (which does not belong to the scheduler cache), such as internal reservations.
+The scheduler and preemptor work together to make scheduling or preemption decisions. All allocate/preempt requests
+are handled by the event handler.
+
+## Scheduler's responsibility
+
+- According to the resource usage across queues, sort queues and applications and figure out the order of application allocation. (This will be used by preemption as well.)
+- It is possible that we cannot satisfy some of the allocation requests; we need to skip them and find the next request.
+- It is possible that some allocation requests cannot be satisfied because of resource fragmentation. We need to reserve room for such requests.
+- Different nodes may belong to different disjoint partitions; we can make independent scheduler runs.
+- Be able to configure and change ordering policies for apps and queues.
+- Applications can choose their own way to sort nodes.
+
+## Preemption
+
+- It is important to know "who wants the resource", so we can do preemption based on allocation order.
+- When doing preemption, it is also efficient to trigger the allocation operation; think about how to do it.
+- Preemption needs to take care of queue resource balancing.
+
+## Communication between Shim and Core 
+
+YuniKorn-Shim (like https://github.com/apache/yunikorn-k8shim) communicates with core by
+using scheduler-interface (https://github.com/apache/yunikorn-scheduler-interface).
+Scheduler interface has Go API or GRPC. Currently, yunikorn-k8shim is using Go API to communicate with yunikorn-core
+to avoid extra overhead introduced by GRPC. 
+
+**A shim (like K8shim) first needs to register with the core:** 
+
+```go
+func (m *RMProxy) RegisterResourceManager(request *si.RegisterResourceManagerRequest, callback api.ResourceManagerCallback) (*si.RegisterResourceManagerResponse, error)
+```
+
+This indicates the ResourceManager's name and a callback function for update responses. The core is designed to be able to schedule for multiple clusters (such as multiple K8s clusters) with just one core instance.
+
+**The shim interacts with the core by frequently invoking RMProxy's Update API, which sends new allocation requests, allocations to kill, node updates, etc.** 
+
+```go
+func (m *RMProxy) Update(request *si.UpdateRequest) error
+```
+
+Responses to updates (such as newly allocated containers) are received through the registered callback.
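+
+To illustrate this register/update/callback flow, below is a minimal, self-contained Go sketch with simplified types; it is not the actual scheduler-interface API, whose real request and response messages live in the yunikorn-scheduler-interface repository.
+
+```go
+// Illustrative sketch only: the register/update/callback pattern with
+// simplified types; the real messages and interfaces are defined in the
+// yunikorn-scheduler-interface repository.
+package main
+
+import "fmt"
+
+// UpdateRequest and UpdateResponse stand in for the scheduler-interface messages.
+type UpdateRequest struct{ NewAsks []string }
+type UpdateResponse struct{ NewAllocations []string }
+
+// Callback is the simplified equivalent of the callback the shim registers.
+type Callback func(UpdateResponse)
+
+// RMProxy stands in for the core side of the API.
+type RMProxy struct{ callbacks map[string]Callback }
+
+// RegisterResourceManager stores the callback for the given resource manager.
+func (m *RMProxy) RegisterResourceManager(rmID string, cb Callback) {
+	if m.callbacks == nil {
+		m.callbacks = make(map[string]Callback)
+	}
+	m.callbacks[rmID] = cb
+}
+
+// Update receives a request; the response is delivered via the callback.
+// The real core handles this asynchronously, here it replies immediately.
+func (m *RMProxy) Update(rmID string, req UpdateRequest) {
+	m.callbacks[rmID](UpdateResponse{NewAllocations: req.NewAsks})
+}
+
+func main() {
+	proxy := &RMProxy{}
+	proxy.RegisterResourceManager("k8s-cluster-1", func(resp UpdateResponse) {
+		fmt.Println("callback received allocations:", resp.NewAllocations)
+	})
+	proxy.Update("k8s-cluster-1", UpdateRequest{NewAsks: []string{"pod-1", "pod-2"}})
+}
+```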
+
+## Configurations & Semantics
+
+Example of configuration:
+
+- A partition is a namespace.
+- The same queues can exist under different partitions, but they are enforced to have the same hierarchy.
+
+    Good:
+
+    ```
+     partition=x    partition=y
+         a           a
+       /   \        / \
+      b     c      b   c
+    ```
+
+    Good (c in partition y acl=""):
+
+    ```
+     partition=x    partition=y
+         a           a
+       /   \        /
+      b     c      b
+    ```
+
+    Bad (c in different hierarchy)
+
+    ```
+     partition=x    partition=y
+         a           a
+       /   \        /  \
+      b     c      b    d
+                  /
+                 c
+    ```
+
+    Bad (Duplicated c)
+
+    ```
+     partition=x
+         a
+       /   \
+      b     c
+     /
+    c
+
+    ```
+
+- Different hierarchies can be added
+
+    ```scheduler-conf.yaml
+    partitions:
+      - name:  default
+        queues:
+            root:
+              configs:
+                acls:
+              childrens:
+                - a
+                - b
+                - c
+                - ...
+            a:
+              configs:
+                acls:
+                capacity: (capacity is not allowed to set for root)
+                max-capacity: ...
+          mapping-policies:
+            ...
+      - name: partition_a:
+        queues:
+            root:...
+    ```
+
+## How the scheduler does allocation
+
+The scheduler runs a separate goroutine that looks at asks and available resources and does resource allocation. Here's the allocation logic in pseudo code: 
+
+Entry point of scheduler allocation is `scheduler.go: func (s *Scheduler) schedule()`
+
+```
+# First of all, YuniKorn has a partition concept: a logical resource pool can consist
+# of one or multiple physically disjoint partitions. It is similar to YARN's node
+# partition concept.
+
+for partition : partitions:
+  # YuniKorn can reserve allocations for picky asks (such as large requests, etc.)
+  # Before doing regular allocation, YuniKorn looks at reservedAllocations first.
+  for reservedAllocation : partition.reservedAllocations:
+     reservedAllocation.tryAllocate(..)
+
+  # After trying all reserved allocations, YuniKorn moves on to regular allocation
+  partition.tryAllocate(..)
+
+  # If any allocation is created, the scheduler will create an AllocationProposal
+  # and send it to the Cache to "commit" the AllocationProposal
+```
+
+**Allocation by hierarchy of queues**
+
+Inside `partition.tryAllocate` 
+
+It recursively traverses from the root queue down to the lower levels; for each level, the logic is inside `pkg/scheduler/scheduling_queue.go func (sq *SchedulingQueue) tryAllocate`
+
+Remember that YuniKorn natively supports hierarchical queues. For a ParentQueue (which has sub queues under it), the scheduler uses the queue's own sorting policy to sort the sub queues and tries to allocate from the most preferred queue to the least preferred queue.
+
+For a LeafQueue (which has applications inside the queue), it uses the queue's own sorting policy to sort the applications belonging to the queue and allocates based on the sorted order.
+
+(All sorting policies can be configured differently at each level.) 
+
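+To make the traversal concrete, below is a minimal Go sketch of the idea; the types, the single-number resource and the "most pending first" policy are simplifications for illustration, not the actual core implementation.
+
+```go
+// Illustrative sketch only: simplified queue/application types and a single
+// numeric resource; the real logic lives in scheduling_queue.go and
+// scheduler_application.go.
+package main
+
+import (
+	"fmt"
+	"sort"
+)
+
+type App struct {
+	Name    string
+	Pending int // pending resource, simplified to a single number
+}
+
+type Queue struct {
+	Name     string
+	Children []*Queue // non-empty for a parent queue
+	Apps     []*App   // non-empty for a leaf queue
+	Pending  int
+}
+
+// tryAllocate walks the hierarchy: a parent queue sorts its children using
+// the (placeholder) queue sorting policy and recurses, a leaf queue sorts its
+// applications and allocates for the first one that fits.
+func tryAllocate(q *Queue, available int) (string, bool) {
+	if len(q.Children) > 0 {
+		sort.Slice(q.Children, func(i, j int) bool {
+			return q.Children[i].Pending > q.Children[j].Pending // placeholder policy
+		})
+		for _, child := range q.Children {
+			if name, ok := tryAllocate(child, available); ok {
+				return name, true
+			}
+		}
+		return "", false
+	}
+	sort.Slice(q.Apps, func(i, j int) bool {
+		return q.Apps[i].Pending > q.Apps[j].Pending // placeholder policy
+	})
+	for _, app := range q.Apps {
+		if app.Pending > 0 && app.Pending <= available {
+			return app.Name, true
+		}
+	}
+	return "", false
+}
+
+func main() {
+	root := &Queue{
+		Name: "root",
+		Children: []*Queue{
+			{Name: "a", Pending: 3, Apps: []*App{{Name: "app-1", Pending: 3}}},
+			{Name: "b", Pending: 1, Apps: []*App{{Name: "app-2", Pending: 1}}},
+		},
+	}
+	if name, ok := tryAllocate(root, 2); ok {
+		fmt.Println("allocate for", name) // app-2: app-1 does not fit in 2 units
+	}
+}
+```
+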
+**Allocation by application**
+
+When it gets to an application, see `scheduler_application.go: func (sa *SchedulingApplication) tryAllocate`, it first sorts the pending resource requests belonging to the application (based on the requests' priority). Then, based on the selected request and the configured node-sorting policy, it sorts the nodes belonging to the partition and tries to allocate resources on the sorted nodes.
+
+When the application tries to allocate resources on nodes, it invokes the PredicatePlugin to make sure the shim can confirm the node is good. (For example, K8shim runs the predicates check as an allocation pre-check.)
+
+**Allocation completed by scheduler** 
+
+Once the allocation is done, the scheduler creates an AllocationProposal and sends it to the Cache for further checks; we will cover the details in the upcoming section.
+
+## Flow of events
+
+As mentioned before, all communication between components like RMProxy/Cache/Scheduler is done using async event handlers.
+
+RMProxy/Cache/Scheduler include local event queues and event handlers. RMProxy and Scheduler have only one queue each (for example: `pkg/scheduler/scheduler.go: handleSchedulerEvent`), and Cache has two queues (one for events from RMProxy, and one for events from Scheduler, which is designed for better performance).
+
+We will talk about how events flow between the components:
+
+**Events for ResourceManager registration and updates:**
+
+```
+Update from ResourceManager -> RMProxy -> RMUpdateRequestEvent Send to Cache
+New ResourceManager registration -> RMProxy -> RegisterRMEvent Send to Cache
+```
+
+**Cache Handles RM Updates** 
+
+There are many fields inside the RM update event (`RMUpdateRequestEvent`); among them, we have the following categories:
+
+```
+1) Update for Application-related updates
+2) Update for New allocation ask and release. 
+3) Node (Such as kubelet) update (New node, remove node, node resource change, etc.)
+```
+
+More details can be found at: 
+
+```
+func (m *ClusterInfo) processRMUpdateEvent(event *cacheevent.RMUpdateRequestEvent)
+
+inside cluster_info.go
+```
+
+**Cache send RM updates to Scheduler**
+
+In most cases, the Cache propagates updates from the RM to the scheduler directly (including Applications, Nodes, Asks, etc.). It is possible that some updates from the RM are not valid (such as adding an application to a non-existent queue); for such cases, the Cache can send an event back to RMProxy to notify the ResourceManager. (See `RMApplicationUpdateEvent.RejectedApplications`.)
+
+**Cache handles scheduler config** 
+
+Cache also handles scheduler's config changes, see
+
+```go
+func (m *ClusterInfo) processRMConfigUpdateEvent(event *commonevents.ConfigUpdateRMEvent)
+```
+
+Similar to other RM updates, it propagates the news to the scheduler.
+
+**Scheduler does allocation**
+
+Once an AllocationProposal is created by the scheduler, the scheduler sends an `AllocationProposalBundleEvent` to the Cache to commit.
+
+The Cache looks at the AllocationProposal under a lock and commits these proposals. The reason for the proposal/commit split is that the scheduler can run multi-threaded, which could cause conflicts for resource allocation. This approach is inspired by Borg/Omega/YARN Global Scheduling.
+
+The Cache checks more state such as queue resources and node resources (we cannot allocate more resources than a node has available). Once the check is done, the Cache updates its internal data structures and sends a confirmation to the scheduler to update the same, and the scheduler sends the allocated Allocation to RMProxy so the shim can take further action. For example, K8shim will `bind` an allocation (pod) to a kubelet.
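+
+As a hedged illustration of that commit step only, the sketch below re-checks a single-number resource under a lock before accepting a proposal; it is a simplification and not the actual Cache code.
+
+```go
+// Illustrative sketch only: the Cache re-checks resources under a lock before
+// committing a proposal produced by a (possibly concurrent) scheduler run.
+package main
+
+import (
+	"fmt"
+	"sync"
+)
+
+type AllocationProposal struct {
+	Node string
+	Ask  int // requested resource, simplified to one number
+}
+
+type Cache struct {
+	sync.Mutex
+	available map[string]int // free resource per node
+}
+
+// commit accepts the proposal only if the node still has enough resources;
+// concurrent scheduler threads may have committed something in the meantime.
+func (c *Cache) commit(p AllocationProposal) bool {
+	c.Lock()
+	defer c.Unlock()
+	if c.available[p.Node] < p.Ask {
+		return false // rejected: the scheduler is notified and can retry
+	}
+	c.available[p.Node] -= p.Ask
+	return true
+}
+
+func main() {
+	cache := &Cache{available: map[string]int{"node-1": 4}}
+	fmt.Println(cache.commit(AllocationProposal{Node: "node-1", Ask: 3})) // true
+	fmt.Println(cache.commit(AllocationProposal{Node: "node-1", Ask: 3})) // false, only 1 left
+}
+```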
+
+```
+Job Add:
+--------
+RM -> Cache -> Scheduler (Implemented)
+
+Job Remove:
+-----------
+RM -> Scheduler -> Cache (Implemented)
+Released allocations: (Same as normal release) (Implemented)
+Note: Make sure to remove from the scheduler first to avoid new allocations being created.
+
+Scheduling Request Add:
+-----------------------
+RM -> Cache -> Scheduler (Implemented)
+Note: Will check if requested job exists, queue exists, etc.
+When any request invalid:
+   Cache -> RM (Implemented)
+   Scheduler -> RM (Implemented)
+
+Scheduling Request remove:
+------------------------- 
+RM -> Scheduler -> Cache (Implemented)
+Note: Make sure it is removed from the scheduler first to avoid new containers being allocated
+
+Allocation remove (Preemption) 
+-----------------
+Scheduler -> Cache -> RM (TODO)
+              (confirmation)
+
+Allocation remove (RM voluntarily asks)
+---------------------------------------
+RM -> Scheduler -> Cache -> RM. (Implemented)
+                      (confirmation)
+
+Node Add: 
+---------
+RM -> Cache -> Scheduler (Implemented)
+Note: Inside Cache, update allocated resources.
+Error handling: Reject Node to RM (Implemented)
+
+Node Remove: 
+------------
+Implemented in cache side
+RM -> Scheduler -> Cache (TODO)
+
+Allocation Proposal:
+--------------------
+Scheduler -> Cache -> RM
+When rejected/accepted:
+    Cache -> Scheduler
+    
+Initial: (TODO)
+--------
+1. Admin configured partitions
+2. Cache initializes
+3. Scheduler copies configurations
+
+Relations between Entities 
+-------------------------
+1. RM includes one or multiple:
+   - Partitions 
+   - Jobs
+   - Nodes 
+   - Queues
+   
+2. One queue: 
+   - Under one partition
+   - Under one RM.
+   
+3. One job: 
+   - Under one queue (jobs with the same name can exist under different partitions)
+   - Under one partition
+
+RM registration: (TODO)
+----------------
+1. RM send registration
+2. If the RM is already registered, remove the old one, including everything belonging to that RM.
+
+RM termination (TODO) 
+--------------
+Just remove the old one.
+
+Update of queues (TODO) 
+------------------------
+Admin Service -> Cache
+
+About partition (TODO) 
+-----------------------
+Internal partition need to be normalized, for example, RM specify node with partition = xyz. 
+Scheduler internally need to normalize it to <rm-id>_xyz
+This need to be done by RMProxy
+
+```
diff --git a/versioned_docs/version-1.1.0/design/scheduler_object_states.md b/versioned_docs/version-1.1.0/design/scheduler_object_states.md
new file mode 100644
index 000000000..e5a8b8f0e
--- /dev/null
+++ b/versioned_docs/version-1.1.0/design/scheduler_object_states.md
@@ -0,0 +1,127 @@
+---
+id: scheduler_object_states
+title: Scheduler Object States
+---
+
+<!--
+ * Licensed to the Apache Software Foundation (ASF) under one
+ * or more contributor license agreements.  See the NOTICE file
+ * distributed with this work for additional information
+ * regarding copyright ownership.  The ASF licenses this file
+ * to you under the Apache License, Version 2.0 (the
+ * "License"); you may not use this file except in compliance
+ * with the License.  You may obtain a copy of the License at
+ *
+ *     http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing, software
+ * distributed under the License is distributed on an "AS IS" BASIS,
+ * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+ * See the License for the specific language governing permissions and
+ * limitations under the License.
+ -->
+
+The YuniKorn project uses state machines to track the states of different objects.
+This ranges from applications in the core to nodes in the k8shim.
+The state machines are independent and not shared between the resource managers and core.
+A resource manager shim and the core can thus have an independent idea of the state of a similar object.
+
+## Core Scheduler
+State changes are triggered by events that get processed.
+One event can cause a change for multiple states or no change at all.
+
+### Application State 
+Applications have a complex state model.
+An application, when created, starts in the New state.
+
+An application can have the following states:
+* New: A new application that is being submitted or created, from here the application transitions into the accepted state when it is ready for scheduling.
+The first ask to be added will trigger the transition.
+* Accepted: The application is ready and part of the scheduling cycle.
+On allocation of the first ask the application moves into a starting state.
+This state is part of the normal scheduling cycle.
+* Starting: The application has exactly one allocation confirmed; this corresponds to one running container/pod.
+The application transitions to running if and when more allocations are added to the application.
+This state times out automatically to prevent applications that consist of just one allocation from getting stuck in this state.
+The current timeout is set to 5 minutes and cannot be changed.
+After the timeout expires the application will automatically transition to Running.
+The state change on timeout is independent of the number of allocations added.
+This state is part of the normal scheduling cycle.
+* Running: The state in which the application will spend most of its time.
+Containers/pods can be added to and removed from the application. 
+This state is part of the normal scheduling cycle.
+* Completing: An application that has no pending requests or running containers/pod will be completing.
+This state shows that the application has not been marked completed yet but currently is not actively being scheduled.
+* Completed: An application is considered completed when it has been in the completing state for a defined time period.
+From this state the application can only move to the Expired state, and it cannot move back into any of the scheduling states (Running or Completing).
+The current timeout is set to 30 seconds.
+* Expired: The completed application is tracked for a period of time; after that it is expired and deleted from the scheduler.
+This is a final state and after this state the application cannot be tracked anymore. 
+* Failing: An application marked for failure, which still has some allocations or asks that need to be cleaned up before entering the Failed state.
+  The application can be Failing when the partition it belongs to is removed or during gang scheduling, if the placeholder processing times out, and the application has no real allocations yet.
+* Failed: An application is considered failed when it was marked for failure and all the pending requests and allocations were already removed.
+This is a final state. The application cannot change state after entering.
+* Rejected: The application was rejected when it was added to the scheduler. 
+This only happens when a resource manager tries to add a new application, when it gets created in a New state, and the scheduler rejects the creation.
+Applications can be rejected due to ACLs denying access to the queue the application has specified, or because placement via the placement rules has failed.
+This is a final state. The application cannot change state after entering.
+
+The events that can trigger a state change:
+* Reject: rejecting the application by the scheduler (source: core scheduler)
+* Run: progress an application to the next active state (source: core scheduler)
+* Complete: mark an application as idle or complete (source: core scheduler)
+* Fail: fail an application (source: resource manager or core scheduler)
+* Expire: progress the application to the expired state and remove it from the scheduler (source: core scheduler)
+
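+As an illustration only, the sketch below encodes a simplified transition table for the states above; it is not the actual core state machine, and edges that are only shown in the diagram (for example the transitions into Failing, or from Completing back to Running) are intentionally left out.
+
+```go
+// Illustrative sketch only: a simplified transition table for the application
+// states described above; only edges explicitly called out in the text are listed.
+package main
+
+import "fmt"
+
+type AppState string
+
+const (
+	New        AppState = "New"
+	Accepted   AppState = "Accepted"
+	Starting   AppState = "Starting"
+	Running    AppState = "Running"
+	Completing AppState = "Completing"
+	Completed  AppState = "Completed"
+	Expired    AppState = "Expired"
+	Failing    AppState = "Failing"
+	Failed     AppState = "Failed"
+	Rejected   AppState = "Rejected"
+)
+
+var transitions = map[AppState][]AppState{
+	New:        {Accepted, Rejected},
+	Accepted:   {Starting},
+	Starting:   {Running},
+	Running:    {Completing},
+	Completing: {Completed},
+	Completed:  {Expired},
+	Failing:    {Failed},
+}
+
+func canTransition(from, to AppState) bool {
+	for _, next := range transitions[from] {
+		if next == to {
+			return true
+		}
+	}
+	return false
+}
+
+func main() {
+	fmt.Println(canTransition(New, Accepted))      // true
+	fmt.Println(canTransition(Completed, Running)) // false: Completed cannot go back
+}
+```
+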
+Here is a diagram that shows the states with the event that causes the state to change:  
+![application state diagram](./../assets/application-state.png)
+
+### Object State
+<!-- fix the draining to stopped transition -->
+The object state is used by the following objects:
+* queues
+* partitions
+
+The object states are as follows: 
+* Active: The object is active and used during the scheduling cycle.
+This is the starting and normal state of an object.
+An active object transitions to draining when it is removed.  
+* Stopped: The object is stopped and no longer actively scheduled.
+The object, if empty, is ready to be removed from the scheduler.
+The object can transition back into active state if it gets re-started.
+* Draining: Before an object can be removed it needs to be cleaned up.
+The cleanup starts with placing the object in the draining state.
+In this state it does not accept additions or changes but is still actively being scheduled.
+This allows for a graceful shutdown, cleanup and removal of the object.
+This is the final state.
+
+The events that can trigger a state change:
+* Start: make the object active (source: core scheduler)
+* Stop: make the object inactive (source: core scheduler)
+* Remove: mark an object for removal (source: core scheduler)
+
+Here is a diagram that shows the states with the event that causes the state to change:  
+![object state diagram](./../assets/object-state.png)
+
+### Node
+<!-- should start using object state -->
+Node objects in the core are not using a state machine but do have a state.
+A node can have one of two states: `schedulable` or `not schedulable`.
+There is no complex state model or complex transition logic.
+The scheduler can either use the node or not.
+
+The node status changes based on the status provided by the resource manager (shim) that owns the node. 
+
+## K8Shim Resource Manager
+
+### Application
+![application state diagram](./../assets/k8shim-application-state.png)
+
+### Task
+![task state diagram](./../assets/k8shim-task-state.png)
+
+### Node
+![node state diagram](./../assets/k8shim-node-state.png)
+
+### Scheduler
+![scheduler state diagram](./../assets/k8shim-scheduler-state.png)
diff --git a/versioned_docs/version-1.1.0/design/scheduler_plugin.md b/versioned_docs/version-1.1.0/design/scheduler_plugin.md
new file mode 100644
index 000000000..f26b86abc
--- /dev/null
+++ b/versioned_docs/version-1.1.0/design/scheduler_plugin.md
@@ -0,0 +1,112 @@
+---
+id: scheduler_plugin
+title: K8s Scheduler Plugin
+---
+
+<!--
+* Licensed to the Apache Software Foundation (ASF) under one
+* or more contributor license agreements.  See the NOTICE file
+* distributed with this work for additional information
+* regarding copyright ownership.  The ASF licenses this file
+* to you under the Apache License, Version 2.0 (the
+* "License"); you may not use this file except in compliance
+* with the License.  You may obtain a copy of the License at
+*
+*      http://www.apache.org/licenses/LICENSE-2.0
+*
+* Unless required by applicable law or agreed to in writing, software
+* distributed under the License is distributed on an "AS IS" BASIS,
+* WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+* See the License for the specific language governing permissions and
+* limitations under the License.
+-->
+
+## Background
+
+YuniKorn (on Kubernetes) is traditionally implemented as a ground-up implementation of a Kubernetes scheduler.
+This has allowed us to innovate rapidly, but is not without its problems; we currently have numerous places
+where we call into non-public K8S source code APIs with varying levels of (code) stability, requiring
+sometimes very disruptive code changes when we switch to new Kubernetes releases.
+
+Ideally, we should be able to take advantage of enhancements to new Kubernetes releases automatically.
+Using the plugin model enables us to enhance the Kubernetes scheduling logic with YuniKorn features.
+This also helps keep YuniKorn compatible with new Kubernetes releases with minimal effort.
+
+Additionally, it is desirable in many cases to allow non-batch workloads to bypass the YuniKorn scheduling
+functionality and use default scheduling logic. However, we have no way to do that today as the default
+scheduling functionality is not present in the YuniKorn scheduler binary.
+
+Since Kubernetes 1.19, the Kubernetes project has created a stable API for the
+[Scheduling Framework](https://kubernetes.io/docs/concepts/scheduling-eviction/scheduling-framework/),
+which allows plugins to be created which implement various extension points. Plugins implement one or more
+of these extension points, and are then compiled into a scheduler binary which contains the default
+scheduler and plugin code, configured to call into the plugins during normal scheduling flow.
+
+## Design
+
+We have added a scheduler plugin to the k8s-shim codebase which can be used to build a Kubernetes
+scheduler binary that includes YuniKorn functionality as well as the default scheduler functionality,
+significantly improving the compatibility of YuniKorn with upstream Kubernetes and allowing deployment of
+YuniKorn as the sole scheduler in a cluster with much greater confidence.
+
+Separate docker images are created for the scheduler. The traditional YuniKorn scheduler is built as
+`scheduler-{version}` while the new plugin version is built as `scheduler-plugin-{version}`. Either can be
+deployed interchangeably into a Kubernetes cluster with the same helm charts by customizing the scheduler
+image to deploy.
+
+## Entrypoints
+
+The existing shim `main()` method has been relocated to `pkg/cmd/shim/main.go`, and a new `main()` method
+under `pkg/cmd/schedulerplugin/main.go` has been created. This method instantiates the default Kubernetes
+scheduler and adds YuniKorn to it as a set of plugins. It also modifies the default scheduler CLI argument
+parsing to add YuniKorn-specific options. When the YuniKorn plugin is created, it will launch an instance
+of the existing shim / core schedulers in the background, sync all informers, and start the normal YuniKorn
+scheduling loop.
+
+## Shim Scheduler Changes
+
+In order to cooperate with the default scheduler, the shim needs to operate slightly differently when in
+plugin mode. These differences include:
+
+ - In `postTaskAllocated()`, we don’t actually bind the Pod or Volumes, as this is the responsibility of
+   the default scheduler framework. Instead, we track the Node that YK allocated for the Pod in an
+   internal map, dispatch a new BindTaskEvent, and record a `QuotaApproved` event on the Pod.
+ - In `postTaskBound()`, we update the Pod’s state to `QuotaApproved` as this will cause the default scheduler
+   to re-evaluate the pod for scheduling (more on this below).
+ - In the scheduler cache, we track pending and in-progress pod allocations, and remove them if a pod is
+   removed from the cache.
+
+## Plugin Implementation
+
+To expose the entirety of YuniKorn functionality, we implement three of the Scheduling Framework Plugins:
+
+### PreFilter
+
+PreFilter plugins are passed a reference to a Pod and return either `Success` or `Unschedulable`, depending
+on whether that pod should be considered for scheduling.
+
+For the YuniKorn implementation, we first check the Pod to see if we have an associated `applicationId`
+defined. If not, we immediately return `Success`, which allows us to delegate to the default scheduler for
+non-batch workloads.
+
+If an `applicationId` is present, then we determine if there is a pending pod allocation (meaning the
+YuniKorn core has already decided to allocate the pod). If so, we return `Success`, otherwise `Unschedulable`.
+Additionally, if an in-progress allocation is detected (indicating that we have previously attempted to
+schedule this pod), we trigger a `RejectTask` event for the YuniKorn core so that the pod will be sent back
+for scheduling later.
+
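+As a rough, hedged sketch of that decision logic only (simplified inputs, not the actual plugin code and not the Kubernetes scheduling framework API):
+
+```go
+// Illustrative sketch only: the PreFilter decision described above, with
+// simplified inputs instead of the real scheduling framework plugin interface.
+package main
+
+import "fmt"
+
+type decision string
+
+const (
+	success       decision = "Success"
+	unschedulable decision = "Unschedulable"
+)
+
+// preFilter mirrors the described flow: pods without an applicationId are
+// delegated to the default scheduler, pods with a pending YuniKorn allocation
+// are allowed to continue, everything else waits.
+func preFilter(appID string, pendingAllocations map[string]string, podName string) decision {
+	if appID == "" {
+		return success // non-batch workload, default scheduling applies
+	}
+	if _, ok := pendingAllocations[podName]; ok {
+		return success // the YuniKorn core already allocated this pod
+	}
+	return unschedulable
+}
+
+func main() {
+	pending := map[string]string{"batch-pod": "node-1"}
+	fmt.Println(preFilter("", pending, "web-pod"))        // Success (delegated)
+	fmt.Println(preFilter("app-1", pending, "batch-pod")) // Success
+	fmt.Println(preFilter("app-1", pending, "other-pod")) // Unschedulable
+}
+```
+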
+### Filter
+
+Filter plugins are used to filter out nodes that cannot run a Pod. Only Pods which pass the PreFilter stage
+are evaluated. 
+
+For the YuniKorn plugin, we follow similar logic to PreFilter, except that we also validate that the pending
+pod allocation matches the node YuniKorn chose for the pod. If the node matches, we transition the pending
+allocation to an in-progress allocation. This helps ensure that we stay in sync with the default scheduler,
+as it is possible that we allow an allocation to proceed but the bind fails for some reason.
+
+### PostBind
+
+The PostBind extension point is used informationally to notify the plugin that a pod was successfully bound.
+
+The YuniKorn implementation uses this to clean up the outstanding in-progress pod allocations.
diff --git a/versioned_docs/version-1.1.0/design/simple_preemptor.md b/versioned_docs/version-1.1.0/design/simple_preemptor.md
new file mode 100644
index 000000000..7dbce187c
--- /dev/null
+++ b/versioned_docs/version-1.1.0/design/simple_preemptor.md
@@ -0,0 +1,114 @@
+---
+id: simple_preemptor
+title: DaemonSet Scheduling using Simple Preemptor
+---
+
+<!--
+ * Licensed to the Apache Software Foundation (ASF) under one
+ * or more contributor license agreements. See the NOTICE file
+ * distributed with this work for additional information
+ * regarding copyright ownership. The ASF licenses this file
+ * to you under the Apache License, Version 2.0 (the
+ * "License"); you may not use this file except in compliance
+ * with the License. You may obtain a copy of the License at
+ *
+ * http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing, software
+ * distributed under the License is distributed on an "AS IS" BASIS,
+ * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+ * See the License for the specific language governing permissions and
+ * limitations under the License.
+ -->
+# Design & Implementation of Preemption for DaemonSet Pods using Simple Preemptor
+
+This describes a simplistic approach to preempt or free up resources of running applications for DaemonSet pods. A good example of a daemonset pod is the fluentd logging pod, which is essential for logging for any application pod running on the node.
+
+## When to start preemption?
+[YUNIKORN-1184](https://issues.apache.org/jira/browse/YUNIKORN-1184) ensures daemon set pods have been allocated properly if resources are available on the required node, otherwise, reserve the same required node so that it can be picked up to make reservation as reserved allocation (AllocatedReserved) in the next scheduling cycle. However, the whole process of modifying the reservation to reserved allocation depends on how much resources are freed up in the meantime. Duration for freein [...]
+
+By any chance, before the next run of the regular scheduling cycle (context#schedule() ), resources become available and particularly on that specific required node, then nothing needs to be done. It just moves ahead with the next steps. In case of resource constraints, unlike the regular pod reservation, other nodes cannot be tried by calling application#tryNodesNoReserve() as this demonset pod needs to run only on the specific required node. Instead, we can fork a new go routine (trigg [...]
+
+
+## How to do preemption?
+K8s does preemption based on the pod priority: pods with lower priority would be chosen first, and so on. The proposal is not to depend on k8s for preemption. Instead, the core should take the responsibility of finding the list of pods that need to be preempted, communicating it to the shim and finally expecting the preempted resources to be allocated to the corresponding daemonset pod automatically as part of the regular scheduling cycle.
+
+### Steps in trigger_preempt_workflow() go routine:
+
+##### Reservation age check (1)
+We can introduce a new reservation age `createtime` (which can be added to the reservation object) to check against the configured value of `preemption_start_delay`, a property defining the minimal waiting time before starting the preemption process. Once the reservation age exceeds this waiting time, the next step is carried out. Otherwise, the corresponding reservation has to wait and can be processed next time.
+
+##### Get allocations from specific required Node (2)
+Get all allocations from the required node of the daemonset pod and go through the pre-filter step below to filter out the pods not suited for preemption.
+
+##### Pre-filter pods to choose Victims/Candidates suited for Preemption (3)
+
+Core should filter the pods based on the following criteria:
+
+###### DaemonSet Pods
+
+All daemonset pods should be filtered out completely, irrespective of priority settings. Based on the `requiredNode` value of the pod spec, these pods can be filtered out and are not taken forward in the remaining process.
+
+![simple_preemptor](./../assets/simple_preemptor.png)
+
+##### Ordering Victim pods (4)
+
+###### Pods classification
+
+Once the pods have been filtered, we need to classify them based on their type:
+
+1. Regular/Normal Pods (RP)
+2. Driver/Owner Pods (DP)
+3. Preemption Opt out Pods (OP)
+
+This classification ensures different treatment for each type of pod so that victims can be chosen among these pods in that order. Please refer to the above diagram. It shows a two-dimensional array (NOTE: "array" is used only for documentation purposes, the appropriate data structure still needs to be finalized) with each sub array holding pods of the same type. The 1st sub array holds RPs, the 2nd sub array holds DPs, the 3rd sub array holds OPs, and so on.
+
+Regular/Normal Pods (RP)
+
+The regular/normal pods should be gathered and placed in the 1st sub array as these pods are given first preference when choosing the victims. In general, preempting these pods has very little impact when compared to other types/classes of pods. Hence, keeping these pods in the first sub array is the right choice.
+
+Application Owner (DP)
+
+A pod acting as owner/master for other pods in the same application should be placed in the 2nd sub array because preempting those kinds of pods has a major impact when compared to regular pods. We can select these pods by checking whether any owner reference exists between this pod and other pods. This helps prevent scenarios such as a driver pod being evicted at a very early stage when other alternatives are available for choosing the victim.
+
+Preemption Opt out (OP)
+
+Pods can be allowed to run with the preemption opt out option. Pods marked with opt out should be placed in the 3rd sub array and are only used to choose victims as a last option. For now, we can use a label such as `yunikorn.apache.org/allow-preemption: false` for detecting those pods.
+
+
+As and when we want to introduce a new class/type of pods, a new sub array would be created for it and placed in the main array based on its significance.
+
+###### Sorting Pods
+
+Each sub array should be sorted based on multiple criteria:
+
+1. Priority
+2. Age
+3. Resource
+
+Each sub array is sorted priority wise, age wise and finally resource wise. In the example, the 1st sub array carrying regular pods has 4 pods of priority 1 and 2 pods of priority 2. Among the 4 pods of the same priority, 3 pods are of the same age as well. Hence, sorting resource wise adds value and sorts them in the order shown above. Please refer to the "zone" in the diagram.
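+
+A minimal Go sketch of such a per-sub-array ordering is shown below; the field names, the single-number resource and the sort directions (lower priority, younger and smaller pods preempted first) are assumptions for illustration, not the finalized design.
+
+```go
+// Illustrative sketch only: sort victim candidates by priority, then age,
+// then resource, as described for each sub array.
+package main
+
+import (
+	"fmt"
+	"sort"
+)
+
+type victim struct {
+	Name     string
+	Priority int   // assumed: lower priority is preempted first
+	AgeSec   int64 // assumed: younger pods are preempted first
+	Resource int   // assumed: smaller pods first (simplified to one number)
+}
+
+func sortVictims(v []victim) {
+	sort.Slice(v, func(i, j int) bool {
+		if v[i].Priority != v[j].Priority {
+			return v[i].Priority < v[j].Priority
+		}
+		if v[i].AgeSec != v[j].AgeSec {
+			return v[i].AgeSec < v[j].AgeSec
+		}
+		return v[i].Resource < v[j].Resource
+	})
+}
+
+func main() {
+	pods := []victim{
+		{"rp-1", 2, 600, 2},
+		{"rp-2", 1, 600, 4},
+		{"rp-3", 1, 60, 1},
+	}
+	sortVictims(pods)
+	fmt.Println(pods) // rp-3 (young, low priority) first, then rp-2, then rp-1
+}
+```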
+
+#### Victim pods selection strategy (5)
+
+Introduce a new configuration, `preemption_victim_pods_selection_strategy`, with different options (single, multiple etc.), where later options act as a fallback to earlier ones. Defining an order for these options should be possible and it is up to the administrator to place the options in the order he/she desires. Depending on the values, the whole selection strategy mechanism can decide whether a "fallback" approach among these options should be followed or not. Depending on the value, the selection strat [...]
+
+##### 1. Single Victim Pod
+
+Single Victim Pod, but resource deviation between victim pod and daemonset pod is not beyond configurable percentage. Configuring deviation with lower percentage (for example, 5% or 10%) helps prevent evicting victim pods already running with higher resource requirements. If there are many single victims found within the defined deviation %, then selection starts based on deviation % ascending order as intent is to choose the victim as close as possible to the daemonset pod resource requ [...]
+
+##### 2. Multiple Victim Pods
+
+Multiple victim pods, but the number of victim pods is not more than the configured value. This selection strategy helps to choose more than one victim; it starts with the largest victim (resource wise, in descending order) and goes up to the stage where the total resources of the victims meet the daemonset pod's resource requirements, while ensuring that the total count of victim pods does not exceed the configured value.
+
+New config: `preemption_victim_pods_selection_strategy`
+Possible values are `single,multiple` (default), `multiple,single`, `single` or `multiple`
+
+In case of more than one value (for example `single,multiple`), the fallback would be followed as described above.
+
+#### Communicate the Pod Preemption to Shim (6)
+
+Once the list of pods has been finalized for preemption, the core can make a call to the shim for termination using `notifyRMAllocationReleased` (with the type set to `TerminationType_PREEMPTED_BY_SCHEDULER`). The shim can process the request as usual by making a call to K8s to delete the pod and subsequently call `failTaskPodWithReasonAndMsg` to notify the pod with the reasons.
+
+### What happens after Preemption?
+
+The shim makes a call to K8s to delete the pod. Once K8s deletes the pod, the shim gets a notification from K8s and passes the information to the core. This flow happens for any pod deletion and exists even today. So, even for preempted resources, we can leave it up to the regular scheduling cycle and core-shim communication to allocate these freed up resources to the daemonset pod, as the node was already reserved much earlier, before the whole preemption workflow described above began.
\ No newline at end of file
diff --git a/versioned_docs/version-1.1.0/design/state_aware_scheduling.md b/versioned_docs/version-1.1.0/design/state_aware_scheduling.md
new file mode 100644
index 000000000..f92f93cff
--- /dev/null
+++ b/versioned_docs/version-1.1.0/design/state_aware_scheduling.md
@@ -0,0 +1,112 @@
+---
+id: state_aware_scheduling
+title: Batch Workloads Ordering with StateAware Policy
+---
+
+<!--
+ * Licensed to the Apache Software Foundation (ASF) under one
+ * or more contributor license agreements.  See the NOTICE file
+ * distributed with this work for additional information
+ * regarding copyright ownership.  The ASF licenses this file
+ * to you under the Apache License, Version 2.0 (the
+ * "License"); you may not use this file except in compliance
+ * with the License.  You may obtain a copy of the License at
+ *
+ *     http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing, software
+ * distributed under the License is distributed on an "AS IS" BASIS,
+ * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+ * See the License for the specific language governing permissions and
+ * limitations under the License.
+ -->
+
+## The problem
+A common pattern while processing data is that the application can be divided into multiple stages.
+Another way to look at this is the fact that processing needs to be kicked off and that the first step is to start a driver or manager for the application.
+Later stages might depend on the previous stages.
+When running applications in a size limited environment this could lead to a resource exhaustion when submitting multiple applications at the same time.
+These first stages might consume all available resources leaving no room for the next stage(s) to start.
+Often this issue is caused by having a high number of applications start simultaneously and trying to get resources in parallel.
+### Example issue
+When submitting numerous Spark applications in a short amount of time the drivers will all be started shortly after each other.
+The drivers consume all available resources in a queue or in the whole cluster.
+After starting the drivers they will request resources for the executors. 
+Since the queue or cluster has no resources left the executors will not be started.
+The driver cannot progress. 
+The only way that progress would be made is if and when one of the drivers finishes or fails and frees up resources for executors to be started.
+
+## Design
+### Design goals
+1. Prevent resource exhaustion by first stage allocations
+1. Improve chance for jobs to get minimal required resources over others
+
+### Non-goals
+1. This is NOT an implementation of Gang scheduling.
+1. No change to the currently supported FAIR or FIFO scheduling algorithms
+1. Fix resource quota usage outside of the core scheduler for submitted but waiting applications
+
+### Possible solutions
+Other batch schedulers like the YARN schedulers use a limit on the number of simultaneous running applications.
+They use either resource constraints on the driver or management stage or set a hard limit of the number of applications that can run in a queue.
+The drawback of that solution is that it does not work well in a cluster that can scale up or down automatically in a cloud environment.
+To work around that, percentage based limits could be set on the consumed resources for the driver or management stage.
+This does not alleviate the fact that driver or management stages can be of any size, large and/or small, which complicates the percentage scenario further as it does not give predictable behaviour.
+
+A different solution would be to assume a specific behaviour of the applications.
+Using that assumption a limit on the applications could be set based on the state it is in.
+The spark driver and executor behaviour is the most usual use case.
+This would provide a way to limit scheduling to existing applications and only drip feed new applications into the list of applications to schedule when there are resources available.
+
+### Algorithm
+The algorithm described here is based on the drip feed of new applications into the applications to schedule based on the states of all applications.
+Scheduling is based on the applications in a queue.
+The algorithm will be applied at a queue level.
+This is not a cluster wide setup.
+
+What we want to achieve is the following behaviour: only schedule one (1) application that is in its early stage(s) (called a starting state) at the same time.
+Only consider another new application if and when the previous application has transitioned out of the starting state.
+Applications will always be allocated resources on a first in first out basis based on submission time.
+That means that an application that is newly added and in its starting phase will only get resources if applications in the later stages do not need any resources.
+
+This algorithm will be implemented as an application sorting policy on a queue.
+This allows specific queues to limit parallel application startup while other queues with different work loads can schedule without or with different limitations.
+
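+As an illustration of the drip feed behaviour described above, the sketch below selects which applications a single scheduling pass would consider; the states and fields are simplified placeholders, not the actual sorting policy implementation.
+
+```go
+// Illustrative sketch only: simplified states and a FIFO order by submission
+// time; the real policy is implemented as a queue application sorting policy.
+package main
+
+import (
+	"fmt"
+	"sort"
+)
+
+type State int
+
+const (
+	Accepted State = iota
+	Starting
+	Running
+)
+
+type App struct {
+	Name      string
+	Submitted int64 // submission time, lower means submitted earlier
+	State     State
+}
+
+// schedulable returns the applications a single pass would consider, oldest
+// first: all Starting/Running applications, plus at most one Accepted
+// application and only when no application is currently Starting.
+func schedulable(apps []App) []App {
+	sort.Slice(apps, func(i, j int) bool { return apps[i].Submitted < apps[j].Submitted })
+	starting := false
+	for _, a := range apps {
+		if a.State == Starting {
+			starting = true
+		}
+	}
+	var result []App
+	acceptedAdded := false
+	for _, a := range apps {
+		switch a.State {
+		case Starting, Running:
+			result = append(result, a)
+		case Accepted:
+			if !starting && !acceptedAdded {
+				result = append(result, a)
+				acceptedAdded = true
+			}
+		}
+	}
+	return result
+}
+
+func main() {
+	apps := []App{
+		{Name: "App2", Submitted: 2, State: Accepted},
+		{Name: "App1", Submitted: 1, State: Starting},
+	}
+	for _, a := range schedulable(apps) {
+		fmt.Println(a.Name) // only App1: App2 waits until App1 leaves Starting
+	}
+}
+```
+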
+### Fallback mechanism
+A fallback mechanism has to be built into the algorithm.
+Not all applications will request more than one allocation.
+The other case that has to be accounted for could be a misbehaving or a slow application.
+Having an application stuck in the starting state could cause a scheduler livelock and starvation of other applications.
+
+The fall back mechanism proposed is as simple as a time limit on the starting state.
+This means that any application auto progresses out of the starting state.
+The time limit will be set to five (5) minutes hard coded as a starting point and will not be made configurable.
+
+The other fallback mechanism considered was making the number of allocations for the starting state configurable.
+This option provided a number of issues which made it difficult to implement.
+One of the main stumbling blocks is the fact that it requires the application submitter to specify the value.
+It also does not guarantee that the application will leave the starting state either and does not fix the livelock issue.
+If an application was submitted with five required allocations but due to issues during the run never asked for more than four, the livelock would still occur.
+
+Setting a default of zero (0) would also not work as it would bypass the starting state.
+It would make the sorting policy an opt-in instead of an opt-out.
+Setting a default of one (1) does not give us much enhancement to what we currently propose.
+It makes the sorting policy an opt-out but does not give the cluster administrator any control over the scheduling behaviour.
+Weighing those against each other the proposal is to not make this configurable.
+
+### Example run
+Using Spark applications as an example: a new application can only be scheduled if the previous application has at least one (1) executor allocated.
+
+![images](./../assets/fifo-state-example.png)
+
+Assume we have the following Spark apps: App1 & App2 as in the diagram above. The applications were submitted in that order: App1 first, then App2. They were both submitted to the same queue.
+
+1. Both applications are in the queue waiting for the first allocation: accepted by the scheduler. App1 has requested driver D1 and App2 has requested driver D2.
+1. The scheduler sorts the applications and allows 1 accepted application to be scheduled (no starting applications yet): App1, as the oldest application with an outstanding request, is scheduled.  
+App1 is allocated its driver (D1) and progresses to starting.  
+App2 request for a driver is ignored as the scheduler is starting App1 (only 1 application in starting or accepted state is scheduled).
+1. App1 requests executors E11 and E12. The scheduler assigns E11 and E12. At this point the application state changes to running when it has at least 1 executor allocated.
+1. App2 has been waiting to get the driver allocated. Since there are no applications in a starting state the scheduler looks at App2 which is in an accepted state. App2 moves from the accepted state to starting when the driver is allocated.
+1. App2 requests its executor E21. The application state changes to running when E21 is allocated.
+
+This process would repeat itself for any new application submitted.
\ No newline at end of file
diff --git a/versioned_docs/version-1.1.0/developer_guide/build.md b/versioned_docs/version-1.1.0/developer_guide/build.md
new file mode 100644
index 000000000..586a630ea
--- /dev/null
+++ b/versioned_docs/version-1.1.0/developer_guide/build.md
@@ -0,0 +1,190 @@
+---
+id: build
+title: Build and Run
+---
+
+<!--
+Licensed to the Apache Software Foundation (ASF) under one
+or more contributor license agreements.  See the NOTICE file
+distributed with this work for additional information
+regarding copyright ownership.  The ASF licenses this file
+to you under the Apache License, Version 2.0 (the
+"License"); you may not use this file except in compliance
+with the License.  You may obtain a copy of the License at
+
+  http://www.apache.org/licenses/LICENSE-2.0
+
+Unless required by applicable law or agreed to in writing,
+software distributed under the License is distributed on an
+"AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY
+KIND, either express or implied.  See the License for the
+specific language governing permissions and limitations
+under the License.
+-->
+
+YuniKorn always works with a container orchestrator system. Currently, a Kubernetes shim [yunikorn-k8shim](https://github.com/apache/yunikorn-k8shim)
+is provided in our repositories, you can leverage it to develop YuniKorn scheduling features and integrate with Kubernetes.
+This document describes how to set up a development environment and how to do the development.
+
+## Development Environment setup
+
+Read the [environment setup guide](developer_guide/env_setup.md) first to setup Docker and Kubernetes development environment.
+
+## Build YuniKorn for Kubernetes
+
+Prerequisite:
+- Go 1.16+
+
+You can build the scheduler for Kubernetes from [yunikorn-k8shim](https://github.com/apache/yunikorn-k8shim) project.
+The build procedure will build all components into a single executable that can be deployed and running on Kubernetes.
+
+Start the integrated build process by pulling the `yunikorn-k8shim` repository:
+```bash
+mkdir $HOME/yunikorn/
+cd $HOME/yunikorn/
+git clone https://github.com/apache/yunikorn-k8shim.git
+```
+At this point you have an environment that will allow you to build an integrated image for the YuniKorn scheduler.
+
+### A note on Go modules and git version
+Go uses git to fetch module information.
+Certain modules cannot be retrieved if the git version installed on the build machine is old.
+A message similar to the one below will be logged when trying to build for the first time.
+```text
+go: finding modernc.org/mathutil@v1.0.0
+go: modernc.org/golex@v1.0.0: git fetch -f origin refs/heads/*:refs/heads/* refs/tags/*:refs/tags/* in <location>: exit status 128:
+	error: RPC failed; result=22, HTTP code = 404
+	fatal: The remote end hung up unexpectedly
+```
+Update git to a recent version to fix this issue.
+Git releases later than 1.22 are known to work.
+
+### Build Docker image
+
+Building a docker image can be triggered by the following command.
+
+```
+make image
+```
+
+The image with the built-in configuration can be deployed directly on kubernetes.
+Some sample deployments that can be used are found under [deployments](https://github.com/apache/yunikorn-k8shim/tree/master/deployments/scheduler) directory.
+For the deployment that uses a config map you need to set up the ConfigMap in kubernetes.
+How to deploy the scheduler with a ConfigMap is explained in the [scheduler configuration deployment](developer_guide/deployment.md) document.
+
+The image build command will first build the integrated executable and then create the docker image.
+Currently, there are some published docker images under [this docker hub repo](https://hub.docker.com/r/apache/yunikorn), you are free to fetch and use.
+The default image tags are not suitable for deployments to an accessible repository as they use a hardcoded user and would push to Docker Hub given the proper credentials.
+You *must* update the `TAG` variable in the `Makefile` to push to an accessible repository.
+When you update the image tag be aware that the deployment examples given will also need to be updated to reflect the same change.
+
+### Inspect the docker image
+
+The docker image built in the previous step has some important build info embedded in the image's metadata. You can retrieve
+this info with the docker `inspect` command.
+
+```
+docker inspect apache/yunikorn:scheduler-latest
+```
+
+This info includes git revisions (last commit SHA) for each component, to help you understand which version of the source code
+was shipped by this image. They are listed as docker image `labels`, such as
+
+```
+"Labels": {
+    "BuildTimeStamp": "2019-07-16T23:08:06+0800",
+    "Version": "0.1",
+    "yunikorn-core-revision": "dca66c7e5a9e",
+    "yunikorn-k8shim-revision": "bed60f720b28",
+    "yunikorn-scheduler-interface-revision": "3df392eded1f"
+}
+```
+
+### Dependencies
+
+The dependencies in the projects are managed using [go modules](https://blog.golang.org/using-go-modules).
+Go Modules require at least Go version 1.11 to be installed on the development system.
+
+If you want to modify one of the projects locally and build with your local dependencies you will need to change the module file. 
+Changing dependencies uses go mod `replace` directives as explained in [Updating dependencies](#updating-dependencies).
+
+The YuniKorn project has four repositories; three of those repositories have a dependency at the go level.
+These dependencies are part of the go modules and point to the github repositories.
+During the development cycle it can be required to break the dependency on the committed version from github.
+This requires making changes in the module file to allow loading a local copy or a forked copy from a different repository.  
+
+#### Affected repositories
+The following dependencies exist between the repositories:
+
+| repository| depends on |
+| --- | --- |
+| yunikorn-core | yunikorn-scheduler-interface | 
+| yunikorn-k8shim | yunikorn-scheduler-interface, yunikorn-core |
+| yunikorn-scheduler-interface | none |
+| yunikorn-web | yunikorn-core |
+
+The `yunikorn-web` repository has no direct go dependency on the other repositories. However any change to the `yunikorn-core` webservices can affect the web interface. 
+
+#### Making local changes
+
+To make sure that the local changes will not break other parts of the build you should run:
+- A full build `make` (build target depends on the repository)
+- A full unit test run `make test`
+
+Any test failures should be fixed before proceeding.
+
+#### Updating dependencies
+
+The simplest way is to use the `replace` directive in the module file. The `replace` directive allows you to override the import path with a new (local) path.
+There is no need to change any of the imports in the source code. The change must be made in the `go.mod` file of the repository that has the dependency. 
+
+Using `replace` to point to a forked dependency, such as:
+```
+replace github.com/apache/yunikorn-core => example.com/some/forked-yunikorn
+```
+
+There is no requirement to fork and create a new repository. If you do not have a repository you can use a local checked out copy too. 
+Using `replace` to use a local directory as a dependency:
+```
+replace github.com/apache/yunikorn-core => /User/example/local/checked-out-yunikorn
+```
+and for the same dependency using a relative path:
+```
+replace github.com/apache/yunikorn-core => ../checked-out-yunikorn
+```
+Note: if the `replace` directive is using a local filesystem path, then the target must have the `go.mod` file at that location.
+
+Further details on the modules' wiki: [When should I use the 'replace' directive?](https://github.com/golang/go/wiki/Modules#when-should-i-use-the-replace-directive).
+
+## Build the web UI
+
+Example deployments reference the [YuniKorn web UI](https://github.com/apache/yunikorn-web). 
+The YuniKorn web UI has its own specific requirements for the build. Follow the steps in the project's README to prepare a development environment and learn how to build the project.
+The scheduler is fully functional without the web UI. 
+
+## Locally run the integrated scheduler
+
+When you have a local development environment setup you can run the scheduler in your local kubernetes environment.
+This has been tested with the Kubernetes embedded in Docker Desktop ('Docker for desktop') and with Minikube. See the [environment setup guide](developer_guide/env_setup.md) for further details.
+
+```
+make run
+```
+It will connect to the kubernetes cluster using the user's configuration located in `$HOME/.kube/config`.
+
+To run YuniKorn in Kubernetes scheduler plugin mode instead, execute:
+
+```
+make run_plugin
+```
+
+You can also use the same approach to run the scheduler locally but connecting to a remote kubernetes cluster,
+as long as the `$HOME/.kube/config` file is pointing to that remote cluster.
+
+
+## Verify external interface changes with e2e tests
+
+YuniKorn has an external REST interface which is validated by end-to-end tests. However, the tests live in the k8shim repository.
+Whenever a change is made to the external interface, make sure that it is validated by running e2e tests or adjust the test cases accordingly.
+
+How to run the tests locally is described [here](https://github.com/apache/yunikorn-k8shim/blob/master/test/e2e/README.md).
diff --git a/versioned_docs/version-1.1.0/developer_guide/dependencies.md b/versioned_docs/version-1.1.0/developer_guide/dependencies.md
new file mode 100644
index 000000000..d458798df
--- /dev/null
+++ b/versioned_docs/version-1.1.0/developer_guide/dependencies.md
@@ -0,0 +1,124 @@
+---
+id: dependencies
+title: Go module updates
+---
+
+<!--
+Licensed to the Apache Software Foundation (ASF) under one
+or more contributor license agreements.  See the NOTICE file
+distributed with this work for additional information
+regarding copyright ownership.  The ASF licenses this file
+to you under the Apache License, Version 2.0 (the
+"License"); you may not use this file except in compliance
+with the License.  You may obtain a copy of the License at
+
+  http://www.apache.org/licenses/LICENSE-2.0
+
+Unless required by applicable law or agreed to in writing,
+software distributed under the License is distributed on an
+"AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY
+KIND, either express or implied.  See the License for the
+specific language governing permissions and limitations
+under the License.
+-->
+
+## When to update
+The references in the `master` branches must be updated if a change is made in the scheduler interface.
+Updating the dependency of a shim in reference to the core might be needed even if the scheduler interface does not change.
+New functionality could be added that relies on changed field content of the messages, not the field layout of the message.
+In that case, only the shim dependency needs to be updated.
+
+## Why a pseudo version
+In the `master` branch we **must** use a pseudo version for all the YuniKorn repository references we use.
+As the branch is in active development and not released, we do not have a real version tag to reference.
+However, we still need to be able to point to the right commit for the dependencies.
+
+Go allows using [pseudo versions](https://go.dev/ref/mod#pseudo-versions) for these specific cases.
+An example of the pseudo versions we use in the Kubernetes shim:
+```
+module github.com/apache/yunikorn-k8shim
+
+go 1.16
+
+require (
+	github.com/apache/yunikorn-core v0.0.0-20220325135453-73d55282f052
+	github.com/apache/yunikorn-scheduler-interface v0.0.0-20220325134135-4a644b388bc4
+	...
+)
+```
+Release branches **must** not use pseudo versions.
+During the creation of a release, [tags](/community/release_procedure#tag-and-update-release-for-version) will be created.
+These tags will be used as the reference in the go.mod files for the release.    
+
+## Enforcement of pseudo version
+The pull request checks for the `yunikorn-core` and `yunikorn-k8shim` repositories enforce the format of the versions.
+A build failure will be triggered if the version reference for the `yunikorn-core` or `yunikorn-scheduler-interface`
+repositories in the `master` branch is not a pseudo version.
+
+The check enforces that the version reference starts with `v0.0.0-`.
+
+Pseudo versions are not enforced in the release branches, as per the [why a pseudo version](#why-a-pseudo-version) explanation above. 
+
+## Updating the core dependency
+Before updating the core dependency, you must make sure that the scheduler interface changes are finalised.
+
+1. Make the changes in the scheduler interface.
+2. Commit the changes into the master branch on GitHub and pull the latest master branch commit.
+3. [Generate a new pseudo version](#generating-a-pseudo-version) for the scheduler-interface.
+
+Updating the core dependency
+
+4. Update the go.mod file for the dependent repository: core repository
+    * Open the go.mod file
+    * Copy the generated pseudo version reference
+    * Replace the scheduler-interface version reference with the one generated in step 3.
+    * Save the go.mod file
+5. Run a `make test` to be sure that the change works. The build will pull down the new dependency and the change in the scheduler interface will be used.
+6. Commit the changes into the master branch on GitHub and pull the latest master branch commit
+
+## Updating a shim dependency
+Before updating a shim dependency you must make sure that the core dependency has been updated and committed.
+There are cases where the reference for the scheduler-interface has not changed.
+This is not an issue; either skip the update steps or execute them as normal, resulting in no changes as part of the commit.
+
+7. [Generate a new pseudo version](#generating-a-pseudo-version) for the core
+8. Update the go.mod file for the dependent repository: k8shim repository
+    * Open the go.mod file
+    * Copy the generated pseudo version reference of the scheduler interface
+    * Replace the scheduler-interface version reference with the one generated in step 3.
+    * Copy the generated pseudo version reference of the core
+    * Replace the core version reference with the one generated in step 7.
+    * Save the go.mod file
+9. Run a `make test` to be sure that the change works. The build will pull down the new dependency and the changes in the core and scheduler interface will be used.
+10. Commit the changes into the master branch on GitHub
+
+:::note
+If multiple PRs are being worked on in the scheduler interface and/or core at the same time, a different PR might have already applied the update.
+This all depends on the commit order.
+That is why steps 5 and 9 are performed: to make sure there is no regression.
+:::
+## Generating a pseudo version
+
+A pseudo version reference for use in a go.mod file is based on the commit hash and timestamp.
+It is simple to generate one using the following steps: 
+
+1. Change to the repository for which the new pseudo version needs to be generated.
+2. Update the local checked out code for the master branch to get the latest commits
+```
+git pull; git status
+```
+The status should show up to date with the `origin` from where it was cloned.
+3. Run the following command to get the pseudo version:
+```
+TZ=UTC git --no-pager show --quiet --abbrev=12 --date='format-local:%Y%m%d%H%M%S' --format='v0.0.0-%cd-%h'
+```
+4. This command will print a line like this:
+```
+v0.0.0-20220318052402-b3dfd0d2adaa
+```
+That is the pseudo version that can be used in the go.mod files.
+
+:::note
+The pseudo version must be based on a commit that is in the version control system, i.e. on GitHub.
+Local commits or commits that are not yet merged in a PR cannot be used.
+:::
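+
+As an alternative sketch (assuming Go 1.16 or later and that the commits are already pushed to GitHub), `go get` can resolve a branch head or commit hash to the same pseudo version format and update the go.mod file in one step:
+```
+# run inside the repository that consumes the dependency, e.g. yunikorn-k8shim
+go get github.com/apache/yunikorn-core@master
+go get github.com/apache/yunikorn-scheduler-interface@master
+go mod tidy
+```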
diff --git a/versioned_docs/version-1.1.0/developer_guide/deployment.md b/versioned_docs/version-1.1.0/developer_guide/deployment.md
new file mode 100644
index 000000000..7e5aa5bfa
--- /dev/null
+++ b/versioned_docs/version-1.1.0/developer_guide/deployment.md
@@ -0,0 +1,164 @@
+---
+id: deployment
+title: Deploy to Kubernetes
+---
+
+<!--
+Licensed to the Apache Software Foundation (ASF) under one
+or more contributor license agreements.  See the NOTICE file
+distributed with this work for additional information
+regarding copyright ownership.  The ASF licenses this file
+to you under the Apache License, Version 2.0 (the
+"License"); you may not use this file except in compliance
+with the License.  You may obtain a copy of the License at
+
+  http://www.apache.org/licenses/LICENSE-2.0
+
+Unless required by applicable law or agreed to in writing,
+software distributed under the License is distributed on an
+"AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY
+KIND, either express or implied.  See the License for the
+specific language governing permissions and limitations
+under the License.
+-->
+
+The easiest way to deploy YuniKorn is to leverage our [helm charts](https://hub.helm.sh/charts/yunikorn/yunikorn);
+you can find the guide [here](get_started/get_started.md). This document describes the manual process to deploy the YuniKorn
+scheduler and admission controller. It is primarily intended for developers.
+
+## Build docker image
+
+From the project root of `yunikorn-k8shim`, run the following command to build an image that uses the ConfigMap based configuration:
+```
+make image
+```
+
+This command builds an image tagged with a default version and image tag.
+
+**Note:** the default build uses a hardcoded user and tag. You *must* update the `IMAGE_TAG` variable in the `Makefile` to push to an appropriate repository. 
+
+**Note:** the latest yunikorn images on Docker Hub are no longer updated due to ASF policy. Hence, you should build both the scheduler image and the web image locally before deploying them.
+
+## Setup RBAC for Scheduler
+
+The first step is to create the RBAC role for the scheduler, see [yunikorn-rbac.yaml](https://github.com/apache/yunikorn-k8shim/blob/master/deployments/scheduler/yunikorn-rbac.yaml)
+```
+kubectl create -f scheduler/yunikorn-rbac.yaml
+```
+The role is a requirement on the current versions of kubernetes.
+
+## Create the ConfigMap
+
+This must be done before deploying the scheduler. It requires a correctly set up kubernetes environment.
+This kubernetes environment can be either local or remote. 
+
+- download configuration file if not available on the node to add to kubernetes:
+```
+curl -o queues.yaml https://raw.githubusercontent.com/apache/yunikorn-k8shim/master/conf/queues.yaml
+```
+- create ConfigMap in kubernetes:
+```
+kubectl create configmap yunikorn-configs --from-file=queues.yaml
+```
+- check if the ConfigMap was created correctly:
+```
+kubectl describe configmaps yunikorn-configs
+```
+
+**Note:** if the name of the ConfigMap is changed, the volume in the scheduler yaml file must be updated to reference the new name, otherwise the changes to the configuration will not be picked up. 
+
+## Attach ConfigMap to the Scheduler Pod
+
+The ConfigMap is attached to the scheduler as a special volume. The first step is to specify where to mount it in the pod:
+```yaml
+  volumeMounts:
+    - name: config-volume
+      mountPath: /etc/yunikorn/
+```
+The second step is to link the mount point back to the ConfigMap created in kubernetes:
+```yaml
+  volumes:
+    - name: config-volume
+      configMap:
+        name: yunikorn-configs
+``` 
+
+Both steps are part of the scheduler yaml file, an example can be seen at [scheduler.yaml](https://github.com/apache/yunikorn-k8shim/blob/master/deployments/scheduler/scheduler.yaml)
+for reference.
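+
+For context, a minimal sketch of how the two fragments above fit together in the scheduler pod spec (the container name is illustrative):
+```yaml
+spec:
+  containers:
+    - name: yunikorn-scheduler-core
+      volumeMounts:
+        - name: config-volume
+          mountPath: /etc/yunikorn/
+  volumes:
+    - name: config-volume
+      configMap:
+        name: yunikorn-configs
+```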
+
+## Deploy the Scheduler
+
+The scheduler can be deployed with the following command:
+```
+kubectl create -f deployments/scheduler/scheduler.yaml
+```
+
+The deployment will run 2 containers from your pre-built docker images in 1 pod:
+
+* yunikorn-scheduler-core (yunikorn scheduler core and shim for K8s)
+* yunikorn-scheduler-web (web UI)
+
+Alternatively, the scheduler can be deployed as a K8S scheduler plugin:
+```
+kubectl create -f deployments/scheduler/plugin.yaml
+```
+
+The pod is deployed as a customized scheduler; it will take responsibility for scheduling pods which explicitly specify `schedulerName: yunikorn` in the pod's spec. In addition to the `schedulerName`, you will also have to add a label `applicationId` to the pod.
+```yaml
+  metadata:
+    name: pod-example
+    labels:
+      applicationId: appID
+  spec:
+    schedulerName: yunikorn
+```
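+
+For reference, a minimal complete pod that would be picked up by YuniKorn could look like the following sketch (names and image are illustrative):
+```yaml
+apiVersion: v1
+kind: Pod
+metadata:
+  name: nginx-yunikorn-example
+  labels:
+    applicationId: "nginx-example-00001"
+spec:
+  schedulerName: yunikorn
+  containers:
+    - name: nginx
+      image: nginx:latest
+```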
+
+Note: the admission controller adds the `schedulerName` and `applicationId` on behalf of the user and hence routes all traffic to YuniKorn. If you use the helm chart to deploy, it installs the admission controller along with the scheduler. Otherwise, proceed to the steps
+below to manually deploy the admission controller if you run non-example workloads where `schedulerName` and `applicationId` are not present in the pod spec and metadata, respectively.
+
+## Setup RBAC for Admission Controller
+
+Before the admission controller is deployed, we must create its RBAC role, see [admission-controller-rbac.yaml](https://github.com/apache/yunikorn-k8shim/blob/master/deployments/scheduler/admission-controller-rbac.yaml).
+
+```
+kubectl create -f scheduler/admission-controller-rbac.yaml
+```
+
+## Create the Secret
+
+Since the admission controller intercepts calls to the API server to validate/mutate incoming requests, we must deploy an empty secret
+used by the webhook server to store TLS certificates and keys. See [admission-controller-secrets.yaml](https://github.com/apache/yunikorn-k8shim/blob/master/deployments/scheduler/admission-controller-secrets.yaml).
+
+```
+kubectl create -f scheduler/admission-controller-secrets.yaml
+```
+
+## Deploy the Admission Controller
+
+Now we can deploy the admission controller as a service. This will automatically validate or mutate incoming requests and objects in accordance with the [example in Deploy the Scheduler](#deploy-the-scheduler). See the contents of the admission controller deployment and service in [admission-controller.yaml](https://github.com/apache/yunikorn-k8shim/blob/master/deployments/scheduler/admission-controller.yaml).
+
+```
+kubectl create -f scheduler/admission-controller.yaml
+```
+
+## Access to the web UI
+
+When the scheduler is deployed, the web UI is also deployed in a container.
+Port forwarding for the web interface on the standard ports can be turned on via:
+
+```
+POD=`kubectl get pod -l app=yunikorn -o jsonpath="{.items[0].metadata.name}"` && \
+kubectl port-forward ${POD} 9889 9080
+```
+
+`9889` is the default port for the web UI, `9080` is the default port of the scheduler's RESTful service from which the web UI retrieves its info.
+Once this is done, the web UI will be available at: http://localhost:9889.
+
+## Configuration Hot Refresh
+
+YuniKorn supports loading configuration changes automatically from the attached ConfigMap. Simply update the content in the ConfigMap,
+either via the Kubernetes dashboard UI or the command line. _Note_: changes made to the ConfigMap might take some time
+to be picked up by the scheduler.
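+
+For example, one way to push an updated `queues.yaml` from the command line (a sketch, assuming the ConfigMap name used earlier in this document) is:
+```
+kubectl create configmap yunikorn-configs --from-file=queues.yaml \
+  --dry-run=client -o yaml | kubectl apply -f -
+```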
+
+
+
diff --git a/versioned_docs/version-1.1.0/developer_guide/env_setup.md b/versioned_docs/version-1.1.0/developer_guide/env_setup.md
new file mode 100644
index 000000000..c45d77e21
--- /dev/null
+++ b/versioned_docs/version-1.1.0/developer_guide/env_setup.md
@@ -0,0 +1,156 @@
+---
+id: env_setup
+title: Dev Environment Setup
+---
+
+<!--
+Licensed to the Apache Software Foundation (ASF) under one
+or more contributor license agreements.  See the NOTICE file
+distributed with this work for additional information
+regarding copyright ownership.  The ASF licenses this file
+to you under the Apache License, Version 2.0 (the
+"License"); you may not use this file except in compliance
+with the License.  You may obtain a copy of the License at
+
+  http://www.apache.org/licenses/LICENSE-2.0
+
+Unless required by applicable law or agreed to in writing,
+software distributed under the License is distributed on an
+"AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY
+KIND, either express or implied.  See the License for the
+specific language governing permissions and limitations
+under the License.
+-->
+
+There are several ways to set up a local development environment for Kubernetes. The two most common ones are `Minikube` ([docs](https://kubernetes.io/docs/setup/minikube/)) and `docker-desktop`.
+`Minikube` provisions a local Kubernetes cluster on several Virtual Machines (via VirtualBox or something similar). `docker-desktop`, on the other hand, sets up a Kubernetes cluster in docker containers.
+
+## Local Kubernetes cluster using Docker Desktop
+
+In this tutorial, we will base all the installs on Docker Desktop.
+Even in this case we can use a lightweight [minikube](#local-kubernetes-cluster-with-minikube) setup which gives the same functionality with less impact.
+
+### Installation
+
+Download and install [Docker-Desktop](https://www.docker.com/products/docker-desktop) on your laptop. The latest version has an embedded version of Kubernetes, so no additional install is needed.
+Simply follow the instructions [here](https://docs.docker.com/docker-for-mac/#kubernetes) to get Kubernetes up and running within docker-desktop.
+
+Once Kubernetes is started in docker desktop, you should see something similar to the following:
+
+![Kubernetes in Docker Desktop](./../assets/docker-desktop.png)
+
+This means that:
+1. Kubernetes is running.
+1. the command line tool `kubectl` is installed in the `/usr/local/bin` directory.
+1. the Kubernetes context is set to `docker-desktop`.
+
+### Deploy and access dashboard
+
+After setting up the local Kubernetes you need to deploy the dashboard using the following steps: 
+1. follow the instructions in [Kubernetes dashboard doc](https://github.com/kubernetes/dashboard) to deploy the dashboard.
+1. start the Kubernetes proxy in the background from a terminal to get access to the dashboard on the local host:   
+    ```shell script
+    kubectl proxy &
+    ```
+1. access the dashboard at the following URL: [clickable link](http://localhost:8001/api/v1/namespaces/kube-system/services/https:kubernetes-dashboard:/proxy/#!/login)
+
+### Access local Kubernetes cluster
+
+The dashboard as deployed in the previous step requires a token or config to sign in. Here we use the token to sign in. The token is generated automatically and can be retrieved from the system.
+
+1. retrieve the name of the dashboard token:
+    ```shell script
+    kubectl -n kube-system get secret | grep kubernetes-dashboard-token
+    ```
+2. retrieve the content of the token; note that the token name ends with a random 5 character code and needs to be replaced with the result of step 1 (see also the combined one-liner after this list). As an example:  
+    ```shell script
+    kubectl -n kube-system describe secret kubernetes-dashboard-token-tf6n8
+    ```
+3. copy the token value which is part of the `Data` section with the tag `token`.
+4. select the **Token** option in the dashboard web UI:<br/>
+    ![Token Access in dashboard](./../assets/dashboard_token_select.png)
+5. paste the token value into the input box and sign in:<br/>
+    ![Token Access in dashboard](./../assets/dashboard_secret.png)
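+
+As a small convenience, steps 1 and 2 can be combined into one command (a sketch, assuming a single matching secret exists):
+```shell script
+kubectl -n kube-system describe secret $(kubectl -n kube-system get secret | awk '/kubernetes-dashboard-token/ {print $1}')
+```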
+
+## Local Kubernetes cluster with Minikube
+Minikube can be added to an existing Docker Desktop install. Minikube can either use the pre-installed hypervisor or use a hypervisor of choice. These instructions use [HyperKit](https://github.com/moby/hyperkit) which is embedded in Docker Desktop.   
+
+If you want to use a different hypervisor than HyperKit make sure that you follow the generic minikube install instructions. Do not forget to install the correct driver for the chosen hypervisor if required.
+The basic instructions are provided in the [minikube install](https://kubernetes.io/docs/tasks/tools/install-minikube/) instructions.
+
+Check the hypervisor: Docker Desktop should have already installed HyperKit. In a terminal run `hyperkit` to confirm. Any response other than `hyperkit: command not found` confirms that HyperKit is installed and on the path. If it is not found you can choose a different hypervisor or fix the Docker Desktop install.
+
+### Installing Minikube
+1. install minikube, you can either use brew or directly via these steps: 
+    ```shell script
+    curl -Lo minikube https://storage.googleapis.com/minikube/releases/latest/minikube-darwin-amd64
+    chmod +x minikube
+    sudo mv minikube /usr/local/bin
+    ```
+1. install HyperKit driver (required), you can either use brew or directly via these steps:
+    ```shell script
+    curl -LO https://storage.googleapis.com/minikube/releases/latest/docker-machine-driver-hyperkit
+    sudo install -o root -g wheel -m 4755 docker-machine-driver-hyperkit /usr/local/bin/
+    ```
+1. update the minikube config to default to the HyperKit install `minikube config set vm-driver hyperkit`
+1. change docker desktop to use minikube for Kubernetes:<br/>
+    ![Kubernetes in Docker Desktop: minikube setting](./../assets/docker-dektop-minikube.png)
+
+### Deploy and access the cluster
+After the installation is done you can start a new cluster.
+1. start the minikube cluster: `minikube start --kubernetes-version v1.14.2`
+1. start the minikube dashboard: `minikube dashboard &`
+
+### Build impact
+When you create images make sure that the build is run after pointing it to the right environment. 
+Without setting the environment minikube might not find the docker images when deploying the scheduler.
+1. make sure minikube is started
+1. in the terminal where you will run the build execute: `eval $(minikube docker-env)`
+1. run the image build from the yunikorn-k8shim repository root: `make image`
+1. deploy the scheduler as per the normal instructions.
+
+## Debug code locally
+
+Note, this instruction requires you to have the GoLand IDE for development.
+
+In GoLand, go to yunikorn-k8shim project. Then click "Run" -> "Debug..." -> "Edit Configuration..." to get the pop-up configuration window.
+Note, you need to click "+" to create a new profile if the `Go Build` option is not available at the first time.
+
+![Debug Configuration](./../assets/goland_debug.jpg)
+
+The highlighted fields are the configurations you need to add. These include:
+
+- Run Kind: package
+- Package path: point to the path of `pkg/shim` package
+- Working directory: point to the path of the `conf` directory, this is where the program loads the configuration file from
+- Program arguments: specify the arguments to run the program, such as `-kubeConfig=/path/to/.kube/config -interval=1s -clusterId=mycluster -clusterVersion=0.1 -name=yunikorn -policyGroup=queues -logEncoding=console -logLevel=-1`.
+Note, you need to replace `/path/to/.kube/config` with the local path to the kubeconfig file. And if you want to change or add more options, you can run `_output/bin/k8s-yunikorn-scheduler -h` to find out.
+
+Once the changes are done, click "Apply", then "Debug". You will need to set proper breakpoints in order to debug the program.
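+
+The same arguments can also be used to run the compiled binary outside the IDE (a sketch, assuming the binary has been built under `_output/bin` as referenced above):
+```shell script
+_output/bin/k8s-yunikorn-scheduler \
+  -kubeConfig=$HOME/.kube/config -interval=1s \
+  -clusterId=mycluster -clusterVersion=0.1 \
+  -name=yunikorn -policyGroup=queues \
+  -logEncoding=console -logLevel=-1
+```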
+
+## Access remote Kubernetes cluster
+
+This setup assumes you have already installed a remote Kubernetes cluster. 
+For a generic view on how to access multiple clusters and integrate them, follow the [accessing multiple clusters](https://kubernetes.io/docs/tasks/access-application-cluster/configure-access-multiple-clusters/) documentation from Kubernetes.
+
+Or follow these simplified steps:
+1. get the Kubernetes `config` file from the remote cluster, copy it to the local machine and give it a unique name, e.g. `config-remote`
+1. save the `KUBECONFIG` environment variable (if set)
+    ```shell script
+    export KUBECONFIG_SAVED=$KUBECONFIG
+    ```
+1. add the new file to the environment variable
+    ```shell script
+    export KUBECONFIG=$KUBECONFIG:config-remote
+    ``` 
+1. run the command `kubectl config view` to check that both configs can be accessed
+1. switch context using `kubectl config use-context my-remote-cluster`
+1. confirm that the current context is now switched to the remote cluster config:
+    ```text
+    kubectl config get-contexts
+    CURRENT   NAME                   CLUSTER                      AUTHINFO             NAMESPACE
+              docker-for-desktop     docker-for-desktop-cluster   docker-for-desktop
+    *         my-remote-cluster      kubernetes                   kubernetes-admin
+    ```
+
+More docs can be found [here](https://kubernetes.io/docs/concepts/configuration/organize-cluster-access-kubeconfig/)  
diff --git a/versioned_docs/version-1.1.0/developer_guide/openshift_development.md b/versioned_docs/version-1.1.0/developer_guide/openshift_development.md
new file mode 100644
index 000000000..8d21171e2
--- /dev/null
+++ b/versioned_docs/version-1.1.0/developer_guide/openshift_development.md
@@ -0,0 +1,182 @@
+---
+id: openshift_development
+title: Development in CodeReady Containers
+---
+
+<!--
+Licensed to the Apache Software Foundation (ASF) under one
+or more contributor license agreements.  See the NOTICE file
+distributed with this work for additional information
+regarding copyright ownership.  The ASF licenses this file
+to you under the Apache License, Version 2.0 (the
+"License"); you may not use this file except in compliance
+with the License.  You may obtain a copy of the License at
+
+  http://www.apache.org/licenses/LICENSE-2.0
+
+Unless required by applicable law or agreed to in writing,
+software distributed under the License is distributed on an
+"AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY
+KIND, either express or implied.  See the License for the
+specific language governing permissions and limitations
+under the License.
+-->
+
+YuniKorn is tested against OpenShift and developers can set up their local environment to test patches against OpenShift.
+Our recommended local environment uses CodeReady Containers.
+
+## Set up a running CRC cluster
+
+1. Download CodeReady Container binaries
+
+   Select your OS from the dropdown list then click on "Download" (On a Mac, you'll download crc-macos-amd64.tar.xz; on Linux, crc-linux-amd64.tar.xz).
+   You'll be asked to connect using your Red Hat login. If you don't have one, just click on "Create one now". You do *not* need a Red Hat subscription for this.
+   
+   Once logged in, download CodeReady Containers binary and the pull secret.
+   
+1. Unzip the tar file.
+
+   ```bash
+   tar -xvzf crc-macos-amd64.tar.xz
+   ```
+   
+1. Move the crc binary to a location on your path, for example:
+
+   ```bash
+   sudo cp `pwd`/crc-macos-$CRCVERSION-amd64/crc /usr/local/bin
+   ```
+
+1. Configure CRC in accordance with your hardware capabilities.
+
+   ```bash
+   crc config set memory 16000
+   crc config set cpus 12
+   crc setup
+   ```
+1. Start the CRC and open the console.
+
+   ```bash
+   crc start --pull-secret-file pull-secret.txt
+   crc console
+   ```
+
+## Testing a patch
+
+The following steps assume you have a running CRC cluster on your laptop. Note that these steps are not tested against a remote CRC cluster. 
+
+1. Access your environment through the `oc` command.
+
+   Type in the `crc oc-env` command to a shell.
+   ```bash
+   $ crc oc-env
+   export PATH="/Users/<user>/.crc/bin/oc:$PATH"
+   # Run this command to configure your shell:
+   # eval $(crc oc-env)
+   ```
+   So you need to type in this to access the `oc` command:
+   ```
+   eval $(crc oc-env)
+   ```
+
+1. Log in to `oc`. After the CRC has started it will display a similar message:
+
+   ```
+   To access the cluster, first set up your environment by following 'crc oc-env' instructions.
+   Then you can access it by running 'oc login -u developer -p developer https://api.crc.testing:6443'.
+   To login as an admin, run 'oc login -u kubeadmin -p duduw-yPT9Z-hsUpq-f3pre https://api.crc.testing:6443'.
+   To access the cluster, first set up your environment by following 'crc oc-env' instructions.
+   ```
+
+   Use the `oc login -u kubeadmin ...` command. 
+
+1. Get the URL of the local OpenShift cluster's internal private Docker repository by typing the command below.
+
+   ```bash
+   $ oc get route default-route -n openshift-image-registry --template='{{ .spec.host }}'
+   default-route-openshift-image-registry.apps-crc.testing
+   ```
+
+   By default it should be `default-route-openshift-image-registry.apps-crc.testing`. Adjust the following steps if the displayed URL is different.
+
+1. Prepare the Docker images.
+
+   You can read more about this at the bottom, in the *Using custom images* section.
+
+1. Prepare the helm chart.
+
+   If you want to use custom Docker images, replace the images in the chart's `values.yaml` config file.
+
+   Note that if you manually pushed the Docker image to the `default-route-openshift-image-registry.apps-crc.testing` docker registry directly you need to have valid certs to access it. 
+   On OpenShift there's a service for this: `image-registry.openshift-image-registry.svc`, which is easier to use.
+
+   For example, if you want to override all three Docker images you should use the following configs:
+   ```yaml
+   image:
+     repository: image-registry.openshift-image-registry.svc:5000/yunikorn/yunikorn
+     tag: scheduler-latest
+     pullPolicy: Always
+   
+   admission_controller_image:
+     repository: image-registry.openshift-image-registry.svc:5000/yunikorn/yunikorn
+     tag: admission-latest
+     pullPolicy: Always
+   
+   web_image:
+     repository: image-registry.openshift-image-registry.svc:5000/yunikorn/yunikorn-web
+     tag: latest
+     pullPolicy: Always
+   ``` 
+
+   You can find the chart in the yunikorn-release repo's helm chart directory.
+
+1. Install the helm charts.
+
+   ```bash
+   helm install yunikorn . -n yunikorn
+   ```
+
+## Using custom images
+
+### Podman
+
+1. Log in to Podman using the following command.
+
+   ```bash
+   podman login --tls-verify=false -u kubeadmin -p $(oc whoami -t) default-route-openshift-image-registry.apps-crc.testing
+   ```
+
+1. Build the image in the repository, e.g. in the shim, using the generic `make image` command.
+
+1. Verify that the image is present in the repository.
+
+   ```bash
+   podman images
+   REPOSITORY                TAG              IMAGE ID     CREATED            SIZE
+   localhost/apache/yunikorn admission-latest 19eb41241d64 About a minute ago 53.5 MB
+   localhost/apache/yunikorn scheduler-latest e60e09b424d9 About a minute ago 543 MB
+   ```
+
+## Directly pushing to the OpenShift Image Registry
+
+1. Create the images that you wish to replace.
+
+   You can either build new images locally or use official ones (or mix both).
+      * For the -shim and -web images check out the repository (optionally make your changes) and type the following command:
+      ```bash
+      make clean image REGISTRY=default-route-openshift-image-registry.apps-crc.testing/<project>/<name>:<tag>
+      ```
+      Note that in OpenShift a project is equivalent to a Kubernetes namespace. The `yunikorn` project/namespace is recommended.
+      * Using an official image is possible by retagging it with the `docker tag` command. 
+      ```bash
+      docker tag apache/yunikorn:scheduler-latest default-route-openshift-image-registry.apps-crc.testing/yunikorn/yunikorn:scheduler-latest
+      ```
+
+1. Login to the Docker repository.
+   ```bash
+   docker login -u kubeadmin -p $(oc whoami -t) default-route-openshift-image-registry.apps-crc.testing
+   ```
+
+1. Push the Docker images to the internal Docker repository
+   ```
+   docker push default-route-openshift-image-registry.apps-crc.testing/yunikorn/yunikorn:scheduler-latest
+   ```
diff --git a/versioned_docs/version-1.1.0/get_started/core_features.md b/versioned_docs/version-1.1.0/get_started/core_features.md
new file mode 100644
index 000000000..8f2258966
--- /dev/null
+++ b/versioned_docs/version-1.1.0/get_started/core_features.md
@@ -0,0 +1,73 @@
+---
+id: core_features
+title: Features
+keywords:
+ - feature
+---
+
+<!--
+Licensed to the Apache Software Foundation (ASF) under one
+or more contributor license agreements.  See the NOTICE file
+distributed with this work for additional information
+regarding copyright ownership.  The ASF licenses this file
+to you under the Apache License, Version 2.0 (the
+"License"); you may not use this file except in compliance
+with the License.  You may obtain a copy of the License at
+
+  http://www.apache.org/licenses/LICENSE-2.0
+
+Unless required by applicable law or agreed to in writing,
+software distributed under the License is distributed on an
+"AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY
+KIND, either express or implied.  See the License for the
+specific language governing permissions and limitations
+under the License.
+-->
+
+The main features of YuniKorn include:
+
+## App-aware scheduling
+One of the key differences of YuniKorn is that it does app-aware scheduling. The default K8s scheduler simply schedules
+pod by pod, without any context about users, apps or queues. YuniKorn, however, recognizes users, apps and queues, and considers
+many more factors, e.g. resources, ordering etc., while making scheduling decisions. This enables fine-grained control over
+resource quotas, resource fairness and priorities, which are the most important requirements
+for a multi-tenant computing system.
+
+## Hierarchy Resource Queues
+
+Hierarchy queues provide an efficient mechanism to manage cluster resources. The hierarchy of the queues can logically
+map to the structure of an organization. This gives fine-grained control over resources for different tenants. The YuniKorn
+UI provides a centralised view to monitor the usage of resource queues and helps you gain insight into how the resources are
+used across different tenants. What's more, by leveraging the min/max queue capacity, it can define how elastic the resource
+consumption of each tenant can be.
+
+## Job Ordering and Queuing
+Applications can be properly queued in working-queues, where the ordering policy determines which application can get resources first.
+The policy can be one of several, such as simple `FIFO`, `Fair`, `StateAware` or `Priority` based. Queues can maintain the order of applications,
+and based on different policies, the scheduler allocates resources to jobs accordingly. This makes the behavior much more predictable.
+
+What's more, when the queue max-capacity is configured, jobs and tasks can be properly queued up in the resource queue.
+If the remaining capacity is not enough, they can wait in line until some resources are released. This simplifies
+the client side operation. With the default scheduler, by contrast, resources are capped by namespace resource quotas
+enforced by the quota-admission-controller: if the underlying namespace does not have enough quota, pods cannot be
+created, and the client side needs complex logic, e.g. retry by condition, to handle such scenarios.
+
+## Resource fairness
+In a multi-tenant environment, many users share cluster resources. To avoid tenants competing for resources
+and potentially being starved, more fine-grained fairness is needed to achieve fairness across users, as well as teams/organizations.
+With consideration of weights or priorities, some more important applications can demand resources beyond their share.
+This is often associated with resource budgets, where a more fine-grained fairness mode can further improve expense control.
+
+## Resource Reservation
+
+YuniKorn automatically does reservations for outstanding requests. If a pod could not be allocated, YuniKorn will try to
+reserve it on a qualified node and tentatively allocate the pod on this reserved node (before trying the rest of the nodes).
+This mechanism prevents the pod from being starved by smaller, less-picky pods that are submitted later.
+This feature is important in the batch workloads scenario because when a large amount of heterogeneous pods is submitted
+to the cluster, it's very likely that some pods can be starved even though they were submitted much earlier. 
+
+## Throughput
+Throughput is a key criterion for measuring scheduler performance. It is critical for a large scale distributed system.
+If throughput is bad, applications may waste time waiting for scheduling, which further impacts service SLAs.
+A bigger cluster also means a requirement for higher throughput. The [performance evaluation based on Kubemark](performance/evaluate_perf_function_with_kubemark.md)
+reveals some performance numbers.
diff --git a/versioned_docs/version-1.1.0/get_started/get_started.md b/versioned_docs/version-1.1.0/get_started/get_started.md
new file mode 100644
index 000000000..39c11f10e
--- /dev/null
+++ b/versioned_docs/version-1.1.0/get_started/get_started.md
@@ -0,0 +1,80 @@
+---
+id: user_guide
+title: Get Started
+slug: /
+---
+
+<!--
+Licensed to the Apache Software Foundation (ASF) under one
+or more contributor license agreements.  See the NOTICE file
+distributed with this work for additional information
+regarding copyright ownership.  The ASF licenses this file
+to you under the Apache License, Version 2.0 (the
+"License"); you may not use this file except in compliance
+with the License.  You may obtain a copy of the License at
+
+  http://www.apache.org/licenses/LICENSE-2.0
+
+Unless required by applicable law or agreed to in writing,
+software distributed under the License is distributed on an
+"AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY
+KIND, either express or implied.  See the License for the
+specific language governing permissions and limitations
+under the License.
+-->
+
+Before reading this guide, we assume you either have a Kubernetes cluster, or a local Kubernetes dev environment, e.g. Minikube.
+It is also assumed that `kubectl` is on your path and properly configured.
+Follow this [guide](developer_guide/env_setup.md) on how to set up a local Kubernetes cluster using docker-desktop.
+
+## Install
+
+The easiest way to get started is to use our Helm Charts to deploy YuniKorn on an existing Kubernetes cluster.
+It is recommended to use Helm 3 or later versions.
+
+```shell script
+helm repo add yunikorn https://apache.github.io/yunikorn-release
+helm repo update
+kubectl create namespace yunikorn
+helm install yunikorn yunikorn/yunikorn --namespace yunikorn
+```
+
+By default, the helm chart will install the scheduler, web-server and the admission-controller in the cluster.
+When the `admission-controller` is installed, it simply routes all traffic to YuniKorn. That means the resource scheduling
+is delegated to YuniKorn. You can disable it by setting the `embedAdmissionController` flag to `false` during the helm install.
+
+The YuniKorn scheduler can also be deployed as a Kubernetes scheduler plugin by setting the Helm `enableSchedulerPlugin`
+flag to `true`. This will deploy an alternate Docker image which contains YuniKorn compiled together with the default
+scheduler. This new mode offers better compatibility with the default Kubernetes scheduler and is suitable for use with the
+admission controller delegating all scheduling to YuniKorn. Because this mode is still very new, it is not enabled by default.
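+
+For example (a sketch using the flags named above), the plugin mode can be enabled by passing an extra value during the install; the admission controller can be disabled in the same way:
+
+```shell script
+# enable scheduler plugin mode; add --set embedAdmissionController=false to skip the admission controller
+helm install yunikorn yunikorn/yunikorn --namespace yunikorn \
+  --set enableSchedulerPlugin=true
+```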
+
+If you are unsure which deployment mode you should use, refer to our [side-by-side comparison](user_guide/deployment_modes).
+ 
+Further configuration options for installing YuniKorn via Helm are available in the [YuniKorn Helm hub page](https://hub.helm.sh/charts/yunikorn/yunikorn).
+
+If you don't want to use helm charts, you can find our step-by-step
+tutorial [here](developer_guide/deployment.md).
+
+## Uninstall
+
+Run the following command to uninstall YuniKorn:
+```shell script
+helm uninstall yunikorn --namespace yunikorn
+```
+
+## Access the Web UI
+
+When the scheduler is deployed, the web UI is also deployed in a container.
+Port forwarding for the web interface on the standard port can be turned on via:
+
+```
+kubectl port-forward svc/yunikorn-service 9889:9889 -n yunikorn
+```
+
+`9889` is the default port for web UI.
+Once this is done, web UI will be available at: `http://localhost:9889`.
+
+![UI Screenshots](./../assets/yk-ui-screenshots.gif)
+
+YuniKorn UI provides a centralised view for cluster resource capacity, utilization, and all application info.
+
diff --git a/versioned_docs/version-1.1.0/performance/evaluate_perf_function_with_kubemark.md b/versioned_docs/version-1.1.0/performance/evaluate_perf_function_with_kubemark.md
new file mode 100644
index 000000000..df244c228
--- /dev/null
+++ b/versioned_docs/version-1.1.0/performance/evaluate_perf_function_with_kubemark.md
@@ -0,0 +1,120 @@
+---
+id: evaluate_perf_function_with_kubemark
+title: Evaluate YuniKorn Performance with Kubemark
+keywords:
+ - performance
+ - throughput
+---
+
+<!--
+Licensed to the Apache Software Foundation (ASF) under one
+or more contributor license agreements.  See the NOTICE file
+distributed with this work for additional information
+regarding copyright ownership.  The ASF licenses this file
+to you under the Apache License, Version 2.0 (the
+"License"); you may not use this file except in compliance
+with the License.  You may obtain a copy of the License at
+
+  http://www.apache.org/licenses/LICENSE-2.0
+
+Unless required by applicable law or agreed to in writing,
+software distributed under the License is distributed on an
+"AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY
+KIND, either express or implied.  See the License for the
+specific language governing permissions and limitations
+under the License.
+-->
+
+The YuniKorn community cares about the scheduler’s performance and continues to optimize it over the releases. The community has developed some tools in order to test and tune the performance repeatedly.
+
+## Environment setup 
+
+We leverage [Kubemark](https://github.com/kubernetes/kubernetes/blob/release-1.3/docs/devel/kubemark-guide.md#starting-a-kubemark-cluster) to evaluate the scheduler’s performance. Kubemark is a testing tool that simulates large scale clusters. It creates hollow nodes which run a hollow kubelet that mimics the original kubelet behavior. Pods scheduled on these hollow nodes won’t actually execute. It is able to create a big cluster that meets our experiment requirement that reveals the yunikorn sched [...]
+
+## Scheduler Throughput
+
+We have designed some simple benchmarking scenarios on a simulated large scale environment in order to evaluate the scheduler performance. Our tools measure the [throughput](https://en.wikipedia.org/wiki/Throughput) and use these key metrics to evaluate the performance. In a nutshell, scheduler throughput is the rate of processing pods from discovering them on the cluster to allocating them to nodes.
+
+In this experiment, we set up a simulated 2000/4000 node cluster with [Kubemark](https://github.com/kubernetes/kubernetes/blob/release-1.3/docs/devel/kubemark-guide.md#starting-a-kubemark-cluster). Then we launch 10 [deployments](https://kubernetes.io/docs/concepts/workloads/controllers/deployment/), setting replicas to 5000 in each deployment respectively. This simulates large scale workloads submitted to the K8s cluster simultaneously. Our tool periodically monitors and checks po [...]
+
+![Scheduler Throughput](./../assets/yunirkonVSdefault.png)
+<p align="center">Fig 1. Yunikorn and default scheduler throughput </p>
+
+The charts record the time spent until all pods are running on the cluster:
+
+|  Number of Nodes  | yunikorn        | k8s default scheduler		| Diff    |
+|------------------	|:--------------:	|:---------------------: |:-----:  |
+| 2000(nodes)       | 204(pods/sec)			| 49(pods/sec)			        |   416%  |
+| 4000(nodes)       | 115(pods/sec)			| 48(pods/sec)			        |   240%  |
+
+In order to normalize the result, we have been running the tests for several rounds. As shown above, YuniKorn achieves a `2x` ~ `4x` performance gain compared to the default scheduler.
+
+:::note
+
+Like other performance testing, the result is highly variable depending on the underlying hardware, e.g. server CPU/memory, network bandwidth, I/O speed, etc. To get an accurate result that applies to your environment, we encourage you to run these tests on a cluster that is close to your production environment.
+
+:::
+
+## Performance Analysis
+
+The results we got from the experiment are promising. We further take a deep dive to analyze the performance by observing more internal YuniKorn metrics, and we are able to locate a few key areas affecting the performance.
+
+### K8s Limitation
+
+We found that the overall performance is actually capped by the K8s master services, such as the api-server, controller-manager and etcd; it did not reach the limit of YuniKorn in any of our experiments. If you look at the internal scheduling metrics, you can see:
+
+![Allocation latency](./../assets/allocation_4k.png)
+<p align="center">Fig 2. Yunikorn metric in 4k nodes </p>
+
+Figure 2 is a screenshot from Prometheus, which records the [internal metrics](performance/metrics.md) `containerAllocation` in YuniKorn. These are the number of pods allocated by the scheduler, but not necessarily bound to nodes yet. It takes roughly 122 seconds to finish scheduling 50k pods, i.e. 410 pods/sec. The actual throughput drops to 115 pods/sec, and the extra time was used to bind the pods on different nodes. If the K8s side could catch up, we would see a better result. [...]
+
+### Node Sorting
+
+When the cluster size grows, we see an obvious performance drop in YuniKorn. This is because in YuniKorn, we do a full sorting of the cluster nodes in order to find the **"best-fit"** node for a given pod. Such a strategy makes the pod distribution more optimal based on the [node sorting policy](./../user_guide/sorting_policies#node-sorting) being used. However, sorting nodes is expensive, and doing this in the scheduling cycle creates a lot of overhead. To overcome this, we have improved our [...]
+
+### Per Node Precondition Checks
+
+In each scheduling cycle, another time consuming part is the "Precondition Checks" for a node. In this phase, YuniKorn evaluates all K8s standard predicates, e.g. node-selector, pod affinity/anti-affinity, etc., in order to determine whether a pod fits onto a node. These evaluations are expensive.
+
+We have done two experiments to compare the cases where the predicates evaluation was enabled and disabled. See the results below:
+
+![Allocation latency](./../assets/predicateComaparation.png)
+<p align="center">Fig 3. Predicate effect comparison in yunikorn </p>
+
+When the predicates evaluation is disabled, the throughput improves a lot. We looked further into the latency distribution of the entire scheduling cycle and the predicates-eval latency, and found: 
+
+![YK predicate latency](./../assets/predicate_4k.png)
+<p align="center">Fig 4. predicate latency </p>
+
+![YK scheduling with predicate](./../assets/scheduling_with_predicate_4k_.png)
+<p align="center">Fig 5. Scheduling time with predicate active </p>
+
+![YK scheduling with no predicate](./../assets/scheduling_no_predicate_4k.png)
+<p align="center">Fig 6. Scheduling time with predicate inactive </p>
+
+Overall, the YuniKorn scheduling cycle runs really fast, with the latency falling into the **0.001s - 0.01s** range per cycle. The majority of the time is spent on predicates evaluation, which is 10x larger than the other parts of the scheduling cycle.
+
+|				| scheduling latency distribution(second)	| predicates-eval latency distribution(second)	|
+|-----------------------	|:---------------------:		|:---------------------:			|
+| predicates enabled		| 0.01 - 0.1				| 0.01-0.1					|
+| predicates disabled		| 0.001 - 0.01				| none						|
+
+## Why YuniKorn is faster?
+
+The default scheduler was created as a service-oriented scheduler; it is less sensitive in terms of throughput compared to YuniKorn. The YuniKorn community works really hard to keep the performance outstanding and keeps improving it. The reasons that YuniKorn can run faster than the default scheduler are:
+
+* Short Circuit Scheduling Cycle
+
+YuniKorn keeps the scheduling cycle short and efficient. YuniKorn uses async communication protocols to make sure all the critical paths are non-blocking calls. Most places just do in-memory calculations, which can be highly efficient. The default scheduler leverages the [scheduling framework](https://kubernetes.io/docs/concepts/scheduling-eviction/scheduling-framework/), which provides lots of flexibility to extend the scheduler, however, the trade-off is the performance. The sc [...]
+
+* Async Events Handling
+
+YuniKorn leverages an async event handling framework to deal with internal states. This allows the core scheduling cycle to run fast without being blocked by any expensive calls. An example: the default scheduler needs to write state updates and events to pod objects inside of the scheduling cycle. This involves persisting data to etcd, which could be slow. YuniKorn, instead, caches all such events in a queue and writes back to the pod in an asynchronous manner. 
+
+* Faster Node Sorting
+
+After [YUNIKORN-807](https://issues.apache.org/jira/browse/YUNIKORN-807), YuniKorn does incremental node sorting, which is highly efficient. This is built on top of the so-called "resource-weight" based node scoring mechanism, and it is also extensible via plugins. All of this together reduces the overhead while computing node scores. In comparison, the default scheduler provides a few extension points for calculating node scores, such as `PreScore`, `Score` and `NormalizeScore`. These c [...]
+
+## Summary
+
+During the tests, we found YuniKorn performs really well, especially compared to the default scheduler. We have identified the major factors in YuniKorn where we can continue to improve the performance, and also explained why YuniKorn is performing better than the default scheduler. We also realized the limitations of scaling Kubernetes to thousands of nodes; these can be alleviated by using other techniques such as federation. As a result, YuniKorn is a high-efficient, high-t [...]
diff --git a/versioned_docs/version-1.1.0/performance/metrics.md b/versioned_docs/version-1.1.0/performance/metrics.md
new file mode 100644
index 000000000..9dedbec73
--- /dev/null
+++ b/versioned_docs/version-1.1.0/performance/metrics.md
@@ -0,0 +1,109 @@
+---
+id: metrics
+title: Scheduler Metrics
+keywords:
+ - metrics
+---
+
+<!--
+Licensed to the Apache Software Foundation (ASF) under one
+or more contributor license agreements.  See the NOTICE file
+distributed with this work for additional information
+regarding copyright ownership.  The ASF licenses this file
+to you under the Apache License, Version 2.0 (the
+"License"); you may not use this file except in compliance
+with the License.  You may obtain a copy of the License at
+
+  http://www.apache.org/licenses/LICENSE-2.0
+
+Unless required by applicable law or agreed to in writing,
+software distributed under the License is distributed on an
+"AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY
+KIND, either express or implied.  See the License for the
+specific language governing permissions and limitations
+under the License.
+-->
+
+YuniKorn leverages [Prometheus](https://prometheus.io/) to record metrics. The metrics system keeps track of the
+scheduler's critical execution paths to reveal potential performance bottlenecks. Currently, there are three categories
+of metrics:
+
+- scheduler: generic metrics of the scheduler, such as allocation latency, num of apps etc.
+- queue: each queue has its own metrics sub-system, tracking queue status.
+- event: record various changes of events in YuniKorn.
+
+All metrics are declared in the `yunikorn` namespace.
+###    Scheduler Metrics
+
+| Metrics Name          | Metrics Type  | Description  | 
+| --------------------- | ------------  | ------------ |
+| containerAllocation   | Counter       | Total number of attempts to allocate containers. State of the attempt includes `allocated`, `rejected`, `error`, `released`. Increase only.  |
+| applicationSubmission | Counter       | Total number of application submissions. State of the attempt includes `accepted` and `rejected`. Increase only. |
+| applicationStatus     | Gauge         | Total number of application status. State of the application includes `running` and `completed`.  | 
+| totalNodeActive       | Gauge         | Total number of active nodes.                          |
+| totalNodeFailed       | Gauge         | Total number of failed nodes.                          |
+| nodeResourceUsage     | Gauge         | Total resource usage of node, by resource name.        |
+| schedulingLatency     | Histogram     | Latency of the main scheduling routine, in seconds.    |
+| nodeSortingLatency    | Histogram     | Latency of all nodes sorting, in seconds.              |
+| appSortingLatency     | Histogram     | Latency of all applications sorting, in seconds.       |
+| queueSortingLatency   | Histogram     | Latency of all queues sorting, in seconds.             |
+| tryNodeLatency        | Histogram     | Latency of node condition checks for container allocations, such as placement constraints, in seconds. |
+
+###    Queue Metrics
+
+| Metrics Name              | Metrics Type  | Description |
+| ------------------------- | ------------- | ----------- |
+| appMetrics                | Counter       | Application metrics, recording the total number of applications. State of the application includes `accepted`, `rejected` and `completed`.     |
+| usedResourceMetrics       | Gauge         | Queue used resource.     |
+| pendingResourceMetrics    | Gauge         | Queue pending resource.  |
+| availableResourceMetrics  | Gauge         | Queue available resource.    |
+
+###    Event Metrics
+
+| Metrics Name             | Metrics Type  | Description |
+| ------------------------ | ------------  | ----------- |
+| totalEventsCreated       | Gauge         | Total events created.          |
+| totalEventsChanneled     | Gauge         | Total events channeled.        |
+| totalEventsNotChanneled  | Gauge         | Total events not channeled.    |
+| totalEventsProcessed     | Gauge         | Total events processed.        |
+| totalEventsStored        | Gauge         | Total events stored.           |
+| totalEventsNotStored     | Gauge         | Total events not stored.       |
+| totalEventsCollected     | Gauge         | Total events collected.        |
+
+## Access Metrics
+
+YuniKorn metrics are collected through the Prometheus client library and exposed via the scheduler's RESTful service.
+Once started, they can be accessed via the endpoint http://localhost:9080/ws/v1/metrics.
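+
+For a quick check from the command line (a sketch, mirroring the port-forward pattern from the deployment guide and assuming metric names carry the `yunikorn` prefix mentioned above):
+```shell script
+# forward the scheduler's REST port; add -n <namespace> if YuniKorn runs in a dedicated namespace
+POD=`kubectl get pod -l app=yunikorn -o jsonpath="{.items[0].metadata.name}"` && \
+kubectl port-forward ${POD} 9080 &
+curl -s http://localhost:9080/ws/v1/metrics | grep "^yunikorn"
+```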
+
+## Aggregate Metrics to Prometheus
+
+It's simple to set up a Prometheus server to scrape YuniKorn metrics periodically. Follow these steps:
+
+- Setup Prometheus (read more from [Prometheus docs](https://prometheus.io/docs/prometheus/latest/installation/))
+
+- Configure the Prometheus scrape config: a sample configuration 
+
+```yaml
+global:
+  scrape_interval:     3s
+  evaluation_interval: 15s
+
+scrape_configs:
+  - job_name: 'yunikorn'
+    scrape_interval: 1s
+    metrics_path: '/ws/v1/metrics'
+    static_configs:
+    - targets: ['docker.for.mac.host.internal:9080']
+```
+
+- start Prometheus
+
+```shell script
+docker pull prom/prometheus:latest
+docker run -p 9090:9090 -v /path/to/prometheus.yml:/etc/prometheus/prometheus.yml prom/prometheus
+```
+
+Use `docker.for.mac.host.internal` instead of `localhost` if you are running Prometheus in a local docker container
+on Mac OS. Once started, open Prometheus web UI: http://localhost:9090/graph. You'll see all available metrics from
+YuniKorn scheduler.
+
diff --git a/versioned_docs/version-1.1.0/performance/performance_tutorial.md b/versioned_docs/version-1.1.0/performance/performance_tutorial.md
new file mode 100644
index 000000000..b17923838
--- /dev/null
+++ b/versioned_docs/version-1.1.0/performance/performance_tutorial.md
@@ -0,0 +1,522 @@
+---
+id: performance_tutorial
+title: Benchmarking Tutorial
+keywords:
+ - performance
+ - tutorial
+---
+
+<!--
+Licensed to the Apache Software Foundation (ASF) under one
+or more contributor license agreements.  See the NOTICE file
+distributed with this work for additional information
+regarding copyright ownership.  The ASF licenses this file
+to you under the Apache License, Version 2.0 (the
+"License"); you may not use this file except in compliance
+with the License.  You may obtain a copy of the License at
+
+  http://www.apache.org/licenses/LICENSE-2.0
+
+Unless required by applicable law or agreed to in writing,
+software distributed under the License is distributed on an
+"AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY
+KIND, either express or implied.  See the License for the
+specific language governing permissions and limitations
+under the License.
+-->
+
+## Overview
+
+The YuniKorn community continues to optimize the performance of the scheduler, ensuring that YuniKorn satisfies the performance requirements of large-scale batch workloads. Thus, the community has built some useful tools for performance benchmarking that can be reused across releases. This document introduces all these tools and steps to run them.
+
+## Hardware
+
+Be aware that performance results are highly variable depending on the underlying hardware. All results published in this doc can only be used as references. We encourage everyone to run similar tests on their own environments in order to get results based on their own hardware. This doc is for demonstration purposes only.
+
+The servers used in this test are listed below (huge thanks to [National Taichung University of Education](http://www.ntcu.edu.tw/newweb/index.htm) and [Kuan-Chou Lai](http://www.ntcu.edu.tw/kclai/) for providing these servers for running tests):
+
+| Machine Type          | CPU | Memory | Download/upload(Mbps) |
+| --------------------- | --- | ------ | --------------------- |
+| HP                    | 16  | 36G    | 525.74/509.86         |
+| HP                    | 16  | 30G    | 564.84/461.82         |
+| HP                    | 16  | 30G    | 431.06/511.69         |
+| HP                    | 24  | 32G    | 577.31/576.21         |
+| IBM blade H22         | 16  | 38G    | 432.11/4.15           |
+| IBM blade H22         | 16  | 36G    | 714.84/4.14           |
+| IBM blade H22         | 16  | 42G    | 458.38/4.13           |
+| IBM blade H22         | 16  | 42G    | 445.42/4.13           |
+| IBM blade H22         | 16  | 32G    | 400.59/4.13           |
+| IBM blade H22         | 16  | 12G    | 499.87/4.13           |
+| IBM blade H23         | 8   | 32G    | 468.51/4.14           |
+| WS660T                | 8   | 16G    | 87.73/86.30           |
+| ASUSPRO D640MB_M640SA | 4   | 8G     | 92.43/93.77           |
+| PRO E500 G6_WS720T    | 16  | 8G     | 90/87.18              |
+| WS E500 G6_WS720T     | 8   | 40G    | 92.61/89.78           |
+| E500 G5               | 8   | 8G     | 91.34/85.84           |
+| WS E500 G5_WS690T     | 12  | 16G    | 92.2/93.76            |
+| WS E500 G5_WS690T     | 8   | 32G    | 91/89.41              |
+| WS E900 G4_SW980T     | 80  | 512G   | 89.24/87.97           |
+
+The following steps are needed on each server, otherwise large-scale testing may fail due to limits on the number of users/processes/open files.
+
+### 1. Set /etc/sysctl.conf
+```
+kernel.pid_max=400000
+fs.inotify.max_user_instances=50000
+fs.inotify.max_user_watches=52094
+```
+### 2. Set /etc/security/limits.conf
+
+```
+* soft nproc 4000000
+* hard nproc 4000000
+root soft nproc 4000000
+root hard nproc 4000000
+* soft nofile 50000
+* hard nofile 50000
+root soft nofile 50000
+root hard nofile 50000
+```
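+
+After editing these files, the kernel settings can be applied without a reboot; the security limits take effect for new login sessions. A minimal sketch (run as root):
+
+```
+sysctl -p /etc/sysctl.conf
+# verify the limits in a new login shell
+ulimit -u
+ulimit -n
+```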
+---
+
+## Deploy workflow
+
+Before going into the details, here are the general steps used in our tests:
+
+- [Step 1](#Kubernetes): Properly configure Kubernetes API server and controller manager, then add worker nodes.
+- [Step 2](#Setup-Kubemark): Deploy hollow pods, which simulate worker nodes and are called hollow nodes. After all hollow nodes are in ready status, cordon all native nodes. Native nodes are the nodes physically present in the cluster (not the simulated ones); cordoning them avoids test workload pods being allocated to them.
+- [Step 3](#Deploy-YuniKorn): Deploy YuniKorn using the Helm chart on the master node, scale the Deployment down to 0 replicas, and [modify the port](#Setup-Prometheus) in `prometheus.yml` to match the port of the service.
+- [Step 4](#Run-tests): Deploy 50k Nginx pods for testing; the API server will create them. But since the YuniKorn scheduler Deployment has been scaled down to 0 replicas, all Nginx pods will be stuck in the pending state.
+- [Step 5](../user_guide/trouble_shooting.md#restart-the-scheduler): Scale the YuniKorn Deployment back up to 1 replica and cordon the master node to avoid YuniKorn allocating Nginx pods there. In this step, YuniKorn starts collecting the metrics.
+- [Step 6](#Collect-and-Observe-YuniKorn-metrics): Observe the metrics exposed in Prometheus UI.
+---
+
+## Setup Kubemark
+
+[Kubemark](https://github.com/kubernetes/kubernetes/tree/master/test/kubemark) is a performance testing tool which allows users to run experiments on simulated clusters. The primary use case is scalability testing. The basic idea is to run tens or hundreds of fake kubelet nodes on one physical node in order to simulate large-scale clusters. In our tests, we leverage Kubemark to simulate up to a 4K-node cluster on less than 20 physical nodes.
+
+### 1. Build image
+
+##### Clone kubernetes repo, and build kubemark binary file
+
+```
+git clone https://github.com/kubernetes/kubernetes.git
+```
+```
+cd kubernetes
+```
+```
+KUBE_BUILD_PLATFORMS=linux/amd64 make kubemark GOFLAGS=-v GOGCFLAGS="-N -l"
+```
+
+##### Copy kubemark binary file to the image folder and build kubemark docker image
+
+```
+cp _output/bin/kubemark cluster/images/kubemark
+```
+```
+IMAGE_TAG=v1.XX.X make build
+```
+After this step, you have a kubemark image which can simulate cluster nodes. You can upload it to Docker Hub or just deploy it locally.
+
+### 2. Install Kubemark
+
+##### Create kubemark namespace
+
+```
+kubectl create ns kubemark
+```
+
+##### Create configmap
+
+```
+kubectl create configmap node-configmap -n kubemark --from-literal=content.type="test-cluster"
+```
+
+##### Create secret
+
+```
+kubectl create secret generic kubeconfig --type=Opaque --namespace=kubemark --from-file=kubelet.kubeconfig={kubeconfig_file_path} --from-file=kubeproxy.kubeconfig={kubeconfig_file_path}
+```
+### 3. Label node
+
+We need to label all native nodes first, otherwise the scheduler might allocate hollow pods to other simulated hollow nodes. We can then leverage a node selector in the yaml to allocate hollow pods to the native nodes.
+
+```
+kubectl label node {node name} tag=tagName
+```
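+
+For example, assuming two native nodes named `node1` and `node2` (replace with your own node names), they can be labelled in one command:
+
+```
+kubectl label node node1 node2 tag=tagName
+```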
+
+### 4. Deploy Kubemark
+
+The `hollow-node.yaml` is shown below; there are some parameters we can configure.
+
+```
+apiVersion: v1
+kind: ReplicationController
+metadata:
+  name: hollow-node
+  namespace: kubemark
+spec:
+  replicas: 2000  # the number of nodes you want to simulate
+  selector:
+      name: hollow-node
+  template:
+    metadata:
+      labels:
+        name: hollow-node
+    spec:
+      nodeSelector:  # use the label to allocate hollow pods to native nodes
+        tag: tagName  
+      initContainers:
+      - name: init-inotify-limit
+        image: docker.io/busybox:latest
+        imagePullPolicy: IfNotPresent
+        command: ['sysctl', '-w', 'fs.inotify.max_user_instances=200'] # set to the same value as fs.inotify.max_user_instances on the actual node
+        securityContext:
+          privileged: true
+      volumes:
+      - name: kubeconfig-volume
+        secret:
+          secretName: kubeconfig
+      - name: logs-volume
+        hostPath:
+          path: /var/log
+      containers:
+      - name: hollow-kubelet
+        image: 0yukali0/kubemark:1.20.10 # the kubemark image you built
+        imagePullPolicy: IfNotPresent
+        ports:
+        - containerPort: 4194
+        - containerPort: 10250
+        - containerPort: 10255
+        env:
+        - name: NODE_NAME
+          valueFrom:
+            fieldRef:
+              fieldPath: metadata.name
+        command:
+        - /kubemark
+        args:
+        - --morph=kubelet
+        - --name=$(NODE_NAME)
+        - --kubeconfig=/kubeconfig/kubelet.kubeconfig
+        - --alsologtostderr
+        - --v=2
+        volumeMounts:
+        - name: kubeconfig-volume
+          mountPath: /kubeconfig
+          readOnly: true
+        - name: logs-volume
+          mountPath: /var/log
+        resources:
+          requests:    # the resource requests of the hollow pod, can be modified
+            cpu: 20m
+            memory: 50M
+        securityContext:
+          privileged: true
+      - name: hollow-proxy
+        image: 0yukali0/kubemark:1.20.10 # the kubemark image you built
+        imagePullPolicy: IfNotPresent
+        env:
+        - name: NODE_NAME
+          valueFrom:
+            fieldRef:
+              fieldPath: metadata.name
+        command:
+        - /kubemark
+        args:
+        - --morph=proxy
+        - --name=$(NODE_NAME)
+        - --use-real-proxier=false
+        - --kubeconfig=/kubeconfig/kubeproxy.kubeconfig
+        - --alsologtostderr
+        - --v=2
+        volumeMounts:
+        - name: kubeconfig-volume
+          mountPath: /kubeconfig
+          readOnly: true
+        - name: logs-volume
+          mountPath: /var/log
+        resources:  # the resource requests of the hollow pod, can be modified
+          requests:
+            cpu: 20m
+            memory: 50M
+      tolerations:
+      - effect: NoExecute
+        key: node.kubernetes.io/unreachable
+        operator: Exists
+      - effect: NoExecute
+        key: node.kubernetes.io/not-ready
+        operator: Exists
+```
+
+Once done editing, apply it to the cluster:
+
+```
+kubectl apply -f hollow-node.yaml
+```
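+
+Once the hollow nodes have registered and report `Ready`, cordon the native nodes as described in the deploy workflow so that the test workloads only land on the simulated nodes. A sketch (replace the node name with your own):
+
+```
+# count the hollow nodes that are Ready
+kubectl get nodes | grep hollow-node | grep -cw Ready
+# cordon each native node
+kubectl cordon {native node name}
+```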
+
+---
+
+## Deploy YuniKorn
+
+#### Install YuniKorn with helm
+
+We can install YuniKorn with Helm, please refer to this [doc](https://yunikorn.apache.org/docs/#install).
+We need to tune some parameters based on the default configuration. We recommend cloning the [release repo](https://github.com/apache/yunikorn-release) and modifying the parameters in `values.yaml`.
+
+```
+git clone https://github.com/apache/yunikorn-release.git
+cd yunikorn-release/helm-charts/yunikorn
+```
+
+#### Configuration
+
+The modifications in the `values.yaml` are:
+
+- increased memory/cpu resources for the scheduler pod
+- disabled the admission controller
+- set the app sorting policy to FAIR
+
+Please see the changes below:
+
+```
+resources:
+  requests:
+    cpu: 14
+    memory: 16Gi
+  limits:
+    cpu: 14
+    memory: 16Gi
+```
+```
+embedAdmissionController: false
+```
+```
+configuration: |
+  partitions:
+    -
+      name: default
+      queues:
+        - name: root
+          submitacl: '*'
+          queues:
+            -
+              name: sandbox
+              properties:
+                application.sort.policy: fair
+```
+
+#### Install YuniKorn with local release repo
+
+```
+helm install yunikorn . --namespace yunikorn
+```
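+
+After the install, verify that the scheduler pod is up, then scale the deployment down to 0 replicas as described in the deploy workflow. The deployment name below assumes the default from the Helm chart:
+
+```
+kubectl get pods -n yunikorn
+kubectl scale deployment yunikorn-scheduler -n yunikorn --replicas=0
+```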
+
+---
+
+## Setup Prometheus
+
+YuniKorn exposes its scheduling metrics via Prometheus. Thus, we need to set up a Prometheus server to collect these metrics.
+
+### 1. Download Prometheus release
+
+```
+wget https://github.com/prometheus/prometheus/releases/download/v2.30.3/prometheus-2.30.3.linux-amd64.tar.gz
+```
+```
+tar xvfz prometheus-*.tar.gz
+cd prometheus-*
+```
+
+### 2. Configure prometheus.yml
+
+```
+global:
+  scrape_interval:     3s
+  evaluation_interval: 15s
+
+scrape_configs:
+  - job_name: 'yunikorn'
+    scrape_interval: 1s
+    metrics_path: '/ws/v1/metrics'
+    static_configs:
+    - targets: ['docker.for.mac.host.internal:9080'] 
+    # 9080 is the scheduler's internal port; port-forward it or replace 9080 with the service's port
+```
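+
+If Prometheus runs outside the cluster, one way to reach the metrics port is to port-forward the YuniKorn service and point the target above at `localhost:9080` instead. The service name and namespace below assume a default Helm install:
+
+```
+kubectl port-forward svc/yunikorn-service 9080:9080 -n yunikorn
+```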
+
+### 3. Launch Prometheus
+```
+./prometheus --config.file=prometheus.yml
+```
+
+---
+## Run tests
+
+Once the environment is set up, you are ready to run workloads and collect results. The YuniKorn community has some useful tools to run workloads and collect metrics; more details will be published here.
+
+### 1. Scenarios 
+The performance tools support three types of tests. The diagram and log outputs of the node fairness and throughput cases are summarized below; the e2e perf case is described at the end of this section.
+
+| Test type     | Description                                                                                                             | Diagram | Log |
+| ------------- | ----------------------------------------------------------------------------------------------------------------------- | ------- | --- |
+| node fairness | Monitors node resource usage (allocated/capacity) under a large number of pod requests                                   | Yes     | Yes |
+| throughput    | Measures the schedulers' throughput by calculating how many pods are allocated per second, based on the pod start time   | Yes     | No  |
+
+### 2. Build tool
+The performance tool is available in the [yunikorn release repo](https://github.com/apache/yunikorn-release.git). Clone the repo to your workspace:
+```
+git clone https://github.com/apache/yunikorn-release.git
+```
+Build the tool:
+```
+cd yunikorn-release/perf-tools/
+make build
+cd target/perf-tools-bin
+```
+The output will look like this:
+![Build-perf-tools](./../assets/perf-tutorial-build.png)
+
+### 3. Set test configuration
+Before starting the tests, check whether the configuration meets your expectations.
+The default output path is `/tmp`; you can modify `common.outputrootpath` to change it.
+If large values in these fields cause timeout problems, increase `common.maxwaitseconds` to allow for it.
+
+#### Throughput case
+
+|	Field			|			Description											|
+| ---				| ---									 						|
+| SchedulerNames     | List of schedulers that will run the test                                                                             |
+| ShowNumOfLastTasks | Number of most recently created pods whose metadata will be shown                                                     |
+| CleanUpDelayMs     | Controls the period for refreshing the deployment status and printing logs                                            |
+| RequestConfigs     | Sets the resource requests and decides the number of deployments and pods per deployment with `repeat` and `numPods`  |
+
+In this case, YuniKorn and the default scheduler will each, one after the other, create ten deployments containing fifty pods each.
+It will look like this:
+![throughputConf](./../assets/throughput_conf.png)
+![ThroughputDeployment](./../assets/perf_throughput.png)
+
+#### Node fairness case
+
+|	Field			|	Description									|
+| --- 				| ---											|
+| SchedulerNames     | List of schedulers that will run the test                                      |
+| NumPodsPerNode     | The total number of pods divided by the number of nodes                        |
+| AllocatePercentage | The percentage of allocatable node resources that is allowed to be allocated   |
+
+The total number of pods is the number of ready nodes multiplied by `NumPodsPerNode`.
+In the following figure, there are thirteen ready nodes and `NumPodsPerNode` is eighty,
+so one thousand and forty pods will be created.
+![nodeFairnessConf](./../assets/node_fairness_conf.png)
+![nodeFairnessDeployment](./../assets/perf_node_fairness.png)
+
+#### e2e perf case
+Its fields are similar to the throughput case, but only one scheduler runs in each case.
+![scheduleTestConf](./../assets/perf_e2e_test_conf.png)
+![scheduleTest](./../assets/perf_e2e_test.png)
+
+###  4. Diagrams and logs
+```
+./perf-tools
+```
+A result log is shown when each case finishes.
+When the tests have finished, it will look like this:
+![Result log](./../assets/perf-tutorial-resultLog.png)
+The result diagrams and logs can be found under `common.outputrootpath`, which is set in `conf.yaml`.
+The related diagrams and logs will look like this:
+![Result diagrams and logs](./../assets/perf-tutorial-resultDiagrams.png)
+
+---
+
+## Collect and Observe YuniKorn metrics
+
+After Prometheus is launched, YuniKorn metrics can be easily collected. Here are the [docs](metrics.md) for YuniKorn metrics. YuniKorn tracks some key scheduling metrics which measure the latency of critical scheduling paths. These metrics include:
+
+ - **scheduling_latency_seconds:** Latency of the main scheduling routine, in seconds.
+ - **app_sorting_latency_seconds**: Latency of all applications sorting, in seconds.
+ - **node_sorting_latency_seconds**: Latency of all nodes sorting, in seconds.
+ - **queue_sorting_latency_seconds**: Latency of all queues sorting, in seconds.
+ - **container_allocation_attempt_total**: Total number of attempts to allocate containers. The state of an attempt can be `allocated`, `rejected`, `error` or `released`. This metric only increases.
+
+You can easily select and graph these metrics in the Prometheus UI, for example:
+
+![Prometheus Metrics List](./../assets/prometheus.png)
+
+
+---
+
+## Performance Tuning
+
+### Kubernetes
+
+The default K8s setup limits the number of concurrent requests, which limits the overall throughput of the cluster. In this section, we introduce a few parameters that need to be tuned in order to increase the overall throughput of the cluster.
+
+#### kubeadm
+
+Set the pod network CIDR:
+
+```
+kubeadm init --pod-network-cidr=10.244.0.0/8
+```
+
+#### CNI
+
+Modify the CNI network mask and resources:
+
+```
+  net-conf.json: |
+    {
+      "Network": "10.244.0.0/8",
+      "Backend": {
+        "Type": "vxlan"
+      }
+    }
+```
+```
+  resources:
+    requests:
+      cpu: "100m"
+      memory: "200Mi"
+    limits:
+      cpu: "100m"
+      memory: "200Mi"
+```
+
+
+#### Api-Server
+
+In the Kubernetes API server, we need to modify two parameters: `max-mutating-requests-inflight` and `max-requests-inflight`. Those two parameters represent the API request bandwidth. Because we will generate a large number of pod requests, we need to increase those two parameters. Modify `/etc/kubernetes/manifest/kube-apiserver.yaml`:
+
+```
+--max-mutating-requests-inflight=3000
+--max-requests-inflight=3000
+```
+
+#### Controller-Manager
+
+In the Kubernetes controller manager, we need to increase the value of three parameters: `node-cidr-mask-size`, `kube-api-burst` and `kube-api-qps`. `kube-api-burst` and `kube-api-qps` control the server side request bandwidth. `node-cidr-mask-size` represents the node CIDR mask size. It needs to be increased as well in order to scale up to thousands of nodes.
+
+
+Modify `/etc/kubernetes/manifest/kube-controller-manager.yaml`:
+
+```
+--node-cidr-mask-size=21 //log2(max number of pods in cluster)
+--kube-api-burst=3000
+--kube-api-qps=3000
+```
+
+#### kubelet
+
+By default, a single worker node can run 110 pods. To get higher node resource utilization, we need to add some parameters to the kubelet launch command and restart it.
+
+Modify the start arguments in `/etc/systemd/system/kubelet.service.d/10-kubeadm.conf`: add `--max-pods=300` to the kubelet start arguments, then reload and restart kubelet.
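+
+A minimal sketch of the drop-in change, assuming the drop-in honours `KUBELET_EXTRA_ARGS` (adjust to how kubelet arguments are passed on your distribution):
+
+```
+# /etc/systemd/system/kubelet.service.d/10-kubeadm.conf
+Environment="KUBELET_EXTRA_ARGS=--max-pods=300"
+```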
+
+```
+systemctl daemon-reload
+systemctl restart kubelet
+```
+
+---
+
+## Summary
+
+With Kubemark and Prometheus, we can easily run benchmark tests, collect YuniKorn metrics and analyze the performance. This helps us identify performance bottlenecks in the scheduler and eliminate them. The YuniKorn community will continue to improve these tools and to deliver further performance improvements.
diff --git a/versioned_docs/version-1.1.0/performance/profiling.md b/versioned_docs/version-1.1.0/performance/profiling.md
new file mode 100644
index 000000000..662b341e0
--- /dev/null
+++ b/versioned_docs/version-1.1.0/performance/profiling.md
@@ -0,0 +1,122 @@
+---
+id: profiling
+title: Profiling
+---
+
+<!--
+Licensed to the Apache Software Foundation (ASF) under one
+or more contributor license agreements.  See the NOTICE file
+distributed with this work for additional information
+regarding copyright ownership.  The ASF licenses this file
+to you under the Apache License, Version 2.0 (the
+"License"); you may not use this file except in compliance
+with the License.  You may obtain a copy of the License at
+
+  http://www.apache.org/licenses/LICENSE-2.0
+
+Unless required by applicable law or agreed to in writing,
+software distributed under the License is distributed on an
+"AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY
+KIND, either express or implied.  See the License for the
+specific language governing permissions and limitations
+under the License.
+-->
+
+Using [pprof](https://github.com/google/pprof) to do CPU and memory profiling can help you understand the runtime status of the YuniKorn scheduler. Profiling instruments have been
+added to the YuniKorn REST service, so we can easily retrieve and analyze profiles from HTTP
+endpoints.
+
+## CPU profiling
+
+At this step, ensure you already have YuniKorn running; it can either be running locally via a
+`make run` command, or deployed as a pod inside of K8s. Then run
+
+```
+go tool pprof http://localhost:9080/debug/pprof/profile
+```
+
+The profile data will be saved to the local file system, and once that is done, the tool enters
+interactive mode. Now you can run profiling commands, such as
+
+```
+(pprof) top
+Showing nodes accounting for 14380ms, 44.85% of 32060ms total
+Dropped 145 nodes (cum <= 160.30ms)
+Showing top 10 nodes out of 106
+      flat  flat%   sum%        cum   cum%
+    2130ms  6.64%  6.64%     2130ms  6.64%  __tsan_read
+    1950ms  6.08% 12.73%     1950ms  6.08%  __tsan::MetaMap::FreeRange
+    1920ms  5.99% 18.71%     1920ms  5.99%  __tsan::MetaMap::GetAndLock
+    1900ms  5.93% 24.64%     1900ms  5.93%  racecall
+    1290ms  4.02% 28.67%     1290ms  4.02%  __tsan_write
+    1090ms  3.40% 32.06%     3270ms 10.20%  runtime.mallocgc
+    1080ms  3.37% 35.43%     1080ms  3.37%  __tsan_func_enter
+    1020ms  3.18% 38.62%     1120ms  3.49%  runtime.scanobject
+    1010ms  3.15% 41.77%     1010ms  3.15%  runtime.nanotime
+     990ms  3.09% 44.85%      990ms  3.09%  __tsan::DenseSlabAlloc::Refill
+```
+
+You can type commands such as `web` or `gif` to get a graph that helps you better
+understand the overall performance of critical code paths. You will get something
+like below:
+
+![CPU Profiling](./../assets/cpu_profile.jpg)
+
+Note: in order to use these options, you need to install the visualization tool `graphviz` first.
+If you are using a Mac, simply run `brew install graphviz`; for more info please refer [here](https://graphviz.gitlab.io/).
+
+## Memory Profiling
+
+Similarly, you can run
+
+```
+go tool pprof http://localhost:9080/debug/pprof/heap
+```
+
+This will return a snapshot of the current heap which allows us to check memory usage.
+Once it enters interactive mode, you can run some useful commands; for example, `top`
+lists the objects with the highest memory consumption.
+```
+(pprof) top
+Showing nodes accounting for 83.58MB, 98.82% of 84.58MB total
+Showing top 10 nodes out of 86
+      flat  flat%   sum%        cum   cum%
+      32MB 37.84% 37.84%       32MB 37.84%  github.com/apache/yunikorn-core/pkg/cache.NewClusterInfo
+      16MB 18.92% 56.75%       16MB 18.92%  github.com/apache/yunikorn-core/pkg/rmproxy.NewRMProxy
+      16MB 18.92% 75.67%       16MB 18.92%  github.com/apache/yunikorn-core/pkg/scheduler.NewScheduler
+      16MB 18.92% 94.59%       16MB 18.92%  github.com/apache/yunikorn-k8shim/pkg/dispatcher.init.0.func1
+    1.04MB  1.23% 95.81%     1.04MB  1.23%  k8s.io/apimachinery/pkg/runtime.(*Scheme).AddKnownTypeWithName
+    0.52MB  0.61% 96.43%     0.52MB  0.61%  github.com/gogo/protobuf/proto.RegisterType
+    0.51MB  0.61% 97.04%     0.51MB  0.61%  sync.(*Map).Store
+    0.50MB   0.6% 97.63%     0.50MB   0.6%  regexp.onePassCopy
+    0.50MB  0.59% 98.23%     0.50MB  0.59%  github.com/json-iterator/go.(*Iterator).ReadString
+    0.50MB  0.59% 98.82%     0.50MB  0.59%  text/template/parse.(*Tree).newText
+```
+
+You can also run the `web`, `pdf` or `gif` command to get the graph for the heap.
+
+## Download profiling samples and analyze it locally
+
+We have included the essential go/go-tool binaries in the scheduler docker image, so you should be able to do some basic profiling
+analysis inside of the docker container. However, if you want to dig into some issues, it might be better to do the analysis
+locally. In that case, you need to copy the samples file to your local environment first. The command to copy files looks like the following:
+
+```
+kubectl cp ${SCHEDULER_POD_NAME}:${SAMPLE_PATH_IN_DOCKER_CONTAINER} ${LOCAL_COPY_PATH}
+```
+
+For example:
+
+```
+kubectl cp yunikorn-scheduler-cf8f8dd8-6szh5:/root/pprof/pprof.k8s_yunikorn_scheduler.samples.cpu.001.pb.gz /Users/wyang/Downloads/pprof.k8s_yunikorn_scheduler.samples.cpu.001.pb.gz
+```
+
+Once you have the file in your local environment, you can run the `pprof` command for analysis.
+
+```
+go tool pprof /Users/wyang/Downloads/pprof.k8s_yunikorn_scheduler.samples.cpu.001.pb.gz
+```
+
+## Resources
+
+* pprof documentation: https://github.com/google/pprof/tree/master/doc
diff --git a/versioned_docs/version-1.1.0/user_guide/acls.md b/versioned_docs/version-1.1.0/user_guide/acls.md
new file mode 100644
index 000000000..c1fd7f2f0
--- /dev/null
+++ b/versioned_docs/version-1.1.0/user_guide/acls.md
@@ -0,0 +1,119 @@
+---
+id: acls
+title: ACLs
+---
+
+<!--
+Licensed to the Apache Software Foundation (ASF) under one
+or more contributor license agreements.  See the NOTICE file
+distributed with this work for additional information
+regarding copyright ownership.  The ASF licenses this file
+to you under the Apache License, Version 2.0 (the
+"License"); you may not use this file except in compliance
+with the License.  You may obtain a copy of the License at
+
+  http://www.apache.org/licenses/LICENSE-2.0
+
+Unless required by applicable law or agreed to in writing,
+software distributed under the License is distributed on an
+"AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY
+KIND, either express or implied.  See the License for the
+specific language governing permissions and limitations
+under the License.
+-->
+
+:::info
+User information is passed to the core scheduler from the kubernetes shim using the methodology defined [here](usergroup_resolution)
+:::
+
+## Usage
+Access Control Lists are generic for YuniKorn.
+They can be used in multiple places in YuniKorn.
+The current use case is limited to queue ACLs.
+
+Access control lists give access to the users and groups that have been specified in the list.
+They do not provide the possibility to explicitly remove or deny access to the users and groups specified in the list.
+
+If no access control list is configured, access is *denied* by default.
+
+## Syntax
+The access control list is defined as:
+```
+ACL ::= “*” |  userlist [ “ “ grouplist ]
+userlist ::= “” | user { “,” user }
+grouplist ::= “” | group { “,” group }
+```
+
+This definition specifies a wildcard of `*` which results in access for everyone.
+
+If the user list is empty and the group list is empty, nobody will have access.
+This deny-all ACL has two possible representations:
+* an empty access control list. (implicit)
+* a single space. (explicit)
+
+## Example config
+
+### Simple examples
+An ACL that allows access to just the user `sue`
+```yaml
+  adminacl: sue
+```
+Nobody else will get access, this is just for `sue`.
+`john` and `bob` will be denied access.
+
+An ACL that allows access to the user `sue` and the members of the group `dev`.
+```yaml
+  adminacl: sue dev
+```
+The user `sue` gets access based on her explicit mention in the user part of the ACL,
+even though she is not a member of the group `dev`; her group membership is irrelevant.
+
+The user named `john` who is a member of the group `dev` will be allowed access based on his group membership.
+A third user called `bob` who is not mentioned explicitly and is not a member of the `dev` group will be denied access.
+
+An ACL that allows access to the members of the groups `dev` and `test`.
+```yaml
+  adminacl: " dev,test"
+```
+The ACL must start with a space to indicate that there is no user list.
+If the ACL is not correctly quoted the space is dropped by the yaml parser.
+Since the user list is empty none of the users will get access unless they are a member of either the `dev` or `test` group.
+
+Looking at the same three users as before:
+The user `sue` is not a member of either group and is denied access.
+The user named `john` who is a member of the group `dev` will be allowed access based on his group membership.
+`bob` is not a member of the `dev` group but is a member of `test` and will be allowed access.
+
+### Escaping and quotation marks
+ACLs are currently implemented in the queue configuration which uses a yaml file.
+This places some limitations on how to escape the values.
+Incorrectly quoted values will cause a yaml parse error or could lead to the incorrect interpretation of the value.
+
+The following points need to be taken into account:
+1. The wildcard entry must be quoted in the yaml config.
+1. A simple list of users with or without it being followed by a list of groups does not need quoting but may be quoted.
+1. An ACL without a user list and just one or more groups must be quoted to find the starting space:
+
+Correctly quoted ACL example
+```yaml
+partitions:
+  - name: default
+    queues:
+      - name: test
+        submitacl: "*"
+        adminacl: sue dev,test
+      - name: product
+        submitacl: " product"
+```
+
+## Access check
+The access check follows the pattern:
+* check if the ACL is the wildcard
+* check if the user is in the user list
+* check if any of the groups the user is a member of is part of the group list
+
+If a check matches the ACL allows access and checking is stopped.
+If none of the checks match the ACL denies access.
+
+## User and Group information
+For User & Group resolution, please follow instructions defined [here](usergroup_resolution)
diff --git a/versioned_docs/version-1.1.0/user_guide/deployment_modes.md b/versioned_docs/version-1.1.0/user_guide/deployment_modes.md
new file mode 100644
index 000000000..6864a9c3c
--- /dev/null
+++ b/versioned_docs/version-1.1.0/user_guide/deployment_modes.md
@@ -0,0 +1,51 @@
+---
+id: deployment_modes
+title: Deployment Modes
+---
+
+<!--
+Licensed to the Apache Software Foundation (ASF) under one
+or more contributor license agreements.  See the NOTICE file
+distributed with this work for additional information
+regarding copyright ownership.  The ASF licenses this file
+to you under the Apache License, Version 2.0 (the
+"License"); you may not use this file except in compliance
+with the License.  You may obtain a copy of the License at
+
+  http://www.apache.org/licenses/LICENSE-2.0
+
+Unless required by applicable law or agreed to in writing,
+software distributed under the License is distributed on an
+"AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY
+KIND, either express or implied.  See the License for the
+specific language governing permissions and limitations
+under the License.
+-->
+
+## YuniKorn deployment modes
+
+YuniKorn can be deployed in two different modes: standard and plugin. In standard mode, YuniKorn runs as a customized
+Kubernetes scheduler. In plugin mode, YuniKorn is implemented as a set of plugins on top of the default Kubernetes
+scheduling framework.
+
+In both cases, it is recommended to also run the admission controller, as this ensures that only a single
+scheduler is active within your Kubernetes cluster. With the admission controller enabled, the default Kubernetes
+scheduler (which is always running) will be bypassed for all pods except YuniKorn itself.
+
+## Which mode should I use?
+
+### Standard mode
+
+Standard mode is currently the default. It is stable, efficient, and very performant. It is well-suited for
+deployments where most if not all pods are leveraging the queueing features of YuniKorn.
+
+### Plugin mode
+
+Plugin mode is a new deployment model where the scheduler is implemented on top of the default Kubernetes scheduling
+logic, allowing for better compatibility with the default Kubernetes scheduler. It is well-suited for mixed
+workloads (traditional Kubernetes as well as queued applications).
+
+Plugin mode is currently very new and has therefore not yet reached the maturity level of standard mode.
+
+To activate plugin mode when deploying with Helm, set the variable `enableSchedulerPlugin` to `true`.
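+
+For example, assuming the chart is installed from the published Helm repository:
+
+```
+helm repo add yunikorn https://apache.github.io/yunikorn-release
+helm repo update
+helm install yunikorn yunikorn/yunikorn --namespace yunikorn --create-namespace --set enableSchedulerPlugin=true
+```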
+
diff --git a/versioned_docs/version-1.1.0/user_guide/gang_scheduling.md b/versioned_docs/version-1.1.0/user_guide/gang_scheduling.md
new file mode 100644
index 000000000..f7593a573
--- /dev/null
+++ b/versioned_docs/version-1.1.0/user_guide/gang_scheduling.md
@@ -0,0 +1,288 @@
+---
+id: gang_scheduling
+title: Gang Scheduling
+---
+
+<!--
+Licensed to the Apache Software Foundation (ASF) under one
+or more contributor license agreements.  See the NOTICE file
+distributed with this work for additional information
+regarding copyright ownership.  The ASF licenses this file
+to you under the Apache License, Version 2.0 (the
+"License"); you may not use this file except in compliance
+with the License.  You may obtain a copy of the License at
+
+  http://www.apache.org/licenses/LICENSE-2.0
+
+Unless required by applicable law or agreed to in writing,
+software distributed under the License is distributed on an
+"AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY
+KIND, either express or implied.  See the License for the
+specific language governing permissions and limitations
+under the License.
+-->
+
+## What is Gang Scheduling
+
+When Gang Scheduling is enabled, YuniKorn schedules the app only when
+the app’s minimal resource request can be satisfied. Otherwise, apps
+will be waiting in the queue. Apps are queued in hierarchical queues;
+with gang scheduling enabled, each resource queue is assigned a
+maximum number of applications that can run concurrently with their minimum resources guaranteed.
+
+![Gang Scheduling](./../assets/gang_scheduling_iintro.png)
+
+## Enable Gang Scheduling
+
+There is no cluster-wide configuration needed to enable Gang Scheduling.
+The scheduler actively monitors the metadata of each app; if the app has included
+a valid taskGroups definition, it will be considered as requesting gang scheduling.
+
+:::info Task Group
+A task group is a “gang” of tasks in an app; these tasks have the same resource profile
+and the same placement constraints. They are considered homogeneous requests that can be
+treated as the same kind in the scheduler.
+:::
+
+### Prerequisite
+
+For queues which run gang-scheduling-enabled applications, the queue sorting policy needs to be set to either
+`FIFO` or `StateAware`. To configure the queue sorting policy, please refer to the doc: [app sorting policies](user_guide/sorting_policies.md#Application_sorting).
+
+:::info Why FIFO based sorting policy?
+When Gang Scheduling is enabled, the scheduler proactively reserves resources
+for each application. If the queue sorting policy is not FIFO based (StateAware is a FIFO based sorting policy),
+the scheduler might reserve partial resources for each app, causing resource segmentation issues.
+:::
+
+### App Configuration
+
+On Kubernetes, YuniKorn discovers apps by loading metadata from individual pods; the first pod of the app
+is required to enclose a full copy of the app metadata. If the app doesn’t have any notion of a first or second pod,
+then all pods are required to carry the same taskGroups info. Gang scheduling requires a taskGroups definition,
+which can be specified via pod annotations. The required fields are:
+
+| Annotation                                     | Value |
+|----------------------------------------------- |---------------------	|
+| yunikorn.apache.org/task-group-name 	         | Task group name, it must be unique within the application |
+| yunikorn.apache.org/task-groups                | A list of task groups, each item contains all the info defined for the certain task group |
+| yunikorn.apache.org/schedulingPolicyParameters | Optional. Arbitrary key-value pairs to define scheduling policy parameters. Please read the [schedulingPolicyParameters section](#scheduling-policy-parameters) |
+
+#### How many task groups needed?
+
+This depends on how many different types of pods this app requests from K8s. A task group is a “gang” of tasks in an app;
+these tasks have the same resource profile and the same placement constraints. They are considered homogeneous
+requests that can be treated as the same kind in the scheduler. Using Spark as an example, each job needs 2 task groups:
+one for the driver pod and the other one for the executor pods.
+
+#### How to define task groups?
+
+The task group definition is a copy of the app’s real pod definition; values for fields like resources, node selector,
+tolerations and affinity should be the same as for the real pods. This ensures the scheduler can reserve resources with the
+exact correct pod specification.
+
+#### Scheduling Policy Parameters
+
+Scheduling-policy-related configurable parameters. Apply the parameters in the following format in the pod's annotations:
+
+```yaml
+annotations:
+   yunikorn.apache.org/schedulingPolicyParameters: "PARAM1=VALUE1 PARAM2=VALUE2 ..."
+```
+
+Currently, the following parameters are supported:
+
+`placeholderTimeoutInSeconds`
+
+Default value: *15 minutes*.
+This parameter defines the reservation timeout, i.e. how long the scheduler should wait before giving up allocating all the placeholders.
+The timeout timer starts to tick when the scheduler *allocates the first placeholder pod*. This ensures that if the scheduler
+cannot schedule all the placeholder pods, it will eventually give up after a certain amount of time, so that the resources can be
+freed up and used by other apps. If none of the placeholders can be allocated, this timeout won't kick in. To avoid placeholder
+pods getting stuck forever, please refer to [troubleshooting](trouble_shooting.md#gang-scheduling) for solutions.
+
+`gangSchedulingStyle`
+
+Valid values: *Soft*, *Hard*
+
+Default value: *Soft*.
+This parameter defines the fallback mechanism if the app encounters gang issues due to placeholder pod allocation.
+See more details in the [Gang Scheduling styles](#gang-scheduling-styles) section.
+
+More scheduling parameters will be added in order to provide more flexibility while scheduling apps.
+
+#### Example
+
+The following example is a yaml file for a job. This job launches 2 pods and each pod sleeps 30 seconds.
+The notable change in the pod spec is *spec.template.metadata.annotations*, where we defined `yunikorn.apache.org/task-group-name`
+and `yunikorn.apache.org/task-groups`.
+
+```yaml
+apiVersion: batch/v1
+kind: Job
+metadata:
+  name: gang-scheduling-job-example
+spec:
+  completions: 2
+  parallelism: 2
+  template:
+    metadata:
+      labels:
+        app: sleep
+        applicationId: "gang-scheduling-job-example"
+        queue: root.sandbox
+      annotations:
+        yunikorn.apache.org/task-group-name: task-group-example
+        yunikorn.apache.org/task-groups: |-
+          [{
+              "name": "task-group-example",
+              "minMember": 2,
+              "minResource": {
+                "cpu": "100m",
+                "memory": "50M"
+              },
+              "nodeSelector": {},
+              "tolerations": [],
+              "affinity": {}
+          }]
+    spec:
+      schedulerName: yunikorn
+      restartPolicy: Never
+      containers:
+        - name: sleep30
+          image: "alpine:latest"
+          command: ["sleep", "30"]
+          resources:
+            requests:
+              cpu: "100m"
+              memory: "50M"
+```
+
+When this job is submitted to Kubernetes, 2 pods will be created using the same template, and they all belong to one taskGroup:
+*“task-group-example”*. YuniKorn will create 2 placeholder pods, each using the resources specified in the taskGroup definition.
+When both placeholders are allocated, the scheduler will bind the 2 real sleep pods using the spots reserved by the placeholders.
+
+You can add more than one taskGroup if necessary; each taskGroup is identified by the taskGroup name,
+and each real pod must be mapped to a pre-defined taskGroup by setting the taskGroup name. Note that
+the task group name is only required to be unique within an application.
+
+### Enable Gang scheduling for Spark jobs
+
+Each Spark job runs 2 types of pods, driver and executor. Hence, we need to define 2 task groups for each job.
+The annotations for the driver pod look like:
+
+```yaml
+annotations:
+  yunikorn.apache.org/schedulingPolicyParameters: "placeholderTimeoutInSeconds=30"
+  yunikorn.apache.org/task-group-name: "spark-driver"
+  yunikorn.apache.org/task-groups: |-
+    [{
+        "name": "spark-driver",
+        "minMember": 1,
+        "minResource": {
+          "cpu": "1",
+          "memory": "2Gi"
+        },
+        "nodeSelector": ...,
+        "tolerations": ...,
+        "affinity": ...
+      },
+      {
+        "name": "spark-executor",
+        "minMember": 10,
+        "minResource": {
+          "cpu": "1",
+          "memory": "2Gi"
+        }
+    }]
+```
+
+:::note
+The Spark driver and executor pods have memory overhead, which needs to be considered in the taskGroup resources.
+:::
+
+For all the executor pods,
+
+```yaml
+annotations:
+  # the task group name should match one of the names
+  # defined in the task-groups annotation
+  yunikorn.apache.org/task-group-name: "spark-executor"
+```
+
+Once the job is submitted to the scheduler, the job won’t be scheduled immediately.
+Instead, the scheduler will ensure it gets its minimal resources before actually starting the driver/executors. 
+
+## Gang scheduling Styles
+
+There are 2 gang scheduling styles supported: Soft and Hard. The style can be configured per app to define how the app will behave in case gang scheduling fails.
+
+- `Hard style`: when this style is used, we keep the original behavior; more precisely, if the application cannot be scheduled according to gang scheduling rules and it times out, it will be marked as failed, without retrying to schedule it.
+- `Soft style`: when the app cannot be gang scheduled, it falls back to normal scheduling, and the non-gang scheduling strategy is used to achieve best-effort scheduling. When this happens, the app transitions to the Resuming state and all the remaining placeholder pods are cleaned up.
+
+**Default style used**: `Soft`
+
+**Enable a specific style**: the style can be changed by setting the `gangSchedulingStyle` parameter to Soft or Hard in the application definition.
+
+#### Example
+
+```yaml
+apiVersion: batch/v1
+kind: Job
+metadata:
+  name: gang-app-timeout
+spec:
+  completions: 4
+  parallelism: 4
+  template:
+    metadata:
+      labels:
+        app: sleep
+        applicationId: gang-app-timeout
+        queue: fifo
+      annotations:
+        yunikorn.apache.org/task-group-name: sched-style
+        yunikorn.apache.org/schedulingPolicyParameters: "placeholderTimeoutInSeconds=60 gangSchedulingStyle=Hard"
+        yunikorn.apache.org/task-groups: |-
+          [{
+              "name": "sched-style",
+              "minMember": 4,
+              "minResource": {
+                "cpu": "1",
+                "memory": "1000M"
+              },
+              "nodeSelector": {},
+              "tolerations": [],
+              "affinity": {}
+          }]
+    spec:
+      schedulerName: yunikorn
+      restartPolicy: Never
+      containers:
+        - name: sleep30
+          image: "alpine:latest"
+          imagePullPolicy: "IfNotPresent"
+          command: ["sleep", "30"]
+          resources:
+            requests:
+              cpu: "1"
+              memory: "1000M"
+
+```
+
+## Verify Configuration
+
+To verify that the configuration has been done completely and correctly, check the following things:
+1. When an app is submitted, verify that the expected number of placeholders is created by the scheduler.
+If you define 2 task groups, 1 with minMember 1 and the other with minMember 5, that means we expect 6 placeholders
+to be created once the job is submitted (see the example command after this list).
+2. Verify that the placeholder spec is correct. Each placeholder needs to have the same info as the real pods in the same taskGroup.
+Check fields including: namespace, pod resources, node selector, tolerations and affinity.
+3. Verify that the placeholders are allocated on the correct type of nodes, and verify that the real pods are started by replacing the placeholder pods.
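+
+For example, for the job shown earlier, the placeholder and real pods can be listed via the `applicationId` label; placeholder pods created by YuniKorn should carry the same `applicationId` and additionally the internally set `placeholder` label (a sketch, adjust the namespace and application id to your own job):
+
+```
+kubectl get pods -l applicationId=gang-scheduling-job-example --show-labels
+```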
+
+## Troubleshooting
+
+Please see the troubleshooting doc when gang scheduling is enabled [here](trouble_shooting.md#gang-scheduling).
diff --git a/versioned_docs/version-1.1.0/user_guide/labels_and_annotations_in_yunikorn.md b/versioned_docs/version-1.1.0/user_guide/labels_and_annotations_in_yunikorn.md
new file mode 100644
index 000000000..fa38f746a
--- /dev/null
+++ b/versioned_docs/version-1.1.0/user_guide/labels_and_annotations_in_yunikorn.md
@@ -0,0 +1,48 @@
+---
+id: labels_and_annotations_in_yunikorn
+title: Labels and Annotations in YuniKorn
+---
+<!--
+Licensed to the Apache Software Foundation (ASF) under one
+or more contributor license agreements.  See the NOTICE file
+distributed with this work for additional information
+regarding copyright ownership.  The ASF licenses this file
+to you under the Apache License, Version 2.0 (the
+"License"); you may not use this file except in compliance
+with the License.  You may obtain a copy of the License at
+
+  http://www.apache.org/licenses/LICENSE-2.0
+
+Unless required by applicable law or agreed to in writing,
+software distributed under the License is distributed on an
+"AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY
+KIND, either express or implied.  See the License for the
+specific language governing permissions and limitations
+under the License.
+-->
+
+## Labels and Annotations in YuniKorn
+YuniKorn utilizes several Kubernetes labels and annotations to support various features:
+
+### Labels in YuniKorn
+| Name                | Description                                                                                                                                             | 
+|---------------------|---------------------------------------------------------------------------------------------------------------------------------------------------------|
+| `applicationId`     | Associates this pod with an application.                                                                                                                |
+| `queue`             | Selects the YuniKorn queue this application should be scheduled in. This may be ignored if a placement policy is in effect.                             |
+| `SparkLabelAppID`   | Alternative method of specifying `applicationId`, used by the Spark Operator if the label `applicationId` and the annotation `yunikorn.apache.org/app-id` are unset. |
+| `disableStateAware` | If present, disables the YuniKorn state-aware scheduling policy for this pod. Set internally by the YuniKorn admission controller.                      |
+| `placeholder`       | Set if this pod represents a placeholder for gang scheduling. Set internally by YuniKorn.                                                               |
+
+### Annotations in YuniKorn
+All annotations are under the namespace `yunikorn.apache.org`. For example `yunikorn.apache.org/app-id`.
+
+| Name                         | Description                                                                                                                                                                            | 
+|------------------------------|----------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------|
+| `app-id`                     | Associates this pod with an application.<br/>The priority of applicationID is determined by: annotation `yunikorn.apache.org/app-id` > label `applicationId` > label `SparkLabelAppID`. |
+| `queue`                      | Selects the YuniKorn queue this application should be scheduled in.<br/>The priority of queue is determined by: label `queue` > annotation `yunikorn.apache.org/queue` > default.      | 
+| `task-group-name`            | Sets the task group name this pod belongs to for the purposes of gang scheduling. It must be listed within `task-groups`.                                                              |
+| `task-groups`                | Defines the set of task groups for this application for gang scheduling. Each pod within an application must define all task groups.                                                   |
+| `schedulingPolicyParameters` | Arbitrary key-value pairs used to customize scheduling policies such as gang scheduling.                                                                                               |
+| `placeholder`                | Set if this pod represents a placeholder for gang scheduling. Set internally by YuniKorn.                                                                                              |
+
+For more details surrounding gang-scheduling labels and annotations, please refer to the documentation on [gang scheduling](user_guide/gang_scheduling.md).
diff --git a/versioned_docs/version-1.1.0/user_guide/placement_rules.md b/versioned_docs/version-1.1.0/user_guide/placement_rules.md
new file mode 100644
index 000000000..5f2c64d6c
--- /dev/null
+++ b/versioned_docs/version-1.1.0/user_guide/placement_rules.md
@@ -0,0 +1,354 @@
+---
+id: placement_rules
+title: App Placement Rules
+---
+
+<!--
+Licensed to the Apache Software Foundation (ASF) under one
+or more contributor license agreements.  See the NOTICE file
+distributed with this work for additional information
+regarding copyright ownership.  The ASF licenses this file
+to you under the Apache License, Version 2.0 (the
+"License"); you may not use this file except in compliance
+with the License.  You may obtain a copy of the License at
+
+  http://www.apache.org/licenses/LICENSE-2.0
+
+Unless required by applicable law or agreed to in writing,
+software distributed under the License is distributed on an
+"AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY
+KIND, either express or implied.  See the License for the
+specific language governing permissions and limitations
+under the License.
+-->
+
+The basics for the placement rules are described in the [scheduler configuration design document](design/scheduler_configuration.md#placement-rules-definition).
+Multiple rules can be chained to form a placement policy.
+[Access control lists](user_guide/acls.md) and rule filters are defined per rule and enforced per rule.
+This document explains how to build a policy, including the rule usage, that is part of the scheduler with examples.
+
+## Configuration
+Rules are defined per partition as part of the scheduler queue configuration.
+The order that the rules are defined in is the order in which they are executed.
+If a rule matches, the policy will stop executing the remaining rules.
+
+A matching rule generates a fully qualified queue name.
+This means that the name returned starts at the _root_ queue.
+There is no limit on the number of levels in the queue hierarchy that can be generated.
+
+When a rule is executed, the results of rules that have already been executed are unknown and not taken into account.
+The same holds for rules that have not been executed yet: rules cannot influence other rules except when they are configured as the [parent](#parent-parameter) rule.
+
+If the policy does not generate a queue name and no more rules are left, the application will be rejected.
+
+Basic structure for the rule placement definition in the configuration:
+```yaml
+placementrules:
+  - name: <name of the 1st rule>
+  - name: <name of the 2nd rule>
+```
+Each rule can take a predefined set of parameters in the configuration.
+The name of the rules that can be used are given in the [rule](#rules) description.
+A rule name is not case sensitive in the configuration.
+Rule names must follow this naming convention:
+* start with a letter: a-z or A-Z
+* followed by 0 or more characters a-z, A-Z, 0-9 or _
+
+A rule that is not known, i.e. the name does not map to the rules defined below, will cause an initialisation error of the placement manager.
+Rules can also throw a parse error or an error during the initialisation if the parameters are incorrect.
+A rule set with an error can never become active.
+
+A placement manager is considered initialised if it has an active rule set.
+When the configuration is reloaded a new rule set will be created to replace the active rule set.
+In the case that a newly loaded rule set contains an error, i.e. is broken, the placement manager ignores the new rule set.
+This means that the placement manager stays in the state it was in when the broken rule set was loaded.
+The placement manager keeps using the existing active rule set if it was already initialised.
+A message will be logged about the broken and ignored configuration.
+
+Dots "." in the rule result are replaced by the string "\_dot_".
+A dot is replaced because it is used as the hierarchy separator in the fully qualified queue name.
+Replacing the dot occurs before the full queue hierarchy is built and the result is qualified.
+This means that user names and/or tag values are allowed to contain dots without the dots affecting the queue hierarchy.
+For queues in the configuration that, for example, must map to a user name containing a dot, you must specify them as follows:
+a user rule with the user `user.name` will generate the queue name `root.user_dot_name` as output.
+If that "user queue" must be added to the configuration, the `user_dot_name` name should be used.
+
+### Create parameter
+The create parameter is a boolean flag that defines if a queue that is generated by the rule may be created if it does not exist.
+There is no guarantee that the queue will be created because the existing queues might prevent the queue creation.
+If the queue generated by the rule does not exist and the flag is not set to _true_, the result of the rule will be a fail.
+
+Basic yaml entry for a rule with `create` flag:
+```yaml
+placementrules:
+  - name: <name of the rule>
+    create: true
+```
+
+The default value is _false_.
+Allowed values: _true_ or _false_, any other value will cause a parse error.
+
+### Parent parameter
+The parent parameter allows specifying a rule that generates a parent queue for the current rule.
+Parent rules can be nested, a parent rule _may_ contain another parent rule.
+There is no enforced limit of parent rules that can be nested.
+
+A parent rule is treated as if it was a rule specified at the top level of the list and thus has the same parameters and requirements as any other rule in the placement definition.
+The exception is that using a parent rule on a rule that already generates a fully qualified queue is considered a configuration error.
+This error can only occur on the rule of type [fixed](#fixed-rule), see the rule specification for more details.
+
+NOTE: the rule execution traverses down the list of parent rules and executes the last one in the list first.
+This means that the last parent rule will generate the queue directly below the root.
+See the examples for details.
+
+Basic yaml entry for a rule with a `parent` rule:
+```yaml
... 1905 lines suppressed ...