Posted to commits@uniffle.apache.org by ro...@apache.org on 2022/10/13 02:15:43 UTC

[incubator-uniffle] branch master updated: [ISSUE-48][FEATURE][FOLLOW UP] add docs for operator (#261)

This is an automated email from the ASF dual-hosted git repository.

roryqi pushed a commit to branch master
in repository https://gitbox.apache.org/repos/asf/incubator-uniffle.git


The following commit(s) were added to refs/heads/master by this push:
     new 8be83904 [ISSUE-48][FEATURE][FOLLOW UP] add docs for operator (#261)
8be83904 is described below

commit 8be83904406f630363acd659ad48f17476c1ef63
Author: jasonawang <ja...@tencent.com>
AuthorDate: Thu Oct 13 10:15:37 2022 +0800

    [ISSUE-48][FEATURE][FOLLOW UP] add docs for operator (#261)
    
    ### What changes were proposed in this pull request?
    For issue #48, this PR adds documentation on the design and usage of the operator.
    
    ### Why are the changes needed?
    Add documentation for the operator.
    
    ### Does this PR introduce _any_ user-facing change?
    No
    
    ### How was this patch tested?
    Documentation-only change.
---
 README.md                                          |   7 ++
 .../operator/examples/configuration.yaml           |  74 +++++++++++
 .../operator/examples/full-restart/README.md       |  33 +++++
 .../examples/full-restart/rss-full-restart.yaml    |   2 +-
 .../operator/examples/full-upgrade/README.md       |  38 ++++++
 .../examples/full-upgrade/rss-full-upgrade.yaml    |   2 +-
 .../operator/examples/partition-upgrade/README.md  |  42 +++++++
 .../partition-upgrade/rss-partition-upgrade.yaml   |   2 +-
 .../operator/examples/specific-upgrade/README.md   |  38 ++++++
 .../specific-upgrade/rss-specific-upgrade.yaml     |   2 +-
 docs/asset/rss-crd-state-transition.png            | Bin 0 -> 71668 bytes
 docs/operator/README.md                            |  35 ++++++
 docs/operator/design.md                            | 137 +++++++++++++++++++++
 docs/operator/examples.md                          |  31 +++++
 docs/operator/install.md                           |  75 +++++++++++
 15 files changed, 514 insertions(+), 4 deletions(-)

diff --git a/README.md b/README.md
index f874ae38..48585664 100644
--- a/README.md
+++ b/README.md
@@ -230,6 +230,13 @@ The jar for MapReduce is located in <RSS_HOME>/jars/client/mr/rss-client-mr-XXXX
 Note that the RssMRAppMaster will automatically disable slow start (i.e., `mapreduce.job.reduce.slowstart.completedmaps=1`)
 and job recovery (i.e., `yarn.app.mapreduce.am.job.recovery.enable=false`)
 
+### Deploy In Kubernetes
+
+We provide a Uniffle operator for deploying Uniffle in Kubernetes environments.
+
+For details, see the following document:
+
+[operator docs](docs/operator)
 
 ## Configuration
 
diff --git a/deploy/kubernetes/operator/examples/configuration.yaml b/deploy/kubernetes/operator/examples/configuration.yaml
new file mode 100644
index 00000000..d7e9642b
--- /dev/null
+++ b/deploy/kubernetes/operator/examples/configuration.yaml
@@ -0,0 +1,74 @@
+#
+# Licensed to the Apache Software Foundation (ASF) under one or more
+# contributor license agreements.  See the NOTICE file distributed with
+# this work for additional information regarding copyright ownership.
+# The ASF licenses this file to You under the Apache License, Version 2.0
+# (the "License"); you may not use this file except in compliance with
+# the License.  You may obtain a copy of the License at
+#
+#    http://www.apache.org/licenses/LICENSE-2.0
+#
+# Unless required by applicable law or agreed to in writing, software
+# distributed under the License is distributed on an "AS IS" BASIS,
+# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+# See the License for the specific language governing permissions and
+# limitations under the License.
+#
+
+---
+kind: ConfigMap
+apiVersion: v1
+metadata:
+  name: rss-configuration
+  namespace: kube-system
+data:
+  coordinator.conf: |-
+    rss.coordinator.app.expired 60000
+    rss.coordinator.exclude.nodes.file.path /data/rssadmin/rss/coo
+    rss.coordinator.server.heartbeat.timeout 30000
+    rss.jetty.http.port 19996
+    rss.rpc.server.port 19997
+  log4j.properties: |-
+    log4j.rootCategory=INFO, RollingAppender
+    log4j.appender.console=org.apache.log4j.ConsoleAppender
+    log4j.appender.console.Threshold=INFO
+    log4j.appender.console.target=System.err
+    log4j.appender.console.layout=org.apache.log4j.PatternLayout
+    log4j.appender.console.layout.ConversionPattern=%d{yy/MM/dd HH:mm:ss} %p %c{1}: %m%n
+    log4j.appender.RollingAppender=org.apache.log4j.RollingFileAppender
+    log4j.appender.RollingAppender.File=./logs/rss.log
+    log4j.appender.RollingAppender.MaxFileSize=50MB
+    log4j.appender.RollingAppender.MaxBackupIndex=10
+    log4j.appender.RollingAppender.layout=org.apache.log4j.PatternLayout
+    log4j.appender.RollingAppender.layout.ConversionPattern=[%p] %d %t %c{1} %M - %m%n
+  server.conf: |-
+    rss.coordinator.quorum rss-coordinator-rss-demo-0:19997,rss-coordinator-rss-demo-1:19997
+    rss.jetty.http.port 19996
+    rss.rpc.executor.size 500
+    rss.rpc.message.max.size 1073741824
+    rss.rpc.server.port 19997
+    rss.server.app.expired.withoutHeartbeat 120000
+    rss.server.buffer.capacity 60g
+    rss.server.commit.timeout 600000
+    rss.server.disk.capacity 3g
+    rss.server.event.size.threshold.l1 128m
+    rss.server.event.size.threshold.l2 192m
+    rss.server.event.size.threshold.l3 256m
+    rss.server.flush.cold.storage.threshold.size 128m
+    rss.server.flush.thread.alive 6
+    rss.server.flush.threadPool.size 12
+    rss.server.hadoop.dfs.client.socket-timeout 15000
+    rss.server.hadoop.dfs.replication 2
+    rss.server.hdfs.base.path hdfs://${your-hdfs-path}
+    rss.server.health.check.enable false
+    rss.server.heartbeat.interval 10000
+    rss.server.heartbeat.timeout 60000
+    rss.server.memory.shuffle.highWaterMark.percentage 70.0
+    rss.server.memory.shuffle.lowWaterMark.percentage 10.0
+    rss.server.pending.event.timeoutSec 600
+    rss.server.preAllocation.expired 120000
+    rss.server.read.buffer.capacity 5g
+    rss.server.shuffle.expired.timeout.ms 120000
+    rss.server.write.retry.max 2
+    rss.storage.basePath /data1/rssdata,/data10/rssdata,/data11/rssdata,/data12/rssdata,/data2/rssdata,/data3/rssdata,/data4/rssdata,/data5/rssdata,/data6/rssdata,/data7/rssdata,/data8/rssdata,/data9/rssdata
+    rss.storage.type MEMORY_LOCALFILE
\ No newline at end of file
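Each `*.conf` entry in this ConfigMap is a `key value` pair on its own line. The following Python sketch, which assumes whitespace is the only separator (true for every entry above), parses such a block and runs two illustrative sanity checks: the low watermark sits below the high watermark, and each coordinator quorum entry looks like `host:port`. The helper names are hypothetical, and these checks are not Uniffle's own validation.

```python
"""Hypothetical helper: parse and sanity-check a Uniffle `key value` conf block."""
from __future__ import annotations

# A trimmed copy of the server.conf entries from the ConfigMap above.
SERVER_CONF = """\
rss.coordinator.quorum rss-coordinator-rss-demo-0:19997,rss-coordinator-rss-demo-1:19997
rss.rpc.server.port 19997
rss.server.memory.shuffle.highWaterMark.percentage 70.0
rss.server.memory.shuffle.lowWaterMark.percentage 10.0
rss.storage.basePath /data1/rssdata,/data2/rssdata
rss.storage.type MEMORY_LOCALFILE
"""


def parse_conf(text: str) -> dict[str, str]:
    """Map each key to its value, splitting a line on its first run of whitespace."""
    conf: dict[str, str] = {}
    for raw in text.splitlines():
        line = raw.strip()
        if not line or line.startswith("#"):
            continue  # skip blank lines and comments
        parts = line.split(None, 1)
        conf[parts[0]] = parts[1].strip() if len(parts) > 1 else ""
    return conf


def check_server_conf(conf: dict[str, str]) -> list[str]:
    """Return human-readable problems; an empty list means the checks passed."""
    problems: list[str] = []
    high = float(conf["rss.server.memory.shuffle.highWaterMark.percentage"])
    low = float(conf["rss.server.memory.shuffle.lowWaterMark.percentage"])
    if not low < high:
        problems.append("lowWaterMark must be below highWaterMark")
    for addr in conf["rss.coordinator.quorum"].split(","):
        host, _, port = addr.rpartition(":")
        if not host or not port.isdigit():
            problems.append(f"bad quorum address: {addr}")
    return problems


if __name__ == "__main__":
    parsed = parse_conf(SERVER_CONF)
    print(parsed["rss.storage.basePath"].split(","))  # one entry per data directory
    print(check_server_conf(parsed))
```

Splitting on the first whitespace run keeps values such as the comma-separated `rss.storage.basePath` intact as a single string.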
diff --git a/deploy/kubernetes/operator/examples/full-restart/README.md b/deploy/kubernetes/operator/examples/full-restart/README.md
new file mode 100644
index 00000000..2180ce20
--- /dev/null
+++ b/deploy/kubernetes/operator/examples/full-restart/README.md
@@ -0,0 +1,33 @@
+<!--
+  ~ Licensed to the Apache Software Foundation (ASF) under one or more
+  ~ contributor license agreements.  See the NOTICE file distributed with
+  ~ this work for additional information regarding copyright ownership.
+  ~ The ASF licenses this file to You under the Apache License, Version 2.0
+  ~ (the "License"); you may not use this file except in compliance with
+  ~ the License.  You may obtain a copy of the License at
+  ~
+  ~    http://www.apache.org/licenses/LICENSE-2.0
+  ~
+  ~ Unless required by applicable law or agreed to in writing, software
+  ~ distributed under the License is distributed on an "AS IS" BASIS,
+  ~ WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+  ~ See the License for the specific language governing permissions and
+  ~ limitations under the License.
+  -->
+
+# Full Restart of Shuffle Servers
+
+To restart all shuffle server pods, we need to set the `.spec.shuffleServer.sync` field to `true` and set the
+`.spec.shuffleServer.upgradeStrategy.type` field to `FullRestart`.
+
+```yaml
+spec:
+  shuffleServer:
+    sync: true
+    upgradeStrategy:
+      type: "FullRestart"
+```
+
+Unlike a full upgrade, a full restart does not require modifying the configuration or the image.
+
+We can refer to the [example](rss-full-restart.yaml).
\ No newline at end of file
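Because a full restart touches only `sync` and `upgradeStrategy.type`, it can be expressed as a small JSON merge patch. The Python sketch below builds that patch and renders a `kubectl patch --type merge` command for it. The resource name `remoteshuffleservices.uniffle.apache.org` is an assumption inferred from the CRD file name `uniffle.apache.org_remoteshuffleservices.yaml`, and the helper functions are hypothetical.

```python
"""Hypothetical helper: build a JSON merge patch that triggers a FullRestart."""
import json
import shlex

# Assumed <plural>.<group>, inferred from the CRD file name; verify on your cluster.
RESOURCE = "remoteshuffleservices.uniffle.apache.org"


def full_restart_patch() -> dict:
    """Only sync and the strategy type change; image and configuration are untouched."""
    return {"spec": {"shuffleServer": {"sync": True,
                                       "upgradeStrategy": {"type": "FullRestart"}}}}


def kubectl_patch_command(name: str, namespace: str, patch: dict) -> str:
    """Render a shell-quoted `kubectl patch --type merge` command for an rss object."""
    args = ["kubectl", "patch", RESOURCE, name, "-n", namespace,
            "--type", "merge", "-p", json.dumps(patch, separators=(",", ":"))]
    return shlex.join(args)


if __name__ == "__main__":
    print(kubectl_patch_command("rss-full-restart-demo", "kube-system", full_restart_patch()))
```

A merge patch is used because kubectl's default strategic merge patch is not supported for custom resources.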
diff --git a/deploy/kubernetes/operator/examples/full-restart/rss-full-restart.yaml b/deploy/kubernetes/operator/examples/full-restart/rss-full-restart.yaml
index 4083fba6..44fbde23 100644
--- a/deploy/kubernetes/operator/examples/full-restart/rss-full-restart.yaml
+++ b/deploy/kubernetes/operator/examples/full-restart/rss-full-restart.yaml
@@ -22,7 +22,7 @@ metadata:
   name: rss-full-restart-demo
   namespace: kube-system
 spec:
-  configMapName: rss-full-restart-demo
+  configMapName: "${rss-configuration-name}"
   coordinator:
     image: "${rss-coordinator-image}"
     initContainerImage: "busybox:latest"
diff --git a/deploy/kubernetes/operator/examples/full-upgrade/README.md b/deploy/kubernetes/operator/examples/full-upgrade/README.md
new file mode 100644
index 00000000..d8a46be9
--- /dev/null
+++ b/deploy/kubernetes/operator/examples/full-upgrade/README.md
@@ -0,0 +1,38 @@
+<!--
+  ~ Licensed to the Apache Software Foundation (ASF) under one or more
+  ~ contributor license agreements.  See the NOTICE file distributed with
+  ~ this work for additional information regarding copyright ownership.
+  ~ The ASF licenses this file to You under the Apache License, Version 2.0
+  ~ (the "License"); you may not use this file except in compliance with
+  ~ the License.  You may obtain a copy of the License at
+  ~
+  ~    http://www.apache.org/licenses/LICENSE-2.0
+  ~
+  ~ Unless required by applicable law or agreed to in writing, software
+  ~ distributed under the License is distributed on an "AS IS" BASIS,
+  ~ WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+  ~ See the License for the specific language governing permissions and
+  ~ limitations under the License.
+  -->
+
+# Full Upgrade of Shuffle Servers
+
+To upgrade all shuffle servers, we first need to update the configuration files in the ConfigMap.
+
+Then, we need to edit the rss object as follows:
+
++ update `.spec.shuffleServer.image` to the new shuffle server image
++ set the `.spec.shuffleServer.sync` field to `true`
++ update the `.spec.shuffleServer.upgradeStrategy` field:
+    + set `.spec.shuffleServer.upgradeStrategy.type` to `FullUpgrade`
+
+```yaml
+spec:
+  shuffleServer:
+    image: "${rss-shuffle-server-image}"
+    sync: true
+    upgradeStrategy:
+      type: "FullUpgrade"
+```
+
+We can refer to the [example](rss-full-upgrade.yaml).
\ No newline at end of file
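The three edits listed in `full-upgrade/README.md` can be sketched as a pure function over the rss object loaded as a Python dict. This is a hypothetical helper, not part of the operator; it assumes the ConfigMap has already been updated and replaces the whole `upgradeStrategy` so that stale `partition` or `specificNames` values from an earlier mode do not linger.

```python
"""Hypothetical helper: apply the full-upgrade edits to an rss object (as a dict)."""
import copy


def prepare_full_upgrade(rss: dict, new_image: str) -> dict:
    """Return a copy of `rss` with image, sync and upgradeStrategy set for FullUpgrade.

    The ConfigMap referenced by spec.configMapName is assumed to be updated
    beforehand; this function does not touch it.
    """
    updated = copy.deepcopy(rss)  # never mutate the caller's object
    server = updated.setdefault("spec", {}).setdefault("shuffleServer", {})
    server["image"] = new_image
    server["sync"] = True
    # Replace the strategy wholesale to drop fields used only by other modes.
    server["upgradeStrategy"] = {"type": "FullUpgrade"}
    return updated
```

Returning a copy keeps the original object available for comparison or rollback before the result is applied.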
diff --git a/deploy/kubernetes/operator/examples/full-upgrade/rss-full-upgrade.yaml b/deploy/kubernetes/operator/examples/full-upgrade/rss-full-upgrade.yaml
index a7215574..a34afbd7 100644
--- a/deploy/kubernetes/operator/examples/full-upgrade/rss-full-upgrade.yaml
+++ b/deploy/kubernetes/operator/examples/full-upgrade/rss-full-upgrade.yaml
@@ -22,7 +22,7 @@ metadata:
   name: rss-full-upgrade-demo
   namespace: kube-system
 spec:
-  configMapName: rss-full-upgrade-demo
+  configMapName: "${rss-configuration-name}"
   coordinator:
     image: "${rss-coordinator-image}"
     initContainerImage: "busybox:latest"
diff --git a/deploy/kubernetes/operator/examples/partition-upgrade/README.md b/deploy/kubernetes/operator/examples/partition-upgrade/README.md
new file mode 100644
index 00000000..75cfe50c
--- /dev/null
+++ b/deploy/kubernetes/operator/examples/partition-upgrade/README.md
@@ -0,0 +1,42 @@
+<!--
+  ~ Licensed to the Apache Software Foundation (ASF) under one or more
+  ~ contributor license agreements.  See the NOTICE file distributed with
+  ~ this work for additional information regarding copyright ownership.
+  ~ The ASF licenses this file to You under the Apache License, Version 2.0
+  ~ (the "License"); you may not use this file except in compliance with
+  ~ the License.  You may obtain a copy of the License at
+  ~
+  ~    http://www.apache.org/licenses/LICENSE-2.0
+  ~
+  ~ Unless required by applicable law or agreed to in writing, software
+  ~ distributed under the License is distributed on an "AS IS" BASIS,
+  ~ WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+  ~ See the License for the specific language governing permissions and
+  ~ limitations under the License.
+  -->
+
+# Partition Upgrade of Shuffle Servers
+
+To upgrade shuffle servers in partition mode, we need to edit the rss object as follows:
+
++ update `.spec.shuffleServer.image` to the new shuffle server image
++ set the `.spec.shuffleServer.sync` field to `true`
++ update the `.spec.shuffleServer.upgradeStrategy` field:
+    + set `.spec.shuffleServer.upgradeStrategy.type` to `PartitionUpgrade`
+    + set the `.spec.shuffleServer.upgradeStrategy.partition` field, which has the same meaning
+      as the `.spec.updateStrategy.rollingUpdate.partition` field
+      of a [StatefulSet workload](https://kubernetes.io/docs/concepts/workloads/controllers/statefulset/): replicas
+      whose ordinal is less than this value keep the old version, and replicas whose ordinal is greater than or
+      equal to this value are updated to the new version
+
+```yaml
+spec:
+  shuffleServer:
+    image: "${rss-shuffle-server-image}"
+    sync: true
+    upgradeStrategy:
+      type: "PartitionUpgrade"
+      partition: 2
+```
+
+We can refer to the [example](rss-partition-upgrade.yaml).
\ No newline at end of file
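The partition rule in `partition-upgrade/README.md` follows StatefulSet semantics, where pods are named `<statefulset>-<ordinal>`. The Python sketch below computes which shuffle server pods are updated and which keep the old version. The StatefulSet name `rss-shuffle-server-demo` is an assumption inferred from the `specificNames` example in the specific-upgrade README.

```python
"""Sketch of StatefulSet partition semantics applied to shuffle server pods."""
from __future__ import annotations


def pods_to_update(statefulset: str, replicas: int, partition: int) -> list[str]:
    """Pods whose ordinal is >= partition receive the new version."""
    if replicas < 0 or partition < 0:
        raise ValueError("replicas and partition must be non-negative")
    return [f"{statefulset}-{i}" for i in range(partition, replicas)]


def pods_kept(statefulset: str, replicas: int, partition: int) -> list[str]:
    """Pods whose ordinal is < partition keep the old version."""
    if replicas < 0 or partition < 0:
        raise ValueError("replicas and partition must be non-negative")
    return [f"{statefulset}-{i}" for i in range(min(partition, replicas))]


if __name__ == "__main__":
    # With 3 replicas and partition 2, only the pod with ordinal 2 is upgraded.
    print(pods_to_update("rss-shuffle-server-demo", 3, 2))
    print(pods_kept("rss-shuffle-server-demo", 3, 2))
```

Lowering `partition` step by step (for example 2, then 1, then 0) is the usual way to stage a StatefulSet rollout, which is how a grayscale upgrade can be carried out with this mode.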
diff --git a/deploy/kubernetes/operator/examples/partition-upgrade/rss-partition-upgrade.yaml b/deploy/kubernetes/operator/examples/partition-upgrade/rss-partition-upgrade.yaml
index fa4bf71c..5f791251 100644
--- a/deploy/kubernetes/operator/examples/partition-upgrade/rss-partition-upgrade.yaml
+++ b/deploy/kubernetes/operator/examples/partition-upgrade/rss-partition-upgrade.yaml
@@ -22,7 +22,7 @@ metadata:
   name: rss-parition-upgrade-demo
   namespace: kube-system
 spec:
-  configMapName: rss-parition-upgrade-demo
+  configMapName: "${rss-configuration-name}"
   coordinator:
     image: "${rss-coordinator-image}"
     initContainerImage: "busybox:latest"
diff --git a/deploy/kubernetes/operator/examples/specific-upgrade/README.md b/deploy/kubernetes/operator/examples/specific-upgrade/README.md
new file mode 100644
index 00000000..5ebc0226
--- /dev/null
+++ b/deploy/kubernetes/operator/examples/specific-upgrade/README.md
@@ -0,0 +1,38 @@
+<!--
+  ~ Licensed to the Apache Software Foundation (ASF) under one or more
+  ~ contributor license agreements.  See the NOTICE file distributed with
+  ~ this work for additional information regarding copyright ownership.
+  ~ The ASF licenses this file to You under the Apache License, Version 2.0
+  ~ (the "License"); you may not use this file except in compliance with
+  ~ the License.  You may obtain a copy of the License at
+  ~
+  ~    http://www.apache.org/licenses/LICENSE-2.0
+  ~
+  ~ Unless required by applicable law or agreed to in writing, software
+  ~ distributed under the License is distributed on an "AS IS" BASIS,
+  ~ WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+  ~ See the License for the specific language governing permissions and
+  ~ limitations under the License.
+  -->
+
+# Specific Upgrade of Shuffle Servers
+
+To upgrade only specific shuffle server replicas, we need to edit the rss object as follows:
+
+- update `.spec.shuffleServer.image` to the new shuffle server image
+- set the `.spec.shuffleServer.sync` field to `true`
+- update the `.spec.shuffleServer.upgradeStrategy` field:
+    - set `.spec.shuffleServer.upgradeStrategy.type` to `SpecificUpgrade`
+    - set the `.spec.shuffleServer.upgradeStrategy.specificNames` field to the names of the pods to upgrade
+
+```yaml
+spec:
+  shuffleServer:
+    image: "${rss-shuffle-server-image}"
+    sync: true
+    upgradeStrategy:
+      type: "SpecificUpgrade"
+      specificNames: [ "rss-shuffle-server-demo-0" ]
+```
+
+We can refer to the [example](rss-specific-upgrade.yaml).
\ No newline at end of file
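Since `specificNames` holds pod names, a typo silently targets nothing. The hypothetical pre-flight check below, written in Python, flags entries that do not correspond to an existing `<statefulset>-<ordinal>` pod. The StatefulSet name and the idea of checking client-side are assumptions; the operator's own validation may differ.

```python
"""Hypothetical pre-flight check for .spec.shuffleServer.upgradeStrategy.specificNames."""
from __future__ import annotations

import re


def unknown_specific_names(names: list[str], statefulset: str, replicas: int) -> list[str]:
    """Return entries that do not name an existing pod <statefulset>-<ordinal>."""
    # Ordinals have no leading zeros, so "-01" is rejected.
    pattern = re.compile(rf"{re.escape(statefulset)}-(0|[1-9]\d*)")
    unknown = []
    for name in names:
        match = pattern.fullmatch(name)
        if match is None or int(match.group(1)) >= replicas:
            unknown.append(name)
    return unknown


if __name__ == "__main__":
    print(unknown_specific_names(["rss-shuffle-server-demo-0"], "rss-shuffle-server-demo", 3))
```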
diff --git a/deploy/kubernetes/operator/examples/specific-upgrade/rss-specific-upgrade.yaml b/deploy/kubernetes/operator/examples/specific-upgrade/rss-specific-upgrade.yaml
index c2fca59c..48c9b490 100644
--- a/deploy/kubernetes/operator/examples/specific-upgrade/rss-specific-upgrade.yaml
+++ b/deploy/kubernetes/operator/examples/specific-upgrade/rss-specific-upgrade.yaml
@@ -22,7 +22,7 @@ metadata:
   name: rss-specific-upgrade-demo
   namespace: kube-system
 spec:
-  configMapName: rss-specific-upgrade-demo
+  configMapName: "${rss-configuration-name}"
   coordinator:
     image: "${rss-coordinator-image}"
     initContainerImage: "busybox:latest"
diff --git a/docs/asset/rss-crd-state-transition.png b/docs/asset/rss-crd-state-transition.png
new file mode 100644
index 00000000..f5329b8c
Binary files /dev/null and b/docs/asset/rss-crd-state-transition.png differ
diff --git a/docs/operator/README.md b/docs/operator/README.md
new file mode 100644
index 00000000..d60e918e
--- /dev/null
+++ b/docs/operator/README.md
@@ -0,0 +1,35 @@
+<!--
+  ~ Licensed to the Apache Software Foundation (ASF) under one or more
+  ~ contributor license agreements.  See the NOTICE file distributed with
+  ~ this work for additional information regarding copyright ownership.
+  ~ The ASF licenses this file to You under the Apache License, Version 2.0
+  ~ (the "License"); you may not use this file except in compliance with
+  ~ the License.  You may obtain a copy of the License at
+  ~
+  ~    http://www.apache.org/licenses/LICENSE-2.0
+  ~
+  ~ Unless required by applicable law or agreed to in writing, software
+  ~ distributed under the License is distributed on an "AS IS" BASIS,
+  ~ WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+  ~ See the License for the specific language governing permissions and
+  ~ limitations under the License.
+  -->
+
+# Uniffle Operator
+
+The __[Uniffle Operator](https://github.com/apache/incubator-uniffle/tree/master/deploy/kubernetes/operator/)__ manages
+Apache Uniffle clusters within Kubernetes.
+
+The operator is currently in alpha (`v1alpha1`), and while we do not anticipate changing the API in
+backwards-incompatible ways, there is no such guarantee yet.
+
+## Documentation
+
+Please visit the following pages for documentation on using and developing the Uniffle Operator:
+
+- [Installation](install.md): step-by-step instructions on how to get the Uniffle operator running on our cluster
+- [Design & Usage](design.md): an overview of the operator design and detailed usage of the CRD
+
+### Examples
+
+Example uses of each CRD have been [provided](examples.md).
diff --git a/docs/operator/design.md b/docs/operator/design.md
new file mode 100644
index 00000000..aeec45c3
--- /dev/null
+++ b/docs/operator/design.md
@@ -0,0 +1,137 @@
+<!--
+  ~ Licensed to the Apache Software Foundation (ASF) under one or more
+  ~ contributor license agreements.  See the NOTICE file distributed with
+  ~ this work for additional information regarding copyright ownership.
+  ~ The ASF licenses this file to You under the Apache License, Version 2.0
+  ~ (the "License"); you may not use this file except in compliance with
+  ~ the License.  You may obtain a copy of the License at
+  ~
+  ~    http://www.apache.org/licenses/LICENSE-2.0
+  ~
+  ~ Unless required by applicable law or agreed to in writing, software
+  ~ distributed under the License is distributed on an "AS IS" BASIS,
+  ~ WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+  ~ See the License for the specific language governing permissions and
+  ~ limitations under the License.
+  -->
+
+# Uniffle Operator Design
+
+## Summary
+
+The goal is to develop an operator that facilitates the rapid deployment of Uniffle in Kubernetes environments.
+
+## Motivation
+
+By leveraging the strengths of Kubernetes in container orchestration, elastic scaling, and rolling upgrades, Uniffle
+can manage coordinator and shuffle server clusters more easily.
+
+In addition, given how shuffle servers operate, we want to be able to take them offline safely:
+
+1. Before a shuffle server is scaled down or upgraded, it should first be added to the coordinator's blacklist.
+2. Only after confirming that no applications remain on it should its pod be deleted and the server removed
+   from the blacklist.
+
+We want not only to start the coordinators and shuffle servers, but also to ensure that running jobs are not
+affected. Therefore, we decided to develop a dedicated operator.
+
+## Goals
+
+The operator implements the following functions:
+
+1. Start two coordinator Deployments (for active-active availability) and one shuffle server StatefulSet.
+2. Support scaling and upgrading of coordinators and shuffle servers; shuffle servers additionally support
+   grayscale upgrades.
+3. Use a webhook to add a shuffle server's name to the coordinator's blacklist before it is deleted, check the
+   number of applications still running on it, and admit the pod deletion request only once it is safe.
+
+## Design Details
+
+This operator consists of two components: a CRD controller and a webhook that admits CRD and pod requests.
+
+The CRD controller watches for changes to the custom resource and updates the workloads accordingly.
+
+The webhook validates changes to the custom resource, and admits a pod deletion request only when the number of
+remaining applications is 0.
+
+The webhook will add the pod to be deleted to the coordinator's blacklist. When the pod is actually deleted, the
+controller will remove it from the blacklist.
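The division of labor between the webhook and the controller can be modeled in a few lines. This Python sketch keeps only the decision logic: the `Coordinator` class is a hypothetical in-memory stand-in for the real coordinator (which the actual components reach over the network), and treating a server with no recorded applications as idle is an assumption.

```python
"""Toy model of the safe-offline flow: the webhook blacklists, the controller un-blacklists."""
from __future__ import annotations

from dataclasses import dataclass, field


@dataclass
class Coordinator:
    """Hypothetical stand-in for the coordinator: a blacklist plus per-server app counts."""
    blacklist: set[str] = field(default_factory=set)
    running_apps: dict[str, int] = field(default_factory=dict)


def admit_pod_deletion(coord: Coordinator, pod: str) -> bool:
    """Webhook side: blacklist the server first, then allow deletion only if no apps remain."""
    coord.blacklist.add(pod)  # keep new applications away from this server
    # Servers without a recorded count are treated as idle (an assumption).
    return coord.running_apps.get(pod, 0) == 0


def on_pod_deleted(coord: Coordinator, pod: str) -> None:
    """Controller side: once the pod is actually gone, remove it from the blacklist."""
    coord.blacklist.discard(pod)
```

Blacklisting before checking the count matters: it stops new applications from landing on the server while existing ones drain.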
+
+## CRD Definition
+
+An example of a crd object is as follows:
+
+```yaml
+apiVersion: uniffle.apache.org/v1alpha1
+kind: RemoteShuffleService
+metadata:
+  name: rss-demo
+  namespace: kube-system
+spec:
+  # ConfigMapName is the name of the ConfigMap that stores the configuration of coordinators and shuffle servers.
+  configMapName: rss-demo
+  # Coordinator represents the relevant configuration of the coordinators.
+  coordinator:
+    # Image is the container image used by coordinators.
+    image: ${coordinator-image}
+    # InitContainerImage is optional, mainly for non-root users to initialize host path permissions.
+    initContainerImage: "busybox:latest"
+    # Count is the number of coordinator workloads to be generated.
+    # By default, we will deploy two coordinators to ensure active-active.
+    count: 2
+    # RpcNodePort lists the ports required by the rpc protocol of the coordinators; the values must fall
+    # within the port range of NodePort type services in Kubernetes.
+    # Two ports are listed because two coordinators are deployed by default (active-active).
+    rpcNodePort:
+      - 30001
+      - 30011
+    # HttpNodePort lists the ports required by the http protocol of the coordinators; the values must fall
+    # within the port range of NodePort type services in Kubernetes.
+    # Two ports are listed because two coordinators are deployed by default (active-active).
+    httpNodePort:
+      - 30002
+      - 30012
+    # XmxSize is the maximum JVM heap size (-Xmx) configured for coordinators.
+    xmxSize: "10G"
+    # ConfigDir is the directory where the configuration of coordinators resides.
+    configDir: "/data/rssadmin/rss/conf"
+    # Replicas is the number of replicas of each coordinator Deployment.
+    replicas: 1
+    # ExcludeNodesFilePath indicates exclude nodes file path in coordinators' containers.
+    excludeNodesFilePath: "/data/rssadmin/rss/coo/exclude_nodes"
+    # SecurityContext holds pod-level security attributes and common container settings.
+    securityContext:
+      # RunAsUser specifies the user ID of all processes in coordinator pods.
+      runAsUser: 1000
+      # FsGroup specifies the group ID of the owner of the volume within coordinator pods.
+      fsGroup: 1000
+    # LogHostPath represents the host path used to save logs of coordinators.
+    logHostPath: "/data/logs/rss"
+    # HostPathMounts field indicates host path volumes and their mounting path within coordinators' containers.
+    hostPathMounts:
+      /data/logs/rss: /data/rssadmin/rss/logs
+  # ShuffleServer represents the relevant configuration of the shuffle servers.
+  shuffleServer:
+    # Sync marks whether the shuffle server needs to be updated or restarted.
+    # When the user needs to update the shuffle servers, it needs to be set to true.
+    # After the update is successful, the controller will modify it to false.
+    sync: true
+    # Replicas is the number of shuffle server replicas.
+    replicas: 3
+    # Image is the container image used by shuffle servers.
+    image: ${shuffle-server-image}
+```
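The port fields above pair one rpc and one http NodePort with each coordinator. The Python sketch below is a hypothetical client-side check of that pairing; it assumes the one-port-per-coordinator reading of the example and the default kube-apiserver NodePort range of 30000-32767, which a cluster can override with `--service-node-port-range`.

```python
"""Hypothetical client-side check of the coordinator port fields in an rss spec."""
from __future__ import annotations

# Default NodePort range of kube-apiserver; clusters can be configured differently.
NODE_PORT_RANGE = range(30000, 32768)


def coordinator_port_errors(coordinator: dict) -> list[str]:
    """Expect one rpc and one http NodePort per coordinator, all distinct and in range."""
    errors: list[str] = []
    count = coordinator.get("count", 2)  # two coordinators by default (active-active)
    rpc = coordinator.get("rpcNodePort", [])
    http = coordinator.get("httpNodePort", [])
    for field_name, ports in (("rpcNodePort", rpc), ("httpNodePort", http)):
        if len(ports) != count:
            errors.append(f"{field_name} has {len(ports)} entries, expected {count}")
        errors += [f"{field_name} {p} outside NodePort range"
                   for p in ports if p not in NODE_PORT_RANGE]
    if len(set(rpc) | set(http)) != len(rpc) + len(http):
        errors.append("NodePorts must be unique")
    return errors


if __name__ == "__main__":
    spec = {"count": 2, "rpcNodePort": [30001, 30011], "httpNodePort": [30002, 30012]}
    print(coordinator_port_errors(spec))
```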
+
+After a user creates an rss object, the rss-controller component creates the corresponding workloads.
+
+For coordinators, when the user modifies the rss object, the controller directly synchronizes the new state to
+the workloads.
+
+For shuffle servers, the controller applies updates to the workloads only after the `.spec.shuffleServer.sync`
+field is set to `true`.
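The asymmetry between coordinators and shuffle servers can be illustrated with a toy reconcile step in Python. Everything here is hypothetical: workloads are plain dicts, and the update is treated as succeeding instantly, so `sync` is reset to `false` right away, whereas the real controller resets it only after the update succeeds.

```python
"""Toy reconcile step illustrating the sync flag (hypothetical, in-memory only)."""
from __future__ import annotations


def reconcile(rss: dict, workloads: dict) -> None:
    """Push the coordinator spec every time; push the shuffle server spec only when sync is true."""
    spec = rss["spec"]
    workloads["coordinator"] = dict(spec["coordinator"])  # always synchronized
    server = spec["shuffleServer"]
    if server.get("sync"):
        workloads["shuffleServer"] = {k: v for k, v in server.items() if k != "sync"}
        server["sync"] = False  # reset once the (here instantaneous) update succeeds
```

Gating shuffle server changes behind `sync` lets a user stage several spec edits and release them together.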
+
+For more examples, see [examples](examples.md).
+
+## State Transition
+
+![state transition](../asset/rss-crd-state-transition.png)
diff --git a/docs/operator/examples.md b/docs/operator/examples.md
new file mode 100644
index 00000000..f9cf02cc
--- /dev/null
+++ b/docs/operator/examples.md
@@ -0,0 +1,31 @@
+<!--
+  ~ Licensed to the Apache Software Foundation (ASF) under one or more
+  ~ contributor license agreements.  See the NOTICE file distributed with
+  ~ this work for additional information regarding copyright ownership.
+  ~ The ASF licenses this file to You under the Apache License, Version 2.0
+  ~ (the "License"); you may not use this file except in compliance with
+  ~ the License.  You may obtain a copy of the License at
+  ~
+  ~    http://www.apache.org/licenses/LICENSE-2.0
+  ~
+  ~ Unless required by applicable law or agreed to in writing, software
+  ~ distributed under the License is distributed on an "AS IS" BASIS,
+  ~ WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+  ~ See the License for the specific language governing permissions and
+  ~ limitations under the License.
+  -->
+
+# Examples
+
+We first need to create a ConfigMap that stores the configuration of the coordinators, the shuffle servers, and log4j
+(see [configuration](../../deploy/kubernetes/operator/examples/configuration.yaml)).
+
+The coordinator is a stateless service: to upgrade it, we can directly update the configuration and then update the
+image.
+
+The shuffle server is a stateful service whose upgrade is more complicated, so we provide examples of the different
+upgrade modes:
+
+- [Full Upgrade](../../deploy/kubernetes/operator/examples/full-upgrade)
+- [Full Restart](../../deploy/kubernetes/operator/examples/full-restart)
+- [Partition Upgrade](../../deploy/kubernetes/operator/examples/partition-upgrade)
+- [Specific Upgrade](../../deploy/kubernetes/operator/examples/specific-upgrade)
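As a compact summary of the four modes, the hypothetical Python checker below records which `.spec.shuffleServer` fields each mode expects to be set, as read from the example READMEs. It is not the operator's schema and only checks presence; for instance, it cannot tell whether `image` actually points to a new version.

```python
"""Hypothetical checker: fields each shuffle server upgrade mode expects to be set."""
from __future__ import annotations

# Read from the four example READMEs; not an authoritative schema.
REQUIRED_FIELDS = {
    "FullUpgrade": ["image"],
    "FullRestart": [],
    "PartitionUpgrade": ["image", "upgradeStrategy.partition"],
    "SpecificUpgrade": ["image", "upgradeStrategy.specificNames"],
}


def missing_fields(shuffle_server: dict) -> list[str]:
    """Return dotted paths (relative to .spec.shuffleServer) that are missing or invalid."""
    if shuffle_server.get("sync") is not True:
        return ["sync (must be true)"]
    strategy = shuffle_server.get("upgradeStrategy", {}).get("type")
    if strategy not in REQUIRED_FIELDS:
        return [f"upgradeStrategy.type (unknown: {strategy!r})"]
    missing = []
    for path in REQUIRED_FIELDS[strategy]:
        node = shuffle_server
        for key in path.split("."):
            node = node.get(key) if isinstance(node, dict) else None
        if node is None:  # a partition of 0 is still present
            missing.append(path)
    return missing
```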
diff --git a/docs/operator/install.md b/docs/operator/install.md
new file mode 100644
index 00000000..bfab3452
--- /dev/null
+++ b/docs/operator/install.md
@@ -0,0 +1,75 @@
+<!--
+  ~ Licensed to the Apache Software Foundation (ASF) under one or more
+  ~ contributor license agreements.  See the NOTICE file distributed with
+  ~ this work for additional information regarding copyright ownership.
+  ~ The ASF licenses this file to You under the Apache License, Version 2.0
+  ~ (the "License"); you may not use this file except in compliance with
+  ~ the License.  You may obtain a copy of the License at
+  ~
+  ~    http://www.apache.org/licenses/LICENSE-2.0
+  ~
+  ~ Unless required by applicable law or agreed to in writing, software
+  ~ distributed under the License is distributed on an "AS IS" BASIS,
+  ~ WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+  ~ See the License for the specific language governing permissions and
+  ~ limitations under the License.
+  -->
+
+# Installation
+
+This section shows how to install the operator in our cluster.
+
+## Requirements
+
+1. Kubernetes 1.14+
+2. Kubectl 1.14+
+
+Please make sure kubectl is properly configured to interact with the Kubernetes environment.
+
+## Preparing Images of Coordinators and Shuffle Servers
+
+Run the following command from the repository root:
+
+```
+cd deploy/kubernetes/docker && sh build.sh --registry ${our-registry}
+```
+
+## Creating or Updating CRD
+
+We can refer
+to [crd yaml file](../../deploy/kubernetes/operator/config/crd/bases/uniffle.apache.org_remoteshuffleservices.yaml).
+
+Run the following command:
+
+```
+kubectl apply -f ${crd-yaml-file}
+```
+
+## Setup or Update Uniffle Webhook
+
+We can refer to [webhook yaml file](../../deploy/kubernetes/operator/config/manager/rss-webhook.yaml).
+
+Run the following command:
+
+```
+kubectl apply -f ${webhook-yaml-file}
+```
+
+## Setup or Update Uniffle Controller
+
+We can refer to [controller yaml file](../../deploy/kubernetes/operator/config/manager/rss-controller.yaml).
+
+Run the following command:
+
+```
+kubectl apply -f ${controller-yaml-file}
+```
+
+## How To Use
+
+For more details on using the CRD, see
+[uniffle operator design](design.md).
+
+## Examples
+
+Example uses of CRD have been [provided](examples.md).
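The installation steps above boil down to three `kubectl apply -f` calls in a fixed order. The hypothetical Python sketch below builds those calls from the manifest paths linked in this page (relative to the repository root) and runs them only when `dry_run` is false, stopping at the first failure. The CRD goes first because custom resources cannot be created before their definition exists; the remaining order simply follows the sections above.

```python
"""Hypothetical installer sketch: apply CRD, webhook, then controller, in order."""
from __future__ import annotations

import subprocess

# Paths from the links above, relative to deploy/kubernetes/operator/.
MANIFESTS = [
    "config/crd/bases/uniffle.apache.org_remoteshuffleservices.yaml",
    "config/manager/rss-webhook.yaml",
    "config/manager/rss-controller.yaml",
]


def apply_commands(base_dir: str) -> list[list[str]]:
    """Build one `kubectl apply -f` argv per manifest, preserving install order."""
    return [["kubectl", "apply", "-f", f"{base_dir}/{path}"] for path in MANIFESTS]


def install(base_dir: str, dry_run: bool = True) -> None:
    """Print the commands, or run them and stop at the first failing step."""
    for argv in apply_commands(base_dir):
        if dry_run:
            print(" ".join(argv))
        else:
            subprocess.run(argv, check=True)  # raises CalledProcessError on failure


if __name__ == "__main__":
    install("deploy/kubernetes/operator")  # dry run: only prints the commands
```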