Posted to commits@airflow.apache.org by GitBox <gi...@apache.org> on 2021/11/01 15:31:46 UTC

[GitHub] [airflow] jedcunningham commented on a change in pull request #19339: Touch up k8s executor doc

jedcunningham commented on a change in pull request #19339:
URL: https://github.com/apache/airflow/pull/19339#discussion_r740292916



##########
File path: docs/apache-airflow/executor/kubernetes.rst
##########
@@ -46,26 +46,26 @@ This command generates the pods as they will be launched in Kubernetes and dumps
 pod_template_file
 #################
 
-As of Airflow 1.10.12, you can now use the ``pod_template_file`` option in the ``kubernetes`` section
+As of Airflow 1.10.12, you can use the ``pod_template_file`` option in the ``kubernetes`` section

Review comment:
       ```suggestion
    As of Airflow 2.0.0, you can use the ``pod_template_file`` option in the ``kubernetes`` section
   ```
   
   Might as well rip off the bandaid - docs are versioned after all.

##########
File path: docs/apache-airflow/executor/kubernetes.rst
##########
@@ -170,6 +164,29 @@ Additionally, the Kubernetes Executor enables specification of additional featur
 .. @enduml
 .. image:: ../img/k8s-happy-path.png
 
+Comparison with CeleryExecutor
+------------------------------
+
+In contrast to CeleryExecutor, KubernetesExecutor does not require additional components such as Redis and Flower, but does require access to a Kubernetes cluster.
+
+With KubernetesExecutor, each task runs in its own pod. The pod is created when the task is queued, and terminates when the task completes.
+Historically, in some cases this presented a resource utilization advantage over CeleryExecutor, where you needed a fixed number of
+long-running Celery worker pods, whether or not there were tasks to run.
+
+However, the official Apache Airflow Helm chart can automatically scale Celery workers down to zero based on the number of tasks in the queue,
+so when using the official chart, this is no longer an advantage.
+
+With Celery workers you will tend to have less task latency because the worker pod is already up and running when the task is queued.  On the
+other hand, because multiple tasks are running in the same pod, with Celery you may have to be more mindful about resource utilization
+in your task design, particularly memory consumption.
+
+One advantage of KubernetesExecutor is its handling of long-running tasks.  With KubernetesExecutor, if you do a deployment while a task is running,
+the task will keep running until it completes (or times out, etc).  But with CeleryExecutor, provided you have set a grace period, the

Review comment:
       ```suggestion
    One advantage of KubernetesExecutor is its handling of long-running tasks. With KubernetesExecutor, if you do a deployment while a task is running,
    the task will keep running until it completes (or times out, etc). But with CeleryExecutor, provided you have set a grace period, the
   ```

##########
File path: docs/apache-airflow/executor/kubernetes.rst
##########
@@ -46,26 +46,26 @@ This command generates the pods as they will be launched in Kubernetes and dumps
 pod_template_file
 #################
 
-As of Airflow 1.10.12, you can now use the ``pod_template_file`` option in the ``kubernetes`` section
+As of Airflow 1.10.12, you can use the ``pod_template_file`` option in the ``kubernetes`` section
 of the ``airflow.cfg`` file to form the basis of your KubernetesExecutor pods. This process is faster to execute
-and easier to modify.
+and easier to modify compared with the legacy configuration approach.
 
-We include multiple examples of working pod operators below, but we would also like to explain a few necessary components
-if you want to customize your template files. As long as you have these components, every other element
-in the template is customizable.
+We include multiple examples of working pod operators below, but we would also like to identify a few components
+that you must include if you want to provide a custom template file. Aside from these components, every other
+element in the template is customizable.
 
-1. Airflow will overwrite the base container image and the pod name
+1. Airflow will overwrite the base container ``image`` and the pod's ``metadata.name``.
 
 There are two points where Airflow potentially overwrites the base image: in the ``airflow.cfg``
 or the ``pod_override`` (discussed below) setting. This value is overwritten to ensure that users do
 not need to update multiple template files every time they upgrade their docker image. The other field
-that Airflow overwrites is the ``pod.metadata.name`` field. This field has to be unique across all pods,
+that Airflow overwrites is the ``metadata.name`` field. This field has to be unique across all pods,
 so we generate these names dynamically before launch.
 
-It's important to note while Airflow overwrites these fields, they **can not be left blank**.
-If these fields do not exist, kubernetes can not load the yaml into a Kubernetes V1Pod.
+It's important to note that although Airflow overwrites these fields, they **cannot be left blank** in the template.
+If these fields are not present in the template, Kubernetes cannot load the YAML into a Kubernetes V1Pod.

Review comment:
       ```suggestion
    If these fields are not present in the template, Kubernetes cannot load the YAML into a Kubernetes ``V1Pod``.
   ```
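
    To make the two required components concrete, here is a minimal sketch using the kubernetes Python client, i.e. the same ``V1Pod`` model the template yaml is deserialized into. The name and image values are placeholders of my own; Airflow overwrites both at launch, but the fields must still be present for the template to load.

    ```python
    # Minimal template expressed as the V1Pod model the YAML must deserialize into.
    # Both placeholder values are overwritten by Airflow at launch, but omitting
    # either field would make the template fail to load.
    from kubernetes.client import models as k8s

    pod_template = k8s.V1Pod(
        api_version="v1",
        kind="Pod",
        metadata=k8s.V1ObjectMeta(name="placeholder-name"),  # required, replaced per task
        spec=k8s.V1PodSpec(
            containers=[
                k8s.V1Container(
                    name="base",  # Airflow targets the container named "base"
                    image="placeholder:latest",  # replaced from airflow.cfg or pod_override
                )
            ]
        ),
    )
    ```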

##########
File path: docs/apache-airflow/executor/kubernetes.rst
##########
@@ -21,22 +21,22 @@
 Kubernetes Executor
 ===================
 
-The kubernetes executor is introduced in Apache Airflow 1.10.0. The Kubernetes executor will create a new pod for every task instance.
+The Kubernetes executor runs each task instance in its own pod on a Kubernetes cluster.
 
-Example kubernetes files are available at ``scripts/in_container/kubernetes/app/{secrets,volumes,postgres}.yaml`` in the source distribution (please note that these examples are not ideal for production environments).
-The volumes are optional and depend on your configuration. There are two volumes available:
+Use of persistent volumes is optional and depends on your configuration. There are two types of volumes you may configure:
 
 - **Dags**:
 
-  - By storing dags onto persistent disk, it will be made available to all workers
+  - By storing dags on a persistent volume, they will be made available to all workers

Review comment:
    Totally possible to have a PV that is populated by CI/CD. That said, this whole section until `pod_template_file` feels a little out of place to me for KubernetesExecutor docs 🤷‍♂️. This likely made more sense when those example yaml files still existed.

##########
File path: docs/apache-airflow/executor/kubernetes.rst
##########
@@ -170,6 +164,29 @@ Additionally, the Kubernetes Executor enables specification of additional featur
 .. @enduml
 .. image:: ../img/k8s-happy-path.png
 
+Comparison with CeleryExecutor
+------------------------------
+
+In contrast to CeleryExecutor, KubernetesExecutor does not require additional components such as Redis and Flower, but does require access to a Kubernetes cluster.
+
+With KubernetesExecutor, each task runs in its own pod. The pod is created when the task is queued, and terminates when the task completes.
+Historically, in some cases this presented a resource utilization advantage over CeleryExecutor, where you needed a fixed number of
+long-running Celery worker pods, whether or not there were tasks to run.
+
+However, the official Apache Airflow Helm chart can automatically scale Celery workers down to zero based on the number of tasks in the queue,
+so when using the official chart, this is no longer an advantage.
+
+With Celery workers you will tend to have less task latency because the worker pod is already up and running when the task is queued.  On the

Review comment:
       ```suggestion
   With Celery workers you will tend to have less task latency because the worker pod is already up and running when the task is queued. On the
   ```

##########
File path: docs/apache-airflow/executor/kubernetes.rst
##########
@@ -205,7 +222,7 @@ By monitoring this stream, the KubernetesExecutor can discover that the worker c
 But What About Cases Where the Scheduler Pod Crashes?
 =====================================================
 
-In cases of scheduler crashes, we can completely rebuild the state of the scheduler using the watcher's ``resourceVersion``.
+In cases of scheduler crashes, the scheduler will recover its state using the watcher's ``resourceVersion``.
 
 When monitoring the Kubernetes cluster's watcher thread, each event has a monotonically rising number called a resourceVersion.
 Every time the executor reads a resourceVersion, the executor stores the latest value in the backend database.

Review comment:
       ```suggestion
   When monitoring the Kubernetes cluster's watcher thread, each event has a monotonically rising number called a ``resourceVersion``.
   Every time the executor reads a ``resourceVersion``, the executor stores the latest value in the backend database.
   ```
   There's one more on the following line, but I can't suggest it here.
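
    For readers unfamiliar with the mechanism, a hedged sketch of checkpointing and resuming a watch from a ``resourceVersion`` with the kubernetes Python client follows; the namespace, label selector, and checkpoint variable are illustrative stand-ins, not Airflow's actual executor code.

    ```python
    # Illustrative sketch: resume a pod watch from a stored resourceVersion,
    # persisting the newest value after every event.
    from kubernetes import client, config, watch

    config.load_incluster_config()  # or config.load_kube_config() outside a cluster
    v1 = client.CoreV1Api()

    last_seen = "0"  # in Airflow this checkpoint lives in the backend database

    for event in watch.Watch().stream(
        v1.list_namespaced_pod,
        namespace="airflow",              # illustrative namespace
        label_selector="airflow-worker",  # illustrative worker-pod selector
        resource_version=last_seen,       # resume from the checkpoint
    ):
        pod = event["object"]
        # ...react to the worker pod state change here...
        last_seen = pod.metadata.resource_version  # persist the new checkpoint
    ```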

##########
File path: docs/apache-airflow/executor/kubernetes.rst
##########
@@ -170,6 +164,29 @@ Additionally, the Kubernetes Executor enables specification of additional featur
 .. @enduml
 .. image:: ../img/k8s-happy-path.png
 
+Comparison with CeleryExecutor
+------------------------------
+
+In contrast to CeleryExecutor, KubernetesExecutor does not require additional components such as Redis and Flower, but does require access to a Kubernetes cluster.
+
+With KubernetesExecutor, each task runs in its own pod. The pod is created when the task is queued, and terminates when the task completes.
+Historically, in some cases this presented a resource utilization advantage over CeleryExecutor, where you needed a fixed number of
+long-running Celery worker pods, whether or not there were tasks to run.
+
+However, the official Apache Airflow Helm chart can automatically scale Celery workers down to zero based on the number of tasks in the queue,
+so when using the official chart, this is no longer an advantage.
+
+With Celery workers you will tend to have less task latency because the worker pod is already up and running when the task is queued.  On the
+other hand, because multiple tasks are running in the same pod, with Celery you may have to be more mindful about resource utilization
+in your task design, particularly memory consumption.
+
+One advantage of KubernetesExecutor is its handling of long-running tasks.  With KubernetesExecutor, if you do a deployment while a task is running,
+the task will keep running until it completes (or times out, etc).  But with CeleryExecutor, provided you have set a grace period, the
+task will only keep running until the grace period has elapsed, at which point it will be terminated.
+
+Finally, note that it doesn't have to be either-or.  With CeleryKubernetesExecutor you have the best of both worlds.  Tasks by default will
+go to Celery workers.  But if you want a task to run with KubernetesExecutor, you send it to the ``kubernetes`` queue and it will

Review comment:
       ```suggestion
    Finally, note that it doesn't have to be either-or, as with CeleryKubernetesExecutor you have the best of both worlds. Tasks by default will
    go to Celery workers; however, if you want a task to run with KubernetesExecutor, you send it to the ``kubernetes`` queue and it will
   ```
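
    To make the queue routing concrete, here is a minimal sketch of a DAG run under CeleryKubernetesExecutor; the DAG id and tasks are invented for the example, and ``queue="kubernetes"`` assumes the default ``[celery_kubernetes_executor] kubernetes_queue`` setting.

    ```python
    # Illustrative DAG: the first task runs on a Celery worker as usual, while
    # the second is routed to KubernetesExecutor via the "kubernetes" queue.
    from datetime import datetime

    from airflow import DAG
    from airflow.operators.bash import BashOperator

    with DAG(
        dag_id="queue_routing_example",  # invented for this sketch
        start_date=datetime(2021, 1, 1),
        schedule_interval=None,
        catchup=False,
    ) as dag:
        on_celery = BashOperator(
            task_id="on_celery",
            bash_command="echo running on a Celery worker",
        )
        on_kubernetes = BashOperator(
            task_id="on_kubernetes",
            bash_command="echo running in its own pod",
            queue="kubernetes",  # routes this task to KubernetesExecutor
        )
        on_celery >> on_kubernetes
    ```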




-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: commits-unsubscribe@airflow.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org