Posted to commits@airflow.apache.org by GitBox <gi...@apache.org> on 2020/08/20 20:49:09 UTC

[GitHub] [airflow] vikramkoka opened a new pull request #10433: Enhanced the Kubernetes Executor doc

vikramkoka opened a new pull request #10433:
URL: https://github.com/apache/airflow/pull/10433


   Added an updated architecture diagram and enhanced the description of the Kubernetes Executor.


----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



[GitHub] [airflow] kaxil commented on a change in pull request #10433: Enhanced the Kubernetes Executor doc

Posted by GitBox <gi...@apache.org>.
kaxil commented on a change in pull request #10433:
URL: https://github.com/apache/airflow/pull/10433#discussion_r477423482



##########
File path: docs/executor/kubernetes.rst
##########
@@ -44,15 +44,21 @@ KubernetesExecutor Architecture
 The KubernetesExecutor runs as a process in the Scheduler that only requires access to the Kubernetes API (it does *not* need to run inside of a Kubernetes cluster). The KubernetesExecutor requires a non-sqlite database in the backend, but there are no external brokers or persistent workers needed.
 For these reasons, we recommend the KubernetesExecutor for deployments have long periods of dormancy between DAG execution.
 
+When a DAG submits a task, the KubernetesExecutor requests a worker pod from the Kubernetes API. The worker pod then runs the task, reports the result, and terminates.
 
-.. image:: ../img/k8s-0-worker.jpeg
 
+.. image:: ../img/arch-diag-kubernetes.png

Review comment:
       The source has been uploaded to https://cwiki.apache.org/confluence/download/attachments/158872928/arch-diag-kubernetes.drawio?api=v2 under https://cwiki.apache.org/confluence/display/AIRFLOW/Drawio+Diagrams
   
   That source file contains both images.
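
The paragraph added in the hunk above describes a request, run, report, terminate cycle for each task. A minimal sketch of that cycle in plain Python follows. Every name in it (`PodPhase`, `WorkerPod`, `ToyKubernetesExecutor`) is hypothetical and chosen for illustration; it does not call the Kubernetes API and is not Airflow's implementation.

```python
from dataclasses import dataclass, field
from enum import Enum


class PodPhase(Enum):
    PENDING = "Pending"
    RUNNING = "Running"
    SUCCEEDED = "Succeeded"
    FAILED = "Failed"


@dataclass
class WorkerPod:
    """Hypothetical stand-in for a worker pod; not a Kubernetes API object."""
    task_id: str
    phase: PodPhase = PodPhase.PENDING


@dataclass
class ToyKubernetesExecutor:
    """Toy model of the per-task pod lifecycle (assumption, not Airflow code)."""
    results: dict = field(default_factory=dict)
    live_pods: list = field(default_factory=list)

    def execute(self, task_id, task_callable):
        # 1. Request a dedicated worker pod for this single task.
        pod = WorkerPod(task_id)
        self.live_pods.append(pod)

        # 2. The pod runs the task.
        pod.phase = PodPhase.RUNNING
        try:
            task_callable()
            pod.phase = PodPhase.SUCCEEDED
        except Exception:
            pod.phase = PodPhase.FAILED

        # 3. The pod reports its result back to the executor.
        self.results[task_id] = pod.phase

        # 4. The pod terminates; nothing stays running between tasks.
        self.live_pods.remove(pod)
        return pod.phase


executor = ToyKubernetesExecutor()
executor.execute("extract", lambda: None)   # a task that succeeds
executor.execute("load", lambda: 1 / 0)     # a task that raises
```

After both calls, `executor.live_pods` is empty again, which is the property the doc text emphasizes: no persistent workers remain once tasks finish.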










[GitHub] [airflow] kaxil merged pull request #10433: Enhanced the Kubernetes Executor doc

Posted by GitBox <gi...@apache.org>.
kaxil merged pull request #10433:
URL: https://github.com/apache/airflow/pull/10433


   





[GitHub] [airflow] kaxil commented on a change in pull request #10433: Enhanced the Kubernetes Executor doc

Posted by GitBox <gi...@apache.org>.
kaxil commented on a change in pull request #10433:
URL: https://github.com/apache/airflow/pull/10433#discussion_r477424565



##########
File path: docs/executor/kubernetes.rst
##########
@@ -44,15 +44,25 @@ KubernetesExecutor Architecture
 The KubernetesExecutor runs as a process in the Scheduler that only requires access to the Kubernetes API (it does *not* need to run inside of a Kubernetes cluster). The KubernetesExecutor requires a non-sqlite database in the backend, but there are no external brokers or persistent workers needed.
 For these reasons, we recommend the KubernetesExecutor for deployments have long periods of dormancy between DAG execution.
 
+When a DAG submits a task, the KubernetesExecutor requests a worker pod from the Kubernetes API. The worker pod then runs the task, reports the result, and terminates.
 
-.. image:: ../img/k8s-0-worker.jpeg
 
+.. image:: ../img/arch-diag-kubernetes.png
 
-When a DAG submits a task, the KubernetesExecutor requests a worker pod from the Kubernetes API. The worker pod then runs the task, reports the result, and terminates.
 
+In contrast to the Celery Executor, the Kubernetes Executor does not require additional components such as Redis and Flower, but does require the Kubernetes infrastructure.
+
+One example of an Airflow deployment running on a distributed set of five nodes in a Kubernetes cluster is shown below. 
+
+.. image:: ../img/arch-diag-kubernetes2.png
+
+The Kubernetes Executor has an advantage over the Celery Executor in that Pods are only spun up when required for task execution compared to the Celery Executor where the workers are statically configured and ran running all the time, regardless of workloads. However, this could be a disadvantage depending on the latency needs, since a task takes longer to start using the Kubernetes Executor, since it now includes the Pod startup time.

Review comment:
       ```suggestion
   The Kubernetes Executor has an advantage over the Celery Executor in that Pods are only spun up when required for task execution compared to the Celery Executor where the workers are statically configured and are running all the time, regardless of workloads. However, this could be a disadvantage depending on the latency needs, since a task takes longer to start using the Kubernetes Executor, since it now includes the Pod startup time.
   ```
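
The trade-off in the paragraph under review (pods created on demand versus always-on workers) can be made concrete with a rough back-of-the-envelope model. The sketch below is only illustrative: the function names and all numbers (pod startup time, queue pickup time, worker count, time window) are assumptions, not measurements from Airflow or Kubernetes.

```python
def kubernetes_cost(task_runtimes, pod_startup):
    """Each task gets its own pod: pay startup per task, no idle capacity.

    Returns (total start delay in seconds, pod-seconds consumed).
    """
    start_delay = pod_startup * len(task_runtimes)
    pod_seconds = sum(pod_startup + runtime for runtime in task_runtimes)
    return start_delay, pod_seconds


def celery_cost(task_runtimes, workers, window, queue_pickup):
    """Workers run for the whole window regardless of how many tasks arrive.

    Returns (total start delay in seconds, worker-seconds consumed).
    """
    start_delay = queue_pickup * len(task_runtimes)
    worker_seconds = workers * window
    return start_delay, worker_seconds


# Hypothetical workload: three 60-second tasks within a one-hour window.
runtimes = [60.0, 60.0, 60.0]
k8s_delay, k8s_seconds = kubernetes_cost(runtimes, pod_startup=20.0)
celery_delay, celery_seconds = celery_cost(
    runtimes, workers=2, window=3600.0, queue_pickup=1.0
)
```

Under these assumed numbers the Kubernetes-style model spends far fewer compute-seconds but adds more delay before each task starts, which is the latency caveat the paragraph raises.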







[GitHub] [airflow] potiuk commented on a change in pull request #10433: Enhanced the Kubernetes Executor doc

Posted by GitBox <gi...@apache.org>.
potiuk commented on a change in pull request #10433:
URL: https://github.com/apache/airflow/pull/10433#discussion_r474269862



##########
File path: docs/executor/kubernetes.rst
##########
@@ -44,15 +44,21 @@ KubernetesExecutor Architecture
 The KubernetesExecutor runs as a process in the Scheduler that only requires access to the Kubernetes API (it does *not* need to run inside of a Kubernetes cluster). The KubernetesExecutor requires a non-sqlite database in the backend, but there are no external brokers or persistent workers needed.
 For these reasons, we recommend the KubernetesExecutor for deployments have long periods of dormancy between DAG execution.
 
+When a DAG submits a task, the KubernetesExecutor requests a worker pod from the Kubernetes API. The worker pod then runs the task, reports the result, and terminates.
 
-.. image:: ../img/k8s-0-worker.jpeg
 
+.. image:: ../img/arch-diag-kubernetes.png

Review comment:
       I really appreciate the addition, but I think we should know how to reproduce the image in the future if we have to update it :). I really love tools like mermaid, which can generate an image from a markdown-ish text description and let everyone contribute to it easily. Maybe we can re-create the graphs with it. See #10380: mermaid can generate nice diagrams from a textual description, which I think is crucial for getting images that we can update in the future as a community effort.



