Posted to commits@airflow.apache.org by GitBox <gi...@apache.org> on 2021/04/08 08:38:43 UTC

[GitHub] [airflow] Dr-Denzy opened a new pull request #15270: The KubernetesPodOperator Comprehensive Guide

Dr-Denzy opened a new pull request #15270:
URL: https://github.com/apache/airflow/pull/15270


   The guide aims to alleviate most of the common challenges users
   encounter when using KubernetesPodOperator. It explains the underlying
   concept of the operator and contains several 'HOWTOs' and best
   practices for using the operator.
   
   Ultimately, this guide will help users gain a better understanding of how
   Airflow tasks are executed in pods in Kubernetes clusters.
   
   closes: #8970
   
   


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



[GitHub] [airflow] jedcunningham commented on a change in pull request #15270: The KubernetesPodOperator Comprehensive Guide

Posted by GitBox <gi...@apache.org>.
jedcunningham commented on a change in pull request #15270:
URL: https://github.com/apache/airflow/pull/15270#discussion_r609907818



##########
File path: docs/apache-airflow-providers-cncf-kubernetes/operators.rst
##########
@@ -35,23 +36,52 @@ you to create and run Pods on a Kubernetes cluster.
   :ref:`GKEStartPodOperator <howto/operator:GKEStartPodOperator>` operator as it
   simplifies the Kubernetes authorization process.
 
-.. note::
-  The :doc:`Kubernetes executor <apache-airflow:executor/kubernetes>` is **not** required to use this operator.
-
 How does this operator work?
 ^^^^^^^^^^^^^^^^^^^^^^^^^^^^
+
 The :class:`~airflow.providers.cncf.kubernetes.operators.kubernetes_pod.KubernetesPodOperator` uses the
 Kubernetes API to launch a pod in a Kubernetes cluster. By supplying an
 image URL and a command with optional arguments, the operator uses the Kube Python Client to generate a Kubernetes API
 request that dynamically launches those individual pods.
+Under the hood, :class:`~airflow.providers.cncf.kubernetes.hooks.kubernetes.KubernetesHook` creates the connection to
+the Kubernetes API server.
+Essentially, KubernetesPodOperator packages all the supplied parameters into a request object which is then shipped off
+to Kubernetes API Server so that the pod to execute your task is created. Whenever a task is triggered, a new worker pod
+is spun up to execute that task. And once the task is completed, by default the worker pod is deleted
+and the resources reclaimed.
 Users can specify a kubeconfig file using the ``config_file`` parameter, otherwise the operator will default
 to ``~/.kube/config``.
 
-The :class:`~airflow.providers.cncf.kubernetes.operators.kubernetes_pod.KubernetesPodOperator` enables task-level
-resource configuration and is optimal for custom Python
-dependencies that are not available through the public PyPI repository. It also allows users to supply a template
-YAML file using the ``pod_template_file`` parameter.
-Ultimately, it allows Airflow to act a job orchestrator - no matter the language those jobs are written in.
+How does the KubernetesPodOperator differ from the KubernetesExecutor
+^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
+
+.. note::
+  The :doc:`Kubernetes executor <apache-airflow:executor/kubernetes>` is **not** required to use this operator.
+
+
+What problems does KubernetesPodOperator solve?
+^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
+
+* The :class:`~airflow.providers.cncf.kubernetes.operators.kubernetes_pod.KubernetesPodOperator` enables task-level
+  resource configuration and is optimal for custom Python dependencies that are not available through the
+  public PyPI repository.
+
+* It allows users to supply a template YAML file using the ``pod_template_file`` parameter.
+
+* It allows isolation of deployments, configuration reuse, delegation and better management of secrets.
+
+* Ultimately, it allows Airflow to act as a job orchestrator - no matter the language those jobs are written in.

Review comment:
       I think a nice succinct way to think of it is: 'Easy way to run any image on Kubernetes as a task', and I feel like this should be first! (related to the orchestrator and any language points imo)
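
The "run any image as a task" idea can be illustrated with a minimal sketch of the kind of Pod manifest that results from an image, a command and its arguments. This is only an illustration under stated assumptions: `build_pod_manifest` is a hypothetical helper, not part of Airflow, and the pod name, image and container name are made up for the example. The operator itself builds its request through the Kubernetes Python client rather than a plain dict.

```python
import json


def build_pod_manifest(name, image, cmds=None, arguments=None, namespace="default"):
    """Assemble a minimal Kubernetes Pod manifest as a plain dict.

    Hypothetical helper for illustration only; it is not Airflow code.
    """
    # Container name "base" is an assumption chosen for this example.
    container = {"name": "base", "image": image}
    if cmds:
        # Maps to the container entrypoint.
        container["command"] = list(cmds)
    if arguments:
        # Maps to the arguments passed to the entrypoint.
        container["args"] = list(arguments)
    return {
        "apiVersion": "v1",
        "kind": "Pod",
        "metadata": {"name": name, "namespace": namespace},
        # A task pod should run once and exit, so it is never restarted.
        "spec": {"containers": [container], "restartPolicy": "Never"},
    }


# Example values are assumptions, not taken from the pull request.
manifest = build_pod_manifest(
    name="hello-task",
    image="ubuntu:20.04",
    cmds=["bash", "-cx"],
    arguments=["echo hello"],
)
print(json.dumps(manifest, indent=2))
```

The point of the sketch is that nothing in the manifest depends on the language of the job: any image with a command can be described this way, which is what lets Airflow act as the orchestrator.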

##########
File path: docs/apache-airflow-providers-cncf-kubernetes/operators.rst
##########
@@ -35,23 +36,52 @@ you to create and run Pods on a Kubernetes cluster.
   :ref:`GKEStartPodOperator <howto/operator:GKEStartPodOperator>` operator as it
   simplifies the Kubernetes authorization process.
 
-.. note::
-  The :doc:`Kubernetes executor <apache-airflow:executor/kubernetes>` is **not** required to use this operator.
-
 How does this operator work?
 ^^^^^^^^^^^^^^^^^^^^^^^^^^^^
+
 The :class:`~airflow.providers.cncf.kubernetes.operators.kubernetes_pod.KubernetesPodOperator` uses the
 Kubernetes API to launch a pod in a Kubernetes cluster. By supplying an
 image URL and a command with optional arguments, the operator uses the Kube Python Client to generate a Kubernetes API
 request that dynamically launches those individual pods.
+Under the hood, :class:`~airflow.providers.cncf.kubernetes.hooks.kubernetes.KubernetesHook` creates the connection to
+the Kubernetes API server.

Review comment:
       ```suggestion
   ```
   
   I think this is redundant with the beginning of this paragraph.

##########
File path: docs/apache-airflow-providers-cncf-kubernetes/connections/kubernetes.rst
##########
@@ -20,7 +20,9 @@
 Kubernetes cluster Connection
 =============================
 
-The Kubernetes cluster Connection type enables connection to a Kubernetes cluster by :class:`~airflow.providers.cncf.kubernetes.operators.spark_kubernetes.SparkKubernetesOperator` tasks. They are not used by ``KubernetesPodOperator`` tasks.
+The Kubernetes cluster Connection type enables connection to a Kubernetes cluster by
+:class:`~airflow.providers.cncf.kubernetes.operators.spark_kubernetes.SparkKubernetesOperator` tasks.
+They are not used by ``KubernetesPodOperator`` tasks.

Review comment:
       ```suggestion
   They are not used by :class:`~airflow.providers.cncf.kubernetes.operators.kubernetes_pod.KubernetesPodOperator` tasks.
   ```
   
   Probably worth linking it.

##########
File path: docs/apache-airflow-providers-cncf-kubernetes/operators.rst
##########
@@ -19,11 +19,12 @@
 
 .. _howto/operator:KubernetesPodOperator:
 
-KubernetesPodOperator
-=====================
+KubernetesPodOperator - The Comprehensive Guide
+===============================================
 
 The :class:`~airflow.providers.cncf.kubernetes.operators.kubernetes_pod.KubernetesPodOperator` allows
-you to create and run Pods on a Kubernetes cluster.
+you to create and run Pods on a Kubernetes cluster. The task wrapped in the KubernetesPodOperator is then executed in
+these pods.

Review comment:
       ```suggestion
   you to create and run a Pod on a Kubernetes cluster as a task.
   ```
   
   Maybe? 

##########
File path: docs/apache-airflow-providers-cncf-kubernetes/operators.rst
##########
@@ -35,23 +36,52 @@ you to create and run Pods on a Kubernetes cluster.
   :ref:`GKEStartPodOperator <howto/operator:GKEStartPodOperator>` operator as it
   simplifies the Kubernetes authorization process.
 
-.. note::
-  The :doc:`Kubernetes executor <apache-airflow:executor/kubernetes>` is **not** required to use this operator.
-
 How does this operator work?
 ^^^^^^^^^^^^^^^^^^^^^^^^^^^^
+
 The :class:`~airflow.providers.cncf.kubernetes.operators.kubernetes_pod.KubernetesPodOperator` uses the
 Kubernetes API to launch a pod in a Kubernetes cluster. By supplying an
 image URL and a command with optional arguments, the operator uses the Kube Python Client to generate a Kubernetes API
 request that dynamically launches those individual pods.
+Under the hood, :class:`~airflow.providers.cncf.kubernetes.hooks.kubernetes.KubernetesHook` creates the connection to
+the Kubernetes API server.
+Essentially, KubernetesPodOperator packages all the supplied parameters into a request object which is then shipped off
+to Kubernetes API Server so that the pod to execute your task is created. Whenever a task is triggered, a new worker pod
+is spun up to execute that task. And once the task is completed, by default the worker pod is deleted
+and the resources reclaimed.

Review comment:
       I'd be careful with terminology here. "Worker pods" are a KubeExecutor thing, not KPO.





[GitHub] [airflow] eladkal edited a comment on pull request #15270: The KubernetesPodOperator Comprehensive Guide

Posted by GitBox <gi...@apache.org>.
eladkal edited a comment on pull request #15270:
URL: https://github.com/apache/airflow/pull/15270#issuecomment-816126271


   Does this PR aim to enhance/replace https://github.com/apache/airflow/pull/13405 ?




[GitHub] [airflow] Dr-Denzy commented on pull request #15270: The KubernetesPodOperator Comprehensive Guide

Posted by GitBox <gi...@apache.org>.
Dr-Denzy commented on pull request #15270:
URL: https://github.com/apache/airflow/pull/15270#issuecomment-817630337


   > Does this PR aim to enhance/replace #13405 ?
   
   The aim is to enhance #13405 in a way that provides comprehensive documentation for users.



[GitHub] [airflow] Dr-Denzy commented on a change in pull request #15270: The KubernetesPodOperator Comprehensive Guide

Posted by GitBox <gi...@apache.org>.
Dr-Denzy commented on a change in pull request #15270:
URL: https://github.com/apache/airflow/pull/15270#discussion_r609462924



##########
File path: docs/apache-airflow-providers-cncf-kubernetes/operators.rst
##########
@@ -73,14 +103,21 @@ and type safety. While we have removed almost all Kubernetes convenience classes
     :start-after: [START howto_operator_k8s_cluster_resources]
     :end-before: [END howto_operator_k8s_cluster_resources]
 
-Difference between ``KubernetesPodOperator`` and Kubernetes object spec
-^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
-The :class:`~airflow.providers.cncf.kubernetes.operators.kubernetes_pod.KubernetesPodOperator` can be considered
-a substitute for a Kubernetes object spec definition that is able
-to be run in the Airflow scheduler in the DAG context. If using the operator, there is no need to create the
-equivalent YAML/JSON object spec for the Pod you would like to run.
-The YAML file can still be provided with the ``pod_template_file`` or even the Pod Spec constructed in Python via
-the ``full_pod_spec`` parameter which requires a Kubernetes ``V1Pod``.
+
+How to use KubernetesPodOperator with YAML file/JSON spec?
+^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
+
+ * WIP

Review comment:
       Yes. It is more of a draft now - lots more write-ups will be added.  





[GitHub] [airflow] github-actions[bot] closed pull request #15270: The KubernetesPodOperator Comprehensive Guide

Posted by GitBox <gi...@apache.org>.
github-actions[bot] closed pull request #15270:
URL: https://github.com/apache/airflow/pull/15270


   



[GitHub] [airflow] XD-DENG commented on a change in pull request #15270: The KubernetesPodOperator Comprehensive Guide

Posted by GitBox <gi...@apache.org>.
XD-DENG commented on a change in pull request #15270:
URL: https://github.com/apache/airflow/pull/15270#discussion_r609459309



##########
File path: docs/apache-airflow-providers-cncf-kubernetes/operators.rst
##########
@@ -73,14 +103,21 @@ and type safety. While we have removed almost all Kubernetes convenience classes
     :start-after: [START howto_operator_k8s_cluster_resources]
     :end-before: [END howto_operator_k8s_cluster_resources]
 
-Difference between ``KubernetesPodOperator`` and Kubernetes object spec
-^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
-The :class:`~airflow.providers.cncf.kubernetes.operators.kubernetes_pod.KubernetesPodOperator` can be considered
-a substitute for a Kubernetes object spec definition that is able
-to be run in the Airflow scheduler in the DAG context. If using the operator, there is no need to create the
-equivalent YAML/JSON object spec for the Pod you would like to run.
-The YAML file can still be provided with the ``pod_template_file`` or even the Pod Spec constructed in Python via
-the ``full_pod_spec`` parameter which requires a Kubernetes ``V1Pod``.
+
+How to use KubernetesPodOperator with YAML file/JSON spec?
+^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
+
+ * WIP

Review comment:
       Do these `WIP`s mean this PR is still a draft at this moment? If so, let's convert it to draft.





[GitHub] [airflow] github-actions[bot] commented on pull request #15270: The KubernetesPodOperator Comprehensive Guide

Posted by GitBox <gi...@apache.org>.
github-actions[bot] commented on pull request #15270:
URL: https://github.com/apache/airflow/pull/15270#issuecomment-850022638


   This pull request has been automatically marked as stale because it has not had recent activity. It will be closed in 5 days if no further activity occurs. Thank you for your contributions.

