Posted to commits@airflow.apache.org by GitBox <gi...@apache.org> on 2020/05/31 01:19:43 UTC

[GitHub] [airflow] tanjinP opened a new pull request #9079: WIP: [8970] Improve KubernetesPodOperator guide

tanjinP opened a new pull request #9079:
URL: https://github.com/apache/airflow/pull/9079


   Closes #8970 
   
   This PR currently contains an outline (in `docs/howto/operator/kubernetes.rst`) of the items mentioned in the issue. I am looking for feedback on whether this is a good approach for building out this documentation.
   
   There are 5 subheadings. For each one, the outline notes what the content will be, whether it will include a code example (currently there are none), and potential references to link (avoiding blogs and guides in favor of official documentation, such as the Kubernetes docs).
    
   ---
   Make sure to mark the boxes below before creating PR: [x]
   
   - [x] Description above provides context of the change
   - [x] Unit tests coverage for changes (not needed for documentation changes)
   - [x] Target Github ISSUE in description if exists
   - [x] Commits follow "[How to write a good git commit message](http://chris.beams.io/posts/git-commit/)"
   - [x] Relevant documentation is updated including usage instructions.
   - [x] I will engage committers as explained in [Contribution Workflow Example](https://github.com/apache/airflow/blob/master/CONTRIBUTING.rst#contribution-workflow-example).
   
   ---
   In case of fundamental code change, Airflow Improvement Proposal ([AIP](https://cwiki.apache.org/confluence/display/AIRFLOW/Airflow+Improvements+Proposals)) is needed.
   In case of a new dependency, check compliance with the [ASF 3rd Party License Policy](https://www.apache.org/legal/resolved.html#category-x).
   In case of backwards incompatible changes please leave a note in [UPDATING.md](https://github.com/apache/airflow/blob/master/UPDATING.md).
   Read the [Pull Request Guidelines](https://github.com/apache/airflow/blob/master/CONTRIBUTING.rst#pull-request-guidelines) for more information.
   


----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



[GitHub] [airflow] mik-laj commented on pull request #9079: [8970] Improve KubernetesPodOperator guide

Posted by GitBox <gi...@apache.org>.
mik-laj commented on pull request #9079:
URL: https://github.com/apache/airflow/pull/9079#issuecomment-655724857


   I would be happy to add information that Kubernetes Executor is not required for Kubernetes Pod Operator. Two sentences about the behavior of KubernetesPodOperator when installing into clusters would also be helpful, but we can add them in a separate change. 
   
   My opinion is based on the Slack discussion. This is important feedback on what users are looking for in the documentation.
   https://apache-airflow.slack.com/archives/CCV3FV9KL/p1594235643204100





[GitHub] [airflow] dossett commented on a change in pull request #9079: [8970] Improve KubernetesPodOperator guide

Posted by GitBox <gi...@apache.org>.
dossett commented on a change in pull request #9079:
URL: https://github.com/apache/airflow/pull/9079#discussion_r451802108



##########
File path: docs/howto/operator/kubernetes.rst
##########
@@ -22,150 +22,96 @@
 KubernetesPodOperator
 =====================
 
+The :class:`~airflow.providers.cncf.kubernetes.operators.kubernetes_pod.KubernetesPodOperator` allows
+you to create and run Pods on a Kubernetes cluster.
+
+.. contents::
+  :depth: 1
+  :local:
+
 .. note::
   If you use `Google Kubernetes Engine <https://cloud.google.com/kubernetes-engine/>`__, consider
   using the
   :ref:`GKEStartPodOperator <howto/operator:GKEStartPodOperator>` operator as it

Review comment:
       This link does not currently appear to be working. Should it be updated?







[GitHub] [airflow] mik-laj commented on a change in pull request #9079: [8970] Improve KubernetesPodOperator guide

Posted by GitBox <gi...@apache.org>.
mik-laj commented on a change in pull request #9079:
URL: https://github.com/apache/airflow/pull/9079#discussion_r452806556



##########
File path: docs/howto/operator/kubernetes.rst
##########
@@ -22,150 +22,96 @@
 KubernetesPodOperator
 =====================
 
+The :class:`~airflow.providers.cncf.kubernetes.operators.kubernetes_pod.KubernetesPodOperator` allows
+you to create and run Pods on a Kubernetes cluster.
+
+.. contents::
+  :depth: 1
+  :local:
+
 .. note::
   If you use `Google Kubernetes Engine <https://cloud.google.com/kubernetes-engine/>`__, consider
   using the
   :ref:`GKEStartPodOperator <howto/operator:GKEStartPodOperator>` operator as it

Review comment:
       This link works. It does not resolve when viewing the source file, but it works in the generated HTML. Please look at: https://airflow.readthedocs.io/en/latest/howto/operator/kubernetes.html







[GitHub] [airflow] dimberman commented on a change in pull request #9079: WIP: [8970] Improve KubernetesPodOperator guide

Posted by GitBox <gi...@apache.org>.
dimberman commented on a change in pull request #9079:
URL: https://github.com/apache/airflow/pull/9079#discussion_r449967800



##########
File path: docs/howto/operator/kubernetes.rst
##########
@@ -25,180 +25,90 @@ KubernetesPodOperator
 The :class:`~airflow.providers.cncf.kubernetes.operators.kubernetes_pod.KubernetesPodOperator` allows
 you to create and run Pods on a Kubernetes cluster.
 
+.. contents::
+  :depth: 1
+  :local:
+
+.. note::
+  If you use `Google Kubernetes Engine <https://cloud.google.com/kubernetes-engine/>`__, consider
+  using the
+  :ref:`GKEStartPodOperator <howto/operator:GKEStartPodOperator>` operator as it
+  simplifies the Kubernetes authorization process.
+
 How does this operator work?
 ^^^^^^^^^^^^^^^^^^^^^^^^^^^^
-TODO: include a definition of how the operator works, highlight dependency management gains, language agnosticism,
-and a quick overview of a k8s pod
-Go through the lifecycle of the pod in the context of a single task being executed on Airflow:
-start, monitor, end, delete (if specified). Mention where the pods can be run (in Airflow cluster if already
-hosted on k8s or a different one if kube config is available)
-
-Reference content from:
-- https://cloud.google.com/composer/docs/how-to/using/using-kubernetes-pod-operator
-- https://www.astronomer.io/docs/kubepodoperator/
-- https://medium.com/bluecore-engineering/were-all-using-airflow-wrong-and-how-to-fix-it-a56f14cb0753
-
-How to define Configurations (ConfigMaps and Secrets)?
-^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
-TODO: define general best practices from here: https://kubernetes.io/docs/concepts/configuration/overview/
-Have examples with `airflow.kubernetes.secret.Secret` and `airflow.kubernetes.volume.Volume`
+The ``KubernetesPodOperator`` is able to natively launch a Kubernetes Pod to run an individual task -

Review comment:
       The ``KubernetesPodOperator`` uses the Kubernetes API to launch a pod in a Kubernetes cluster.  By supplying an image URL and a bash command, the operator uses the Kube Python Client...

##########
File path: docs/howto/operator/kubernetes.rst
##########
@@ -25,180 +25,90 @@ KubernetesPodOperator
 The :class:`~airflow.providers.cncf.kubernetes.operators.kubernetes_pod.KubernetesPodOperator` allows
 you to create and run Pods on a Kubernetes cluster.
 
+.. contents::
+  :depth: 1
+  :local:
+
+.. note::
+  If you use `Google Kubernetes Engine <https://cloud.google.com/kubernetes-engine/>`__, consider
+  using the
+  :ref:`GKEStartPodOperator <howto/operator:GKEStartPodOperator>` operator as it
+  simplifies the Kubernetes authorization process.
+
 How does this operator work?
 ^^^^^^^^^^^^^^^^^^^^^^^^^^^^
-TODO: include a definition of how the operator works, highlight dependency management gains, language agnosticism,
-and a quick overview of a k8s pod
-Go through the lifecycle of the pod in the context of a single task being executed on Airflow:
-start, monitor, end, delete (if specified). Mention where the pods can be run (in Airflow cluster if already
-hosted on k8s or a different one if kube config is available)
-
-Reference content from:
-- https://cloud.google.com/composer/docs/how-to/using/using-kubernetes-pod-operator
-- https://www.astronomer.io/docs/kubepodoperator/
-- https://medium.com/bluecore-engineering/were-all-using-airflow-wrong-and-how-to-fix-it-a56f14cb0753
-
-How to define Configurations (ConfigMaps and Secrets)?
-^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
-TODO: define general best practices from here: https://kubernetes.io/docs/concepts/configuration/overview/
-Have examples with `airflow.kubernetes.secret.Secret` and `airflow.kubernetes.volume.Volume`
+The ``KubernetesPodOperator`` is able to natively launch a Kubernetes Pod to run an individual task -
+and terminate that pod when the task is completed. The operator uses the Kube Python Client to generate a
+Kubernetes API request that dynamically launches those individual pods. The connection to the client is based

Review comment:
       Users can specify a kubeconfig file using the ``config_file`` parameter; otherwise, the operator will default to ``~/.kube/config``.
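
The fallback described in this comment can be sketched as follows. This is only an illustration of the described behavior, not the operator's actual code; the helper name `resolve_kube_config` is hypothetical and uses nothing beyond the Python standard library.

```python
import os
from typing import Optional

# Default location mentioned in the review comment above.
DEFAULT_KUBE_CONFIG = "~/.kube/config"


def resolve_kube_config(config_file: Optional[str] = None) -> str:
    """Return the kubeconfig path to use (hypothetical helper, illustration only).

    An explicitly supplied ``config_file`` wins; otherwise fall back to
    ``~/.kube/config``.
    """
    chosen = config_file if config_file else DEFAULT_KUBE_CONFIG
    # Expand a leading "~" to the current user's home directory.
    return os.path.expanduser(chosen)
```

With this sketch, `resolve_kube_config("/path/to/config")` returns the given path unchanged, while `resolve_kube_config()` returns the default path under the current user's home directory.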

##########
File path: docs/howto/operator/kubernetes.rst
##########
@@ -25,180 +25,90 @@ KubernetesPodOperator
 The :class:`~airflow.providers.cncf.kubernetes.operators.kubernetes_pod.KubernetesPodOperator` allows
 you to create and run Pods on a Kubernetes cluster.
 
+.. contents::
+  :depth: 1
+  :local:
+
+.. note::
+  If you use `Google Kubernetes Engine <https://cloud.google.com/kubernetes-engine/>`__, consider
+  using the
+  :ref:`GKEStartPodOperator <howto/operator:GKEStartPodOperator>` operator as it
+  simplifies the Kubernetes authorization process.
+
 How does this operator work?
 ^^^^^^^^^^^^^^^^^^^^^^^^^^^^
-TODO: include a definition of how the operator works, highlight dependency management gains, language agnosticism,
-and a quick overview of a k8s pod
-Go through the lifecycle of the pod in the context of a single task being executed on Airflow:
-start, monitor, end, delete (if specified). Mention where the pods can be run (in Airflow cluster if already
-hosted on k8s or a different one if kube config is available)
-
-Reference content from:
-- https://cloud.google.com/composer/docs/how-to/using/using-kubernetes-pod-operator
-- https://www.astronomer.io/docs/kubepodoperator/
-- https://medium.com/bluecore-engineering/were-all-using-airflow-wrong-and-how-to-fix-it-a56f14cb0753
-
-How to define Configurations (ConfigMaps and Secrets)?
-^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
-TODO: define general best practices from here: https://kubernetes.io/docs/concepts/configuration/overview/
-Have examples with `airflow.kubernetes.secret.Secret` and `airflow.kubernetes.volume.Volume`
+The ``KubernetesPodOperator`` is able to natively launch a Kubernetes Pod to run an individual task -
+and terminate that pod when the task is completed. The operator uses the Kube Python Client to generate a
+Kubernetes API request that dynamically launches those individual pods. The connection to the client is based
+on the Kubernetes Configuration file which is either specified directly to the Pod or retrieved from the
+Airflow Configuration.
+
+The ``KubernetesPodOperator`` enables task-level resource configuration and is optimal for custom Python
+dependencies that are not available through the public PyPI repository.
+Ultimately, it allows Airflow to act a job orchestrator - no matter the language those jobs are written in.

Review comment:
       The ``KubernetesPodOperator`` also allows users to supply a template YAML file using the ``pod_template_file`` configuration.
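
As a rough sketch of what such a template might contain, the snippet below builds a minimal Pod manifest as a plain dictionary and renders it as JSON, which YAML parsers generally accept as well. The helper names, the container name `base`, and the example image are assumptions made for illustration; whether a given Airflow version accepts this exact file through ``pod_template_file`` should be checked against its documentation.

```python
import json
from typing import Any, Dict, Optional


def build_pod_template(name: str, image: str, pull_secret: Optional[str] = None) -> Dict[str, Any]:
    """Build a minimal Kubernetes Pod manifest as a dictionary (hypothetical helper)."""
    spec: Dict[str, Any] = {
        # The container name "base" is an assumption made for this sketch.
        "containers": [{"name": "base", "image": image}],
    }
    if pull_secret:
        # imagePullSecrets refers to an existing docker-registry Secret by name.
        spec["imagePullSecrets"] = [{"name": pull_secret}]
    return {
        "apiVersion": "v1",
        "kind": "Pod",
        "metadata": {"name": name},
        "spec": spec,
    }


def render_pod_template(template: Dict[str, Any]) -> str:
    """Serialize the manifest as indented JSON text."""
    return json.dumps(template, indent=2)
```

For example, `render_pod_template(build_pod_template("example-pod", "quay.io/example/app:latest", pull_secret="testquay"))` produces text that could be saved to a file and adapted as a template. The Secret name `testquay` matches the `kubectl create secret` example in the guide; the image name is made up.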

##########
File path: docs/howto/operator/kubernetes.rst
##########
@@ -25,180 +25,90 @@ KubernetesPodOperator
 The :class:`~airflow.providers.cncf.kubernetes.operators.kubernetes_pod.KubernetesPodOperator` allows
 you to create and run Pods on a Kubernetes cluster.
 
+.. contents::
+  :depth: 1
+  :local:
+
+.. note::
+  If you use `Google Kubernetes Engine <https://cloud.google.com/kubernetes-engine/>`__, consider
+  using the
+  :ref:`GKEStartPodOperator <howto/operator:GKEStartPodOperator>` operator as it
+  simplifies the Kubernetes authorization process.
+
 How does this operator work?
 ^^^^^^^^^^^^^^^^^^^^^^^^^^^^
-TODO: include a definition of how the operator works, highlight dependency management gains, language agnosticism,
-and a quick overview of a k8s pod
-Go through the lifecycle of the pod in the context of a single task being executed on Airflow:
-start, monitor, end, delete (if specified). Mention where the pods can be run (in Airflow cluster if already
-hosted on k8s or a different one if kube config is available)
-
-Reference content from:
-- https://cloud.google.com/composer/docs/how-to/using/using-kubernetes-pod-operator
-- https://www.astronomer.io/docs/kubepodoperator/
-- https://medium.com/bluecore-engineering/were-all-using-airflow-wrong-and-how-to-fix-it-a56f14cb0753
-
-How to define Configurations (ConfigMaps and Secrets)?
-^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
-TODO: define general best practices from here: https://kubernetes.io/docs/concepts/configuration/overview/
-Have examples with `airflow.kubernetes.secret.Secret` and `airflow.kubernetes.volume.Volume`
+The ``KubernetesPodOperator`` is able to natively launch a Kubernetes Pod to run an individual task -
+and terminate that pod when the task is completed. The operator uses the Kube Python Client to generate a
+Kubernetes API request that dynamically launches those individual pods. The connection to the client is based
+on the Kubernetes Configuration file which is either specified directly to the Pod or retrieved from the
+Airflow Configuration.
+
+The ``KubernetesPodOperator`` enables task-level resource configuration and is optimal for custom Python
+dependencies that are not available through the public PyPI repository.
+Ultimately, it allows Airflow to act a job orchestrator - no matter the language those jobs are written in.
+
+How to use cluster ConfigMaps, Secrets, and Volumes with Pod?
+^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
+Kubernetes cluster resources such as ConfigMaps, Secrets, and Volumes can be used with a Pod to be launched.
+Utilize the Airflow Kubernetes model classes such as:
+:class:`~airflow.kubernetes.secret.Secret`
+or
+:class:`~airflow.kubernetes.volume.Volume`
+or
+:class:`~airflow.kubernetes.volume_mount.VolumeMount`
+to do this (as well as standard Python dictionaries). These can they be specified in the appropriate parameters
+when declaring the Pod task.
+
+.. exampleinclude:: ../../../airflow/providers/cncf/kubernetes/example_dags/example_kubernetes.py
+    :language: python
+    :start-after: [START howto_operator_k8s_cluster_resources]
+    :end-before: [END howto_operator_k8s_cluster_resources]
 
 Difference between ``KubernetesPodOperator`` and Kubernetes object spec
 ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
-TODO: Have a definition of the KubernetesPodOperator and have the equivalent YAML spec to point out
-the similarities and differences
-Bonus include a JSON equivalent as well
+The ``KubernetesPodOperator`` can be considered a substitute for a Kubernetes object spec definition that is able
+to be run in the Airflow scheduler in the DAG context. If using the operator, there is no need to create the
+equivalent YAML/JSON object spec for the Pod you would like to run.
 
 How to use private images (container registry)?
 ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
-TODO: mention ECR, GCR, Quay, as options for image source and how to set up secret in Airflow to access a
-registry other than Docker Hub
-Reference content from: https://kubernetes.io/docs/tasks/configure-pod-container/pull-image-private-registry/
-Include simple example of a private registry image with the secret
+By default, the ``KubernetesPodOperator`` will look for images hosted publicly on Dockerhub.
+If you want to pull images from a private registry (such as ECR, GCR, Quay, or others), you must create a

Review comment:
       To pull images from a private registry*

##########
File path: docs/howto/operator/kubernetes.rst
##########
@@ -25,180 +25,90 @@ KubernetesPodOperator
 The :class:`~airflow.providers.cncf.kubernetes.operators.kubernetes_pod.KubernetesPodOperator` allows
 you to create and run Pods on a Kubernetes cluster.
 
+.. contents::
+  :depth: 1
+  :local:
+
+.. note::
+  If you use `Google Kubernetes Engine <https://cloud.google.com/kubernetes-engine/>`__, consider
+  using the
+  :ref:`GKEStartPodOperator <howto/operator:GKEStartPodOperator>` operator as it
+  simplifies the Kubernetes authorization process.
+
 How does this operator work?
 ^^^^^^^^^^^^^^^^^^^^^^^^^^^^
-TODO: include a definition of how the operator works, highlight dependency management gains, language agnosticism,
-and a quick overview of a k8s pod
-Go through the lifecycle of the pod in the context of a single task being executed on Airflow:
-start, monitor, end, delete (if specified). Mention where the pods can be run (in Airflow cluster if already
-hosted on k8s or a different one if kube config is available)
-
-Reference content from:
-- https://cloud.google.com/composer/docs/how-to/using/using-kubernetes-pod-operator
-- https://www.astronomer.io/docs/kubepodoperator/
-- https://medium.com/bluecore-engineering/were-all-using-airflow-wrong-and-how-to-fix-it-a56f14cb0753
-
-How to define Configurations (ConfigMaps and Secrets)?
-^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
-TODO: define general best practices from here: https://kubernetes.io/docs/concepts/configuration/overview/
-Have examples with `airflow.kubernetes.secret.Secret` and `airflow.kubernetes.volume.Volume`
+The ``KubernetesPodOperator`` is able to natively launch a Kubernetes Pod to run an individual task -
+and terminate that pod when the task is completed. The operator uses the Kube Python Client to generate a
+Kubernetes API request that dynamically launches those individual pods. The connection to the client is based
+on the Kubernetes Configuration file which is either specified directly to the Pod or retrieved from the
+Airflow Configuration.
+
+The ``KubernetesPodOperator`` enables task-level resource configuration and is optimal for custom Python
+dependencies that are not available through the public PyPI repository.
+Ultimately, it allows Airflow to act a job orchestrator - no matter the language those jobs are written in.
+
+How to use cluster ConfigMaps, Secrets, and Volumes with Pod?
+^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
+Kubernetes cluster resources such as ConfigMaps, Secrets, and Volumes can be used with a Pod to be launched.
+Utilize the Airflow Kubernetes model classes such as:
+:class:`~airflow.kubernetes.secret.Secret`
+or
+:class:`~airflow.kubernetes.volume.Volume`
+or
+:class:`~airflow.kubernetes.volume_mount.VolumeMount`
+to do this (as well as standard Python dictionaries). These can they be specified in the appropriate parameters
+when declaring the Pod task.
+
+.. exampleinclude:: ../../../airflow/providers/cncf/kubernetes/example_dags/example_kubernetes.py
+    :language: python
+    :start-after: [START howto_operator_k8s_cluster_resources]
+    :end-before: [END howto_operator_k8s_cluster_resources]
 
 Difference between ``KubernetesPodOperator`` and Kubernetes object spec
 ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
-TODO: Have a definition of the KubernetesPodOperator and have the equivalent YAML spec to point out
-the similarities and differences
-Bonus include a JSON equivalent as well
+The ``KubernetesPodOperator`` can be considered a substitute for a Kubernetes object spec definition that is able
+to be run in the Airflow scheduler in the DAG context. If using the operator, there is no need to create the
+equivalent YAML/JSON object spec for the Pod you would like to run.
 
 How to use private images (container registry)?
 ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
-TODO: mention ECR, GCR, Quay, as options for image source and how to set up secret in Airflow to access a
-registry other than Docker Hub
-Reference content from: https://kubernetes.io/docs/tasks/configure-pod-container/pull-image-private-registry/
-Include simple example of a private registry image with the secret
+By default, the ``KubernetesPodOperator`` will look for images hosted publicly on Dockerhub.
+If you want to pull images from a private registry (such as ECR, GCR, Quay, or others), you must create a
+Kubernetes Secret that represents the credentials for accessing images from the private registry that is ultimately
+specified in the ``image_pull_secrets`` parameter.
+
+Create the Secret using ``kubectl``:
+
+.. code-block:: none
+
+    kubectl create secret docker-registry testquay \
+        --docker-server=quay.io \
+        --docker-username=<Profile name> \
+        --docker-password=<password>
+
+Then use it in your pod like so:
+
+.. exampleinclude:: ../../../airflow/providers/cncf/kubernetes/example_dags/example_kubernetes.py
+    :language: python
+    :start-after: [START howto_operator_k8s_private_image]
+    :end-before: [END howto_operator_k8s_private_image]
 
 How does XCom work?
 ^^^^^^^^^^^^^^^^^^^
-TODO: walk through workflow of pushing and pulling from operator
-Reference content from: https://www.aylakhan.tech/?p=725
-Include example of this in action (already one in the GKE example which we can borrow)
-
-
-* Launches a Docker image as a Kubernetes Pod to execute an individual Airflow
-  task via a Kubernetes API request, using the
-  `Kubernetes Python Client <https://github.com/kubernetes-client/python>`_
-* Terminate the pod when the task is completed
-* Works with any Airflow Executor
-* Allows Airflow to act a job orchestrator for a Docker container,
-  no matter the language the job was written in
-* Enables task-level resource configuration
-* Allow you to pass Kubernetes specific parameters into the task
-
-.. code-block:: python
-
-    import kubernetes.client.models as k8s
-
-    from airflow.providers.cncf.kubernetes.operators.kubernetes_pod import KubernetesPodOperator
-    from airflow.kubernetes.secret import Secret
-    from airflow.kubernetes.volume import Volume
-    from airflow.kubernetes.volume_mount import VolumeMount
-    from airflow.kubernetes.pod import Port
-
-
-    secret_file = Secret('volume', '/etc/sql_conn', 'airflow-secrets', 'sql_alchemy_conn')
-    secret_env  = Secret('env', 'SQL_CONN', 'airflow-secrets', 'sql_alchemy_conn')
-    secret_all_keys  = Secret('env', None, 'airflow-secrets-2')
-    volume_mount = VolumeMount('test-volume',
-                                mount_path='/root/mount_file',
-                                sub_path=None,
-                                read_only=True)
-    port = Port('http', 80)
-    configmaps = ['test-configmap-1', 'test-configmap-2']
-
-    volume_config= {
-        'persistentVolumeClaim':
-          {
-            'claimName': 'test-volume'
-          }
-        }
-    volume = Volume(name='test-volume', configs=volume_config)
-
-    init_container_volume_mounts = [k8s.V1VolumeMount(
-      mount_path='/etc/foo',
-      name='test-volume',
-      sub_path=None,
-      read_only=True
-    )]
-
-    init_environments = [k8s.V1EnvVar(
-      name='key1',
-      value='value1'
-    ), k8s.V1EnvVar(
-      name='key2',
-      value='value2'
-    )]
-
-    init_container = k8s.V1Container(
-      name="init-container",
-      image="ubuntu:16.04",
-      env=init_environments,
-      volume_mounts=init_container_volume_mounts,
-      command=["bash", "-cx"],
-      args=["echo 10"]
-    )
-
-    affinity = {
-        'nodeAffinity': {
-          'preferredDuringSchedulingIgnoredDuringExecution': [
-            {
-              "weight": 1,
-              "preference": {
-                "matchExpressions": {
-                  "key": "disktype",
-                  "operator": "In",
-                  "values": ["ssd"]
-                }
-              }
-            }
-          ]
-        },
-        "podAffinity": {
-          "requiredDuringSchedulingIgnoredDuringExecution": [
-            {
-              "labelSelector": {
-                "matchExpressions": [
-                  {
-                    "key": "security",
-                    "operator": "In",
-                    "values": ["S1"]
-                  }
-                ]
-              },
-              "topologyKey": "failure-domain.beta.kubernetes.io/zone"
-            }
-          ]
-        },
-        "podAntiAffinity": {
-          "requiredDuringSchedulingIgnoredDuringExecution": [
-            {
-              "labelSelector": {
-                "matchExpressions": [
-                  {
-                    "key": "security",
-                    "operator": "In",
-                    "values": ["S2"]
-                  }
-                ]
-              },
-              "topologyKey": "kubernetes.io/hostname"
-            }
-          ]
-        }
-    }
-
-    tolerations = [
-        {
-            'key': "key",
-            'operator': 'Equal',
-            'value': 'value'
-         }
-    ]
-
-    k = KubernetesPodOperator(namespace='default',
-                              image="ubuntu:16.04",
-                              cmds=["bash", "-cx"],
-                              arguments=["echo", "10"],
-                              labels={"foo": "bar"},
-                              secrets=[secret_file, secret_env, secret_all_keys],
-                              ports=[port],
-                              volumes=[volume],
-                              volume_mounts=[volume_mount],
-                              name="test",
-                              task_id="task",
-                              affinity=affinity,
-                              is_delete_operator_pod=True,
-                              hostnetwork=False,
-                              tolerations=tolerations,
-                              configmaps=configmaps,
-                              init_containers=[init_container],
-                              priority_class_name="medium",
-                              )
+The ``KubernetesPodOperator`` handles XCom values differently than other operators. In order to pass a XCom value
+from your Pod you must specify the ``do_xcom_push`` as ``True``. This will create a sidecar container that runs
+alongside the Pod. The Pod must write the XCom value into this location at the ``airflow/xcom/return.json`` path.

Review comment:
       /airflow/xcom*
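
To make the corrected path concrete, here is a minimal sketch of what a script running inside the Pod could do to hand a value back when ``do_xcom_push`` is enabled: serialize the result as JSON into ``/airflow/xcom/return.json``. The helper name `write_xcom_result` and its `xcom_dir` parameter are hypothetical and exist only so the sketch can be tried outside a Pod.

```python
import json
import os
from typing import Any

# Directory from the review comment above; the file name comes from the guide text.
XCOM_DIR = "/airflow/xcom"
XCOM_FILE = "return.json"


def write_xcom_result(value: Any, xcom_dir: str = XCOM_DIR) -> str:
    """Write ``value`` as JSON to ``<xcom_dir>/return.json`` and return the file path."""
    # Create the directory if it does not exist yet.
    os.makedirs(xcom_dir, exist_ok=True)
    path = os.path.join(xcom_dir, XCOM_FILE)
    with open(path, "w") as handle:
        json.dump(value, handle)
    return path
```

Inside the container, the script would call `write_xcom_result(result)` as its last step; the value must be JSON-serializable for `json.dump` to succeed.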







[GitHub] [airflow] mik-laj commented on pull request #9079: [8970] Improve KubernetesPodOperator guide

Posted by GitBox <gi...@apache.org>.
mik-laj commented on pull request #9079:
URL: https://github.com/apache/airflow/pull/9079#issuecomment-661022028


   > Not sure what you mean here - what do you mean by installing into cluster? 
   
   I checked my code and this behavior is unique to Cloud Composer.  The team made one change to this operator. By default, the kube_config parameter points to a file that contains credentials for the current environment.
   <img width="641" alt="Screenshot 2020-07-20 at 14 52 31" src="https://user-images.githubusercontent.com/12058428/87939662-aaf14600-ca98-11ea-8b4d-ab98499b59c2.png">
   





[GitHub] [airflow] tanjinP commented on pull request #9079: [8970] Improve KubernetesPodOperator guide

Posted by GitBox <gi...@apache.org>.
tanjinP commented on pull request #9079:
URL: https://github.com/apache/airflow/pull/9079#issuecomment-659774270


   @mik-laj 
   > I would be happy to add information that Kubernetes Executor is not required for Kubernetes Pod Operator. 
   
   Mentioned this as a [note here](https://github.com/apache/airflow/pull/9079/commits/7148b00481262c99e3bbd2b6958360663fe406f1)
   
   >Two sentences about the behavior of KubernetesPodOperator when installing into clusters would also be helpful, but we can add them in a separate change.
   
   Not sure what you mean here - what do you mean by `installing into cluster`? Do you mean setting up the `kubeconfig`? If so does the section `How does this operator work?` not cover it?
   





[GitHub] [airflow] dossett commented on a change in pull request #9079: [8970] Improve KubernetesPodOperator guide

Posted by GitBox <gi...@apache.org>.
dossett commented on a change in pull request #9079:
URL: https://github.com/apache/airflow/pull/9079#discussion_r451814841



##########
File path: docs/howto/operator/kubernetes.rst
##########
@@ -22,150 +22,96 @@
 KubernetesPodOperator
 =====================
 
+The :class:`~airflow.providers.cncf.kubernetes.operators.kubernetes_pod.KubernetesPodOperator` allows
+you to create and run Pods on a Kubernetes cluster.
+
+.. contents::
+  :depth: 1
+  :local:
+
 .. note::
   If you use `Google Kubernetes Engine <https://cloud.google.com/kubernetes-engine/>`__, consider
   using the
   :ref:`GKEStartPodOperator <howto/operator:GKEStartPodOperator>` operator as it

Review comment:
       Some description that `GKEStartPodOperator` extends `KubernetesPodOperator` and that therefore the rest of this documentation is still applicable would be really helpful.







[GitHub] [airflow] tanjinP commented on pull request #9079: [8970] Improve KubernetesPodOperator guide

Posted by GitBox <gi...@apache.org>.
tanjinP commented on pull request #9079:
URL: https://github.com/apache/airflow/pull/9079#issuecomment-655076947


   Looks like some static checks are failing; will address.





[GitHub] [airflow] mik-laj merged pull request #9079: [8970] Improve KubernetesPodOperator guide

Posted by GitBox <gi...@apache.org>.
mik-laj merged pull request #9079:
URL: https://github.com/apache/airflow/pull/9079


   





[GitHub] [airflow] mik-laj commented on a change in pull request #9079: WIP: [8970] Improve KubernetesPodOperator guide

Posted by GitBox <gi...@apache.org>.
mik-laj commented on a change in pull request #9079:
URL: https://github.com/apache/airflow/pull/9079#discussion_r432947632



##########
File path: docs/howto/operator/kubernetes.rst
##########
@@ -22,7 +22,46 @@
 KubernetesPodOperator
 =====================
 
-The :class:`~airflow.providers.cncf.kubernetes.operators.kubernetes_pod.KubernetesPodOperator`:
+The :class:`~airflow.providers.cncf.kubernetes.operators.kubernetes_pod.KubernetesPodOperator` allows
+you to create and run Pods on a Kubernetes cluster.
+

Review comment:
       ```suggestion
   
   
   .. contents::
     :depth: 1
     :local:
   ```







[GitHub] [airflow] mik-laj commented on pull request #9079: WIP: [8970] Improve KubernetesPodOperator guide

Posted by GitBox <gi...@apache.org>.
mik-laj commented on pull request #9079:
URL: https://github.com/apache/airflow/pull/9079#issuecomment-653418928


   @tanjinP Have you encountered any difficulties? Can I help you?





[GitHub] [airflow] mik-laj commented on pull request #9079: WIP: [8970] Improve KubernetesPodOperator guide

Posted by GitBox <gi...@apache.org>.
mik-laj commented on pull request #9079:
URL: https://github.com/apache/airflow/pull/9079#issuecomment-636472245


   Very good direction. We just need it.





[GitHub] [airflow] tanjinP commented on a change in pull request #9079: WIP: [8970] Improve KubernetesPodOperator guide

Posted by GitBox <gi...@apache.org>.
tanjinP commented on a change in pull request #9079:
URL: https://github.com/apache/airflow/pull/9079#discussion_r450571991



##########
File path: docs/howto/operator/kubernetes.rst
##########
@@ -25,180 +25,90 @@ KubernetesPodOperator
 The :class:`~airflow.providers.cncf.kubernetes.operators.kubernetes_pod.KubernetesPodOperator` allows
 you to create and run Pods on a Kubernetes cluster.
 
+.. contents::
+  :depth: 1
+  :local:
+
+.. note::
+  If you use `Google Kubernetes Engine <https://cloud.google.com/kubernetes-engine/>`__, consider
+  using the
+  :ref:`GKEStartPodOperator <howto/operator:GKEStartPodOperator>` operator as it
+  simplifies the Kubernetes authorization process.
+
 How does this operator work?
 ^^^^^^^^^^^^^^^^^^^^^^^^^^^^
-TODO: include a definition of how the operator works, highlight dependency management gains, language agnosticism,
-and a quick overview of a k8s pod
-Go through the lifecycle of the pod in the context of a single task being executed on Airflow:
-start, monitor, end, delete (if specified). Mention where the pods can be run (in Airflow cluster if already
-hosted on k8s or a different one if kube config is available)
-
-Reference content from:
-- https://cloud.google.com/composer/docs/how-to/using/using-kubernetes-pod-operator
-- https://www.astronomer.io/docs/kubepodoperator/
-- https://medium.com/bluecore-engineering/were-all-using-airflow-wrong-and-how-to-fix-it-a56f14cb0753
-
-How to define Configurations (ConfigMaps and Secrets)?
-^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
-TODO: define general best practices from here: https://kubernetes.io/docs/concepts/configuration/overview/
-Have examples with `airflow.kubernetes.secret.Secret` and `airflow.kubernetes.volume.Volume`
+The ``KubernetesPodOperator`` is able to natively launch a Kubernetes Pod to run an individual task -
+and terminate that pod when the task is completed. The operator uses the Kube Python Client to generate a
+Kubernetes API request that dynamically launches those individual pods. The connection to the client is based
+on the Kubernetes Configuration file which is either specified directly to the Pod or retrieved from the
+Airflow Configuration.
+
+The ``KubernetesPodOperator`` enables task-level resource configuration and is optimal for custom Python
+dependencies that are not available through the public PyPI repository.
+Ultimately, it allows Airflow to act as a job orchestrator - no matter the language those jobs are written in.
+
+How to use cluster ConfigMaps, Secrets, and Volumes with Pod?
+^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
+Kubernetes cluster resources such as ConfigMaps, Secrets, and Volumes can be used with a Pod to be launched.
+Utilize the Airflow Kubernetes model classes such as:
+:class:`~airflow.kubernetes.secret.Secret`
+or
+:class:`~airflow.kubernetes.volume.Volume`
+or
+:class:`~airflow.kubernetes.volume_mount.VolumeMount`
+to do this (as well as standard Python dictionaries). These can then be specified in the appropriate parameters
+when declaring the Pod task.
+
+.. exampleinclude:: ../../../airflow/providers/cncf/kubernetes/example_dags/example_kubernetes.py
+    :language: python
+    :start-after: [START howto_operator_k8s_cluster_resources]
+    :end-before: [END howto_operator_k8s_cluster_resources]
 
 Difference between ``KubernetesPodOperator`` and Kubernetes object spec
 ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
-TODO: Have a definition of the KubernetesPodOperator and have the equivalent YAML spec to point out
-the similarities and differences
-Bonus include a JSON equivalent as well
+The ``KubernetesPodOperator`` can be considered a substitute for a Kubernetes object spec definition that is able
+to be run in the Airflow scheduler in the DAG context. If using the operator, there is no need to create the
+equivalent YAML/JSON object spec for the Pod you would like to run.
 
 How to use private images (container registry)?
 ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
-TODO: mention ECR, GCR, Quay, as options for image source and how to set up secret in Airflow to access a
-registry other than Docker Hub
-Reference content from: https://kubernetes.io/docs/tasks/configure-pod-container/pull-image-private-registry/
-Include simple example of a private registry image with the secret
+By default, the ``KubernetesPodOperator`` will look for images hosted publicly on Dockerhub.
+If you want to pull images from a private registry (such as ECR, GCR, Quay, or others), you must create a
+Kubernetes Secret that represents the credentials for accessing images from the private registry that is ultimately
+specified in the ``image_pull_secrets`` parameter.
+
+Create the Secret using ``kubectl``:
+
+.. code-block:: none
+
+    kubectl create secret docker-registry testquay \
+        --docker-server=quay.io \
+        --docker-username=<Profile name> \
+        --docker-password=<password>
+
+Then use it in your pod like so:
+
+.. exampleinclude:: ../../../airflow/providers/cncf/kubernetes/example_dags/example_kubernetes.py
+    :language: python
+    :start-after: [START howto_operator_k8s_private_image]
+    :end-before: [END howto_operator_k8s_private_image]
 
 How does XCom work?
 ^^^^^^^^^^^^^^^^^^^
-TODO: walk through workflow of pushing and pulling from operator
-Reference content from: https://www.aylakhan.tech/?p=725
-Include example of this in action (already one in the GKE example which we can borrow)
-
-
-* Launches a Docker image as a Kubernetes Pod to execute an individual Airflow
-  task via a Kubernetes API request, using the
-  `Kubernetes Python Client <https://github.com/kubernetes-client/python>`_
-* Terminate the pod when the task is completed
-* Works with any Airflow Executor
-* Allows Airflow to act a job orchestrator for a Docker container,
-  no matter the language the job was written in
-* Enables task-level resource configuration
-* Allow you to pass Kubernetes specific parameters into the task
-
-.. code-block:: python
-
-    import kubernetes.client.models as k8s
-
-    from airflow.providers.cncf.kubernetes.operators.kubernetes_pod import KubernetesPodOperator
-    from airflow.kubernetes.secret import Secret
-    from airflow.kubernetes.volume import Volume
-    from airflow.kubernetes.volume_mount import VolumeMount
-    from airflow.kubernetes.pod import Port
-
-
-    secret_file = Secret('volume', '/etc/sql_conn', 'airflow-secrets', 'sql_alchemy_conn')
-    secret_env  = Secret('env', 'SQL_CONN', 'airflow-secrets', 'sql_alchemy_conn')
-    secret_all_keys  = Secret('env', None, 'airflow-secrets-2')
-    volume_mount = VolumeMount('test-volume',
-                                mount_path='/root/mount_file',
-                                sub_path=None,
-                                read_only=True)
-    port = Port('http', 80)
-    configmaps = ['test-configmap-1', 'test-configmap-2']
-
-    volume_config= {
-        'persistentVolumeClaim':
-          {
-            'claimName': 'test-volume'
-          }
-        }
-    volume = Volume(name='test-volume', configs=volume_config)
-
-    init_container_volume_mounts = [k8s.V1VolumeMount(
-      mount_path='/etc/foo',
-      name='test-volume',
-      sub_path=None,
-      read_only=True
-    )]
-
-    init_environments = [k8s.V1EnvVar(
-      name='key1',
-      value='value1'
-    ), k8s.V1EnvVar(
-      name='key2',
-      value='value2'
-    )]
-
-    init_container = k8s.V1Container(
-      name="init-container",
-      image="ubuntu:16.04",
-      env=init_environments,
-      volume_mounts=init_container_volume_mounts,
-      command=["bash", "-cx"],
-      args=["echo 10"]
-    )
-
-    affinity = {
-        'nodeAffinity': {
-          'preferredDuringSchedulingIgnoredDuringExecution': [
-            {
-              "weight": 1,
-              "preference": {
-                "matchExpressions": {
-                  "key": "disktype",
-                  "operator": "In",
-                  "values": ["ssd"]
-                }
-              }
-            }
-          ]
-        },
-        "podAffinity": {
-          "requiredDuringSchedulingIgnoredDuringExecution": [
-            {
-              "labelSelector": {
-                "matchExpressions": [
-                  {
-                    "key": "security",
-                    "operator": "In",
-                    "values": ["S1"]
-                  }
-                ]
-              },
-              "topologyKey": "failure-domain.beta.kubernetes.io/zone"
-            }
-          ]
-        },
-        "podAntiAffinity": {
-          "requiredDuringSchedulingIgnoredDuringExecution": [
-            {
-              "labelSelector": {
-                "matchExpressions": [
-                  {
-                    "key": "security",
-                    "operator": "In",
-                    "values": ["S2"]
-                  }
-                ]
-              },
-              "topologyKey": "kubernetes.io/hostname"
-            }
-          ]
-        }
-    }
-
-    tolerations = [
-        {
-            'key': "key",
-            'operator': 'Equal',
-            'value': 'value'
-         }
-    ]
-
-    k = KubernetesPodOperator(namespace='default',
-                              image="ubuntu:16.04",
-                              cmds=["bash", "-cx"],
-                              arguments=["echo", "10"],
-                              labels={"foo": "bar"},
-                              secrets=[secret_file, secret_env, secret_all_keys],
-                              ports=[port],
-                              volumes=[volume],
-                              volume_mounts=[volume_mount],
-                              name="test",
-                              task_id="task",
-                              affinity=affinity,
-                              is_delete_operator_pod=True,
-                              hostnetwork=False,
-                              tolerations=tolerations,
-                              configmaps=configmaps,
-                              init_containers=[init_container],
-                              priority_class_name="medium",
-                              )
+The ``KubernetesPodOperator`` handles XCom values differently than other operators. In order to pass an XCom value
+from your Pod you must set ``do_xcom_push`` to ``True``. This will create a sidecar container that runs
+alongside the Pod. The Pod must write the XCom value into this location at the ``/airflow/xcom/return.json`` path.

Review comment:
       Yikes - this was a good one.
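
For readers following the XCom paragraph quoted in the hunk above, here is a minimal sketch of what the main container might do before it exits. It assumes the sidecar shares a volume mounted at the absolute path `/airflow/xcom` and reads `return.json` from it. The helper name `write_xcom` and its `xcom_dir` parameter are hypothetical, introduced only for illustration, and are not part of Airflow.

```python
import json
import os
import tempfile


def write_xcom(value, xcom_dir="/airflow/xcom"):
    """Serialize `value` as JSON into <xcom_dir>/return.json and return the path.

    The default directory is an assumption about where the sidecar looks;
    `value` must be JSON-serializable.
    """
    os.makedirs(xcom_dir, exist_ok=True)
    path = os.path.join(xcom_dir, "return.json")
    with open(path, "w") as handle:
        json.dump(value, handle)
    return path


if __name__ == "__main__":
    # Use a temporary directory so the sketch also runs outside a Pod.
    demo_dir = tempfile.mkdtemp()
    print(write_xcom({"rows_processed": 10}, xcom_dir=demo_dir))
```

Inside a real Pod the default directory would be used; the `xcom_dir` argument exists only so the sketch can be tried locally against a temporary directory.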







[GitHub] [airflow] tanjinP commented on pull request #9079: WIP: [8970] Improve KubernetesPodOperator guide

Posted by GitBox <gi...@apache.org>.
tanjinP commented on pull request #9079:
URL: https://github.com/apache/airflow/pull/9079#issuecomment-653900111


   > @tanjinP Have you encountered any difficulties? Can I help you?
   
   @mik-laj thanks for checking in. Will come back to this and have the initial version complete by end of today if not tomorrow. Will mark the PR as ready then.

