Posted to commits@airflow.apache.org by GitBox <gi...@apache.org> on 2020/04/24 17:44:37 UTC

[GitHub] [airflow] dimberman opened a new issue #8544: Create EKS operators

dimberman opened a new issue #8544:
URL: https://github.com/apache/airflow/issues/8544


   **Description**
   
   In the same way that we have `GKEStartPodOperator` and `GKECreateClusterOperator`, we should add operators that interact with EKS and abstract all Amazon logic from the Airflow user.
   
   **Use case / motivation**
   
   This will allow Airflow to be a general-purpose Kubernetes orchestrator, able to do multi-cluster orchestration across multiple clouds.
   
   **Related Issues**
   
   


----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



[GitHub] [airflow] ferruzzi edited a comment on issue #8544: Create EKS operators

ferruzzi edited a comment on issue #8544:
URL: https://github.com/apache/airflow/issues/8544#issuecomment-800643763


   I've started putting down the boilerplate code, and I think this is how I am going to tackle this. I'm not sure how formal you folks like it yet, but please let me know what you think of the plan.
   
   ### Goals
   
   Add a collection of Apache Airflow Operators which interact with Amazon Elastic Kubernetes Service (EKS) and abstract all Amazon logic from the Airflow user.  This will allow Airflow to be a general-purpose Kubernetes orchestrator, able to do multi-cluster orchestration across multiple clouds.
   
   ### Proposal
   
   The proposed solution is a collection of Operators, and their underlying Hooks, which will be added to the Amazon AWS provider package.  These Operators will handle creating and deleting clusters, as well as executing tasks using EKS Managed Node Groups.  
   
   ### Assumptions and Prerequisites
   
   * The account running the DAGs will need the `eks:DescribeCluster` IAM permission to retrieve the information currently provided by the manually created kubeconfig file.
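As a rough illustration of that prerequisite, a least-privilege IAM policy granting only `eks:DescribeCluster` on a single cluster might look like the sketch below. The helper name `describe_cluster_policy` and the example account, region, and cluster values are hypothetical; the ARN follows the standard `arn:aws:eks:<region>:<account-id>:cluster/<name>` pattern.

```python
def describe_cluster_policy(region: str, account_id: str, cluster_name: str) -> dict:
    """Return an IAM policy document allowing eks:DescribeCluster on one cluster only.

    Hypothetical helper for illustration; serialize with json.dumps() before
    attaching it to a role or user.
    """
    cluster_arn = f"arn:aws:eks:{region}:{account_id}:cluster/{cluster_name}"
    return {
        "Version": "2012-10-17",
        "Statement": [
            {
                "Effect": "Allow",
                "Action": ["eks:DescribeCluster"],
                # Scope the permission to a single cluster rather than "*".
                "Resource": cluster_arn,
            }
        ],
    }
```

Scoping `Resource` to one cluster ARN rather than `"*"` keeps the DAG's credentials from reading other clusters' connection details.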
   
   ### Definitions
   
   *Pod* - A Kubernetes *pod* is the unit in which Kubernetes runs containers on a compute instance. It includes the containers plus specifications for how they should run, their networking, and their storage. A *pod* can be a single container or multiple containers that always run together.
   
   *Cluster* - An Amazon EKS *cluster* consists of the Amazon EKS control plane, which runs the Kubernetes software and API server, and the worker *nodes* that are registered with the control plane.
   
   *Operator* - An *operator* defines a single task within the workflow.
   
   *kubectl* - The Kubernetes command-line tool which allows users to run commands against Kubernetes clusters. Uses include deploying applications, inspecting and managing cluster resources, and viewing logs.
   
   *eksctl* - An open source CLI tool created by the community to create clusters on EKS using CloudFormation.
   
   *aws eks (CLI tool)* - The EKS commands of the AWS CLI, which, among other things, are used to generate the kubeconfig file (`aws eks update-kubeconfig`).
   
   *kubeconfig* - A config file containing required information about clusters, users, namespaces, and authentication mechanisms. *kubectl* uses *kubeconfig* files to find the information it needs to choose a cluster and communicate with the API server of a cluster.
   
   *EKS Managed Node Groups* (nodegroup) - Infrastructure as a Service - *EKS Managed Node Groups* create and manage the Amazon Elastic Compute Cloud (EC2) instances that act as worker nodes for a Kubernetes cluster.  This is the default underlying compute platform for EKS clusters.
   
   *Task* - The process or command being run in a pod.
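To make the *Pod* and *Task* definitions concrete, here is a minimal Pod manifest expressed as a Python dict, where the container's `command` is the task. The helper `pod_manifest` and its example values are hypothetical; the field layout follows the Kubernetes `v1` Pod schema.

```python
from typing import List


def pod_manifest(name: str, image: str, command: List[str], namespace: str = "default") -> dict:
    """Return a minimal Kubernetes v1 Pod manifest that runs a single container.

    The container's `command` is the "task" in the terminology above.
    """
    return {
        "apiVersion": "v1",
        "kind": "Pod",
        "metadata": {"name": name, "namespace": namespace},
        "spec": {
            # Run the task once; do not restart the container when it exits.
            "restartPolicy": "Never",
            "containers": [{"name": name, "image": image, "command": list(command)}],
        },
    }
```

A multi-container pod would simply list more entries under `spec.containers`; they are scheduled onto the same node and run together.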
   
   
   ### Context and User Experience
   
   While the basic functions of creating and running pods on EKS can be handled through the existing Cloud Native Computing Foundation (CNCF) Kubernetes Pod Operator, running pods on EKS introduces pain points for users, some of which are detailed below, and requires specific EKS knowledge.  By abstracting away this Amazon-specific logic, we can automate and streamline the configuration and deployment of new pods.
   
   Currently, in order to deploy a new pod on EKS, the user needs to leverage the kubectl, eksctl, and aws command-line tools and generate config files to manually pass data to the Kubernetes Pod Operator.  The current manual process is:
   
   1. Create a cluster - uses the eksctl CLI tool
   2. Create a namespace - uses the kubectl CLI tool
   3. Create and attach an IAM Role for permission to log into the cluster - uses eksctl CLI tool
   4. Create or modify the Airflow `requirements.txt` file to ensure it contains two required packages: `awscli` and `kubernetes==12.0.1`
   5. Create and possibly edit the kubeconfig file - uses the aws eks CLI tool
   6. Copy the edited kubeconfig file to the DAGs directory
   
   
   Using the boto3 Python SDK, new Operators can automate most or all of those steps and create a more seamless experience for the user.
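For example, steps 5 and 6 could be replaced by building the kubeconfig in memory from the output of boto3's `describe_cluster` call, which is why `eks:DescribeCluster` is listed as a prerequisite. The sketch below assumes authentication is delegated to `aws eks get-token` (so the AWS CLI must be available wherever the config is used); the function name `kubeconfig_from_cluster` is hypothetical, and the layout only approximates what `aws eks update-kubeconfig` writes.

```python
def kubeconfig_from_cluster(cluster: dict, region: str) -> dict:
    """Build a kubeconfig (as a dict) from the "cluster" object returned by eks.describe_cluster().

    Authentication is delegated to `aws eks get-token`, so the AWS CLI must be
    installed wherever this kubeconfig is used.
    """
    name = cluster["name"]
    return {
        "apiVersion": "v1",
        "kind": "Config",
        "clusters": [
            {
                "name": name,
                "cluster": {
                    "server": cluster["endpoint"],
                    "certificate-authority-data": cluster["certificateAuthority"]["data"],
                },
            }
        ],
        "users": [
            {
                "name": name,
                "user": {
                    # exec credential plugin: kubectl/client runs this command to obtain a token
                    "exec": {
                        "apiVersion": "client.authentication.k8s.io/v1beta1",
                        "command": "aws",
                        "args": ["--region", region, "eks", "get-token", "--cluster-name", name],
                    }
                },
            }
        ],
        "contexts": [{"name": name, "context": {"cluster": name, "user": name}}],
        "current-context": name,
    }
```

In a hook, the input would come from `boto3.client("eks", region_name=region).describe_cluster(name=cluster_name)["cluster"]`, and the result could be written to a temporary file (for example with `yaml.safe_dump`) and handed to the Kubernetes Pod Operator through its `config_file` argument, so nothing has to be copied into the DAGs directory by hand.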
   
   ### Use Cases
   Use Case # | Short Description | Priority | Supporting Operator
   -- | -- | -- | --
   1 | As a user, I want to create a new cluster using existing pods. | 0 | Create Cluster
   2 | As a user, I want to be able to delete a cluster I have created. | 0 | Delete Cluster
   3 | As a user, I want to execute a new task on my existing pod. | 0 | Start Pod
   4 | As a user, I want to delete a pod that I created on a nodegroup. | 0 | Delete Nodegroup
   5 | As a user, I want to create a new pod using managed nodegroups. | 0 | Create Nodegroup
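As a sketch of what the Create Nodegroup operator might pass to boto3, the helper below assembles the required arguments for `eks.create_nodegroup()` and checks the scaling configuration before any API call is made. The helper name and default sizes are hypothetical; the argument names (`clusterName`, `nodegroupName`, `subnets`, `nodeRole`, `scalingConfig`) are those of the boto3 EKS client.

```python
from typing import Sequence


def create_nodegroup_kwargs(
    cluster_name: str,
    nodegroup_name: str,
    subnets: Sequence[str],
    node_role_arn: str,
    min_size: int = 1,
    max_size: int = 2,
    desired_size: int = 1,
) -> dict:
    """Build keyword arguments for boto3's eks.create_nodegroup() (hypothetical helper)."""
    if not subnets:
        raise ValueError("at least one subnet is required")
    # Catch an inconsistent scaling config locally instead of waiting for the API to reject it.
    if not 0 <= min_size <= desired_size <= max_size:
        raise ValueError("expected 0 <= min_size <= desired_size <= max_size")
    return {
        "clusterName": cluster_name,
        "nodegroupName": nodegroup_name,
        "subnets": list(subnets),
        "nodeRole": node_role_arn,
        "scalingConfig": {"minSize": min_size, "maxSize": max_size, "desiredSize": desired_size},
    }
```

The operator would then call `boto3.client("eks").create_nodegroup(**kwargs)`; the Delete Nodegroup operator needs only `clusterName` and `nodegroupName`.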
   
   ### Benchmarks
   
   At a minimum, this solution should offer feature parity with the existing Google Kubernetes Engine (GKE) operators.
   
   **Create Cluster** - Create a Google Kubernetes Engine Cluster of specified dimensions
   
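   ```python
   from airflow.providers.google.cloud.operators.kubernetes_engine import GKECreateClusterOperator

   # Minimal cluster definition.
   cluster_def = {"name": "cluster-name", "initial_node_count": 1}

   operator = GKECreateClusterOperator(
       task_id="cluster_create",
       project_id="my-project",
       location="my-location",
       body=cluster_def,
   )
   ```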
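For comparison, an EKS Create Cluster operator would have to supply boto3's `eks.create_cluster()` with, at minimum, a cluster name, an IAM role ARN for the control plane, and a VPC configuration. The sketch below shows one way to assemble those arguments; the helper name, the optional parameters, and the simplified subnet check are assumptions, not part of any existing operator.

```python
from typing import Optional, Sequence


def eks_create_cluster_kwargs(
    name: str,
    role_arn: str,
    subnet_ids: Sequence[str],
    security_group_ids: Optional[Sequence[str]] = None,
    version: Optional[str] = None,
) -> dict:
    """Build keyword arguments for boto3's eks.create_cluster() (hypothetical helper)."""
    # EKS expects subnets in at least two Availability Zones; this sketch
    # only checks that two distinct subnet IDs were given.
    if len(set(subnet_ids)) < 2:
        raise ValueError("EKS needs at least two subnets")
    vpc_config = {"subnetIds": list(subnet_ids)}
    if security_group_ids:
        vpc_config["securityGroupIds"] = list(security_group_ids)
    kwargs = {"name": name, "roleArn": role_arn, "resourcesVpcConfig": vpc_config}
    if version is not None:
        # Omit to let EKS pick its default Kubernetes version.
        kwargs["version"] = version
    return kwargs
```

Where GKE takes a single `body` dict, EKS splits the control plane (`create_cluster`) from the compute (`create_nodegroup`), which is why the use cases above list separate nodegroup operators.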
   
   **Delete Cluster** - Deletes the cluster, including the Kubernetes endpoint and all worker nodes
   
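   ```python
   from airflow.providers.google.cloud.operators.kubernetes_engine import GKEDeleteClusterOperator

   operator = GKEDeleteClusterOperator(
       task_id="cluster_delete",
       project_id="my-project",
       location="cluster-location",
       name="cluster-name",
   )
   ```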
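Parity with the GKE delete operator takes an extra step on EKS: a cluster cannot be deleted while managed nodegroups are still attached, so they have to be deleted, and waited on, first. Below is a minimal sketch of that ordering, assuming `eks_client` behaves like `boto3.client("eks")`; the function name is hypothetical, and a production version would also need to handle Fargate profiles and API errors.

```python
from typing import Any, List


def delete_cluster_and_nodegroups(eks_client: Any, cluster_name: str) -> List[str]:
    """Delete all managed nodegroups of a cluster, wait for them, then delete the cluster.

    `eks_client` is assumed to behave like boto3.client("eks").
    Returns the names of the nodegroups that were deleted.
    """
    # Collect every nodegroup name, following nextToken pagination.
    nodegroups: List[str] = []
    request = {"clusterName": cluster_name}
    while True:
        page = eks_client.list_nodegroups(**request)
        nodegroups.extend(page.get("nodegroups", []))
        token = page.get("nextToken")
        if not token:
            break
        request["nextToken"] = token

    # Request all deletions first so they can proceed concurrently.
    for nodegroup in nodegroups:
        eks_client.delete_nodegroup(clusterName=cluster_name, nodegroupName=nodegroup)

    # The cluster cannot be deleted while nodegroups are still attached.
    waiter = eks_client.get_waiter("nodegroup_deleted")
    for nodegroup in nodegroups:
        waiter.wait(clusterName=cluster_name, nodegroupName=nodegroup)

    eks_client.delete_cluster(name=cluster_name)
    return nodegroups
```

Taking the client as a parameter keeps the logic hook-shaped and lets it be unit-tested with a stub instead of a live AWS account.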
   
   
   **Start Pod** - Executes a task in a Kubernetes pod in the specified Google Kubernetes Engine cluster
   
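   ```python
   from airflow.providers.google.cloud.operators.kubernetes_engine import GKEStartPodOperator

   # Placeholder values; substitute your own project, location, and cluster.
   GCP_PROJECT_ID = "my-project"
   GCP_LOCATION = "my-location"
   CLUSTER_NAME = "cluster-name"

   operator = GKEStartPodOperator(
       task_id="pod_task",
       project_id=GCP_PROJECT_ID,
       location=GCP_LOCATION,
       cluster_name=CLUSTER_NAME,
       namespace="default",
       image="perl",
       name="test-pod",
   )
   ```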
   
   And some sequence diagrams for the Operators for folks who like that kinda thing:
   
   ![Create Cluster](https://i.imgur.com/usjjBVp.png)
   ![Delete Cluster](https://i.imgur.com/dtCv1al.png)
   ![Start Pod](https://i.imgur.com/pwiLvqV.png)
   
   [EDIT: corrected CreateCluster sequence diagram image; initially uploaded the wrong file]





[GitHub] [airflow] ferruzzi commented on issue #8544: Create EKS operators

ferruzzi commented on issue #8544:
URL: https://github.com/apache/airflow/issues/8544#issuecomment-789123621


   Hi folks, I'm going to be working with @john-jac on this; the code, etc., will be in my fork, and I'll have a design plan/doc posted here in this ticket in the next day or two.





[GitHub] [airflow] ashb closed issue #8544: Create EKS operators

ashb closed issue #8544:
URL: https://github.com/apache/airflow/issues/8544


   





[GitHub] [airflow] john-jac commented on issue #8544: Create EKS operators

john-jac commented on issue #8544:
URL: https://github.com/apache/airflow/issues/8544#issuecomment-768665608


   Happy to lead this effort







[GitHub] [airflow] dimberman commented on issue #8544: Create EKS operators

dimberman commented on issue #8544:
URL: https://github.com/apache/airflow/issues/8544#issuecomment-809756555


   Hi @ferruzzi, this looks great!
   
   In general, we don't need anything super formal for creating operators (though this is useful). If you were making a core change, that would involve submitting an AIP (Airflow Improvement Proposal), but for this you're good to go!





[GitHub] [airflow] dimberman commented on issue #8544: Create EKS operators

dimberman commented on issue #8544:
URL: https://github.com/apache/airflow/issues/8544#issuecomment-768667404


   Thank you @john-jac! Please let me know if you need help on the kubernetes or airflow core side of things. I think this would be a great addition to Airflow's k8s story :).

