Posted to dev@airflow.apache.org by Daniel Imberman <da...@gmail.com> on 2020/08/12 14:24:16 UTC

Simplifying the KubernetesExecutor

Hello, fellow Airflowers! I hope you are all well in these trying times.


With preparation for Airflow 2.0 now under way, it seems like a good time to review the project's state and decide where we can fit in some breaking changes that will improve the project for the future.


When we first created the KubernetesExecutor, we had two goals in mind. The first was to improve Airflow's autoscaling story: previously, Airflow users had to manually provision Celery workers, which could lead to wasted resources or missed SLAs. The second was to introduce a community that was not yet well versed in the Kubernetes API to the Kubernetes system.


To ease the community's transition, we abstracted away many of the complexities of creating a Kubernetes object. We chose to offer a limited number of configurations and keep much of the pod creation process internal to Airflow. In the short term, this system lowered the barrier to entry. Over time, however, the abstraction has become a nightmare of tech debt, as the Kubernetes API is expansive and constantly changing.


With this in mind, I think it's time for us to consider a more straightforward approach that takes the complexity out of Airflow and offers the full Kubernetes API to the Airflow user.


What I'm proposing here is pretty straightforward. We remove all Kubernetes pod creation settings from airflow.cfg and instead offer only one way to configure the KubernetesExecutor: a YAML pod template file.


We can easily supply all of the configurations to the KubernetesExecutor by offering example YAMLs (git sync mode is just a sidecar and an init container, DAG volumes are just an example volume and volume mount, etc.).
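As a sketch of what such an example YAML might look like (the names, images, and paths below are illustrative, not actual templates Airflow would ship), a git-sync base pod is just an init container plus a shared volume:

```
# Hypothetical pod_template_file: a base worker pod with DAGs delivered via
# git-sync. All names, images, repo URLs, and mount paths are illustrative.
apiVersion: v1
kind: Pod
metadata:
  name: airflow-worker-base
spec:
  initContainers:
    - name: git-sync-init
      image: k8s.gcr.io/git-sync/git-sync:v3.1.6
      args: ["--repo=https://github.com/example/dags.git", "--one-time"]
      volumeMounts:
        - name: dags
          mountPath: /git
  containers:
    - name: base
      image: apache/airflow:2.0.0
      volumeMounts:
        - name: dags
          mountPath: /opt/airflow/dags
  volumes:
    - name: dags
      emptyDir: {}
```

Everything that used to be a bespoke airflow.cfg option (DAG volumes, sidecars, image, resources) becomes an ordinary Kubernetes field in this file.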


This system would make it much easier for users to predict what a pod will look like once it is launched by Airflow. They will know it's a base pod, and they will be able to simply modify the pod object using the executor config and the pod mutation hook.
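To illustrate the layering described above (this is a plain-Python sketch of the concept using dicts in place of V1Pod objects, not Airflow's actual merge code), the launched pod is the base template, overlaid with per-task executor config, then passed through an admin-controlled mutation hook:

```python
# Conceptual sketch: base pod -> per-task executor_config overrides ->
# admin pod_mutation_hook. Plain dicts stand in for k8s.V1Pod objects.

def merge(base: dict, override: dict) -> dict:
    """Recursively overlay `override` onto `base`: nested dicts are merged,
    any other value in `override` replaces the base value."""
    result = dict(base)
    for key, value in override.items():
        if isinstance(value, dict) and isinstance(result.get(key), dict):
            result[key] = merge(result[key], value)
        else:
            result[key] = value
    return result

def pod_mutation_hook(pod: dict) -> dict:
    """Admin override applied to every pod, e.g. a label required by
    security policy (the label name here is made up)."""
    pod = dict(pod)
    pod.setdefault("labels", {})["security-audit"] = "required"
    return pod

base_pod = {"image": "apache/airflow:2.0.0", "resources": {"cpu": "1"}}
executor_config = {"resources": {"cpu": "4", "memory": "8Gi"}}  # per-task

final_pod = pod_mutation_hook(merge(base_pod, executor_config))
```

The point of the sketch is the predictability: the user starts from a known base pod, and the only two ways it can change are the executor config and the mutation hook.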







This simplification could also lead to some pretty great new features in the KubernetesExecutor.


Idea 1: Picking a pod_template_file per-task
Along with the existing customization via the executor config, relying solely on pod files would allow users to pick the pod template file they want to use as their base pod on a per-task basis. An Airflow engineer could supply several pre-made templates for their data scientists, reducing the amount of customization an individual Airflow user would need.
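From the DAG author's side, per-task selection could be as simple as resolving a vetted template by name (a sketch; the `pod_template_file` executor_config key and the paths here are hypothetical, not a settled API):

```python
# Hypothetical per-task template selection: an infra team ships a few vetted
# base-pod templates and task authors pick one by name. The executor_config
# key and file paths are illustrative.

TEMPLATES = {
    "default": "/templates/default_pod.yaml",
    "gpu": "/templates/gpu_pod.yaml",
    "high-memory": "/templates/high_memory_pod.yaml",
}

def executor_config_for(template_name: str) -> dict:
    """Build the executor_config a task would pass, resolving a template by name."""
    return {"pod_template_file": TEMPLATES[template_name]}

# A data scientist asks for the pre-made GPU template without touching YAML:
config = executor_config_for("gpu")
```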


Idea 2: Merging the KubernetesExecutor into the CeleryExecutor
One idea that we've been excited about recently is creating a merged Celery and Kubernetes executor. This hybrid executor would default to launching Celery workers with KEDA and would have the option to launch individual tasks with the Kubernetes executor when a user wants isolation or customization. Simplifying the Kubernetes executor reduces the number of fail-points that this merged executor would need to account for.
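The dispatch at the heart of such a hybrid could be very small (a sketch only; the `Task` type and the rule of routing on the task's `queue` field are illustrative, not Airflow's actual implementation):

```python
# Sketch of a hybrid executor's dispatch: default every task to Celery, route
# to Kubernetes when a task asks for isolation/customization. The Task type
# and routing rule are illustrative.
from dataclasses import dataclass

@dataclass
class Task:
    task_id: str
    queue: str = "default"

def pick_executor(task: Task) -> str:
    """Route on the 'queue' field: queue='kubernetes' -> KubernetesExecutor."""
    return "KubernetesExecutor" if task.queue == "kubernetes" else "CeleryExecutor"

tasks = [Task("etl_load"), Task("train_model", queue="kubernetes")]
routes = {t.task_id: pick_executor(t) for t in tasks}
```

A single, predictable routing rule like this is exactly where a simplified Kubernetes executor helps: fewer configuration knobs means fewer fail-points for the hybrid to account for.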



What would we need to do to implement this?
The good news is that the hard work has already been done! As of AIRFLOW-5413 [https://issues.apache.org/jira/browse/AIRFLOW-5413] by David Lum, Airflow already has the ability to build base worker pods from a template file. This change would involve primarily code deletion and very little new code.


Thank you for your time and I look forward to the community’s discussion.

Daniel

Re: Simplifying the KubernetesExecutor

Posted by Jarek Potiuk <Ja...@polidea.com>.
Yep, +1 for the queue parameter.

I once implemented a POC for a multi-executor that also used 'queue' to
route between any executors (and potentially multiple of them):

* Main commit here:
https://github.com/PolideaInternal/airflow/commit/c87e87a5e7673e67e78d4ce39d5a15a8cc494e56

* fixup here:
https://github.com/PolideaInternal/airflow/commit/6590f665d4d9aa2629052c1af2e6f4c36a9a4dd3

But I agree that a more "generic" solution is not needed here - really the
case we want to handle is 'Celery' + 'Kubernetes'.

J.

On Thu, Aug 27, 2020 at 9:21 AM Kaxil Naik <ka...@gmail.com> wrote:

> +1 for queue param
>
> On Thu, Aug 27, 2020, 04:38 Ping Zhang <pi...@umich.edu> wrote:
>
> > Hi Daniel,
> >
> > Thanks for checking that. I like the idea of using the queue field too. I
> > will make the change and tag you in the PR.
> >
> > Best wishes
> >
> > Ping Zhang
> >
> > On Mon, Aug 24, 2020 at 9:10 AM Daniel Imberman <daniel.imberman@gmail.com>
> > wrote:
> >
> > > Hi Ping,
> > >
> > > This looks great! The only change I might suggest is that we check the
> > > "queue" field and route to the KubernetesExecutor if
> > > "queue='kubernetes'". What do you think?
> > >
> > > via Newton Mail [https://cloudmagic.com/k/d/mailapp?ct=dx&cv=10.0.50&pv=10.15.5&source=email_footer_2]
> > >
> > > On Mon, Aug 24, 2020 at 12:12 AM, Ping Zhang <pi...@umich.edu> wrote:
> > > ah, did not know that. Thanks for letting me know.
> > >
> > > The image is not that important; all the important information is in the
> > > gist:
> > > https://gist.github.com/pingzh/cc44c97336560b658d012c225a2242cc#file-jobdispatcherexecutor-py
> > >
> > > Best wishes
> > >
> > > Ping Zhang
> > >
> > > On Sun, Aug 23, 2020 at 11:59 PM Deng Xiaodong <xd...@gmail.com>
> > > wrote:
> > >
> > > > Hi Ping,
> > > >
> > > > Inserting images in the email body doesn't work for this mail list, so
> > > > we cannot actually see the pictures you shared.
> > > >
> > > > If you find them important, you may share the pictures somewhere and
> > > > then provide the links in the mail body.
> > > >
> > > > Thanks.
> > > >
> > > > XD
> > > >
> > > > On Mon, Aug 24, 2020 at 08:56 Ping Zhang <pi...@umich.edu> wrote:
> > > >
> > > > > Great ideas.
> > > > >
> > > > > Actually, when I evaluated using the KubernetesExecutor at Airbnb, I
> > > > > created a *JobDispatcherExecutor*, which is similar to idea 2.
> > > > >
> > > > > We have tons of sensor tasks launched at around UTC 00:00, and we
> > > > > enabled the SmartSensor, which means each sensor task is very short
> > > > > running. If we chose to have one pod per task, the k8s cluster would
> > > > > not be able to handle the bursty load.
> > > > >
> > > > > Thus, I created a JobDispatcherExecutor which takes an instance of
> > > > > `celery_executor` and `kubernetes_executor`.
> > > > > [image: image.png]
> > > > > And there is a router
> > > > > <https://gist.github.com/pingzh/cc44c97336560b658d012c225a2242cc#file-jobdispatcherexecutor-py-L165-L168>
> > > > > method to determine which executor to use for queue_command.
> > > > > [image: image.png]
> > > > > It is currently based on whether a task is a smart sensor task, but it
> > > > > can easily be changed to use other attributes and/or settings.policy
> > > > > (so that infra can have control).
> > > > >
> > > > > Here is the gist:
> > > > > https://gist.github.com/pingzh/cc44c97336560b658d012c225a2242cc#file-jobdispatcherexecutor-py
> > > > >
> > > > > [image: image.png]
> > > > >
> > > > > Would love to get some feedback.
> > > > >
> > > > > Best wishes
> > > > >
> > > > > Ping Zhang
> > > > >
> > > > >
> > > > > On Thu, Aug 13, 2020 at 12:58 PM Jarek Potiuk <Jarek.Potiuk@polidea.com>
> > > > > wrote:
> > > > >
> > > > >> +1 for pod mutation in config. It's not a "yamly" thing - it's Python
> > > > >> code to run, so it's quite appropriate to have it in config.
> > > > >>
> > > > >> On Wed, Aug 12, 2020 at 9:43 PM Daniel Imberman <daniel.imberman@gmail.com>
> > > > >> wrote:
> > > > >>
> > > > >> > I think pod_mutation is important as an "admin override". E.g. there
> > > > >> > might be a label that is required by security policy for all tasks
> > > > >> > launched by airflow. I do agree that the executor_config should take
> > > > >> > in a V1Pod object.
> > > > >> >
> > > > >> > One option is that the executor_config could take in a
> > > > >> > pod_mutation_hook function as well? So users can manually override
> > > > >> > the pod functionally rather than just giving an object and hoping it
> > > > >> > merges the way they expect it to?
> > > > >> >
> > > > >> > On Wed, Aug 12, 2020 at 12:19 PM, David Lum <davidlum91@gmail.com> wrote:
> > > > >> > I'll just leave some passing thoughts here. How the pod template file
> > > > >> > is modified by the executor config might not be very well defined.
> > > > >> > Currently, when a user passes in a k8s.V1Pod object through
> > > > >> > executor_config, we kind of arbitrarily merge the two pod definitions
> > > > >> > into what we imagine the client would want, but it could often be
> > > > >> > misinterpreted. Previously, the Airflow-defined Pod class was fairly
> > > > >> > flat. This made merging the two objects fairly straightforward. If the
> > > > >> > user wrote a field in executor_config, it overwrote that field in the
> > > > >> > Pod definition, but the rest stayed the same. When attributes are
> > > > >> > arbitrarily nested like in the k8s.V1Pod, some aspects can be
> > > > >> > mistakenly overwritten. For example, specifying the volumeMounts of a
> > > > >> > container requires also specifying the rest of the V1Pod. How do we
> > > > >> > handle this? Do we just overwrite volumeMounts? The client must have
> > > > >> > specified a container, so do we overwrite that too? The fact that
> > > > >> > there is a choice means the API could be unintuitive to the user.
> > > > >> >
> > > > >> > What I suggest might be easier is that executor_config completely
> > > > >> > replaces the pod_template_file if it is specified, but then also offer
> > > > >> > something like pod_mutation_hook at the operator level. This makes it
> > > > >> > quite clear to the user: you are either replacing the pod_template, or
> > > > >> > you are modifying the pod_template with pod_mutation_hook, and you are
> > > > >> > doing it by your own rules. The argument to executor_config can be
> > > > >> > something like
> > > > >> >
> > > > >> > ```
> > > > >> > executor_config = {'KubernetesExecutor': {'pod_mutation': some_func,
> > > > >> >     'pod_template': k8s.V1Pod(...)}}  # pod_template could also be a path to a file
> > > > >> > ```
> > > > >> >
> > > > >> > If both are specified then pod_mutation is a function applied to
> > > > >> > pod_template. Thoughts?
> > > > >> >
> > > > >> > On Wed, Aug 12, 2020 at 11:24 AM Tomasz Urbaszek <turbaszek@apache.org>
> > > > >> > wrote:
> > > > >> >
> > > > >> > > +1 for the idea, thanks Daniel! I agree that multi-executor and pod
> > > > >> > > templates should be 2.1.
> > > > >> > >
> > > > >> > > T.
> > > > >> > >
> > > > >> > > On Wed, Aug 12, 2020 at 5:13 PM Daniel Imberman <daniel.imberman@gmail.com>
> > > > >> > > wrote:
> > > > >> > >
> > > > >> > >> (Also, funny enough, we could use a lot of the existing
> > > > >> > >> infrastructure in 1.10 to create that migration script. We would
> > > > >> > >> just need to take the Python dictionary and use the Python yaml
> > > > >> > >> library.)
> > > > >> > >>
> > > > >> > >> On Wed, Aug 12, 2020 at 8:09 AM, Daniel Imberman <daniel.imberman@gmail.com> wrote:
> > > > >> > >> 100% agreed on timing. I think 2.0 should be for the breaking
> > > > >> > >> aspect (losing the configurations) and then in 2.1/2.2 we can start
> > > > >> > >> adding the new features. I also like the idea of a migration tool.
> > > > >> > >> We can make a script that takes your airflow.cfg and converts it
> > > > >> > >> into a YAML file.
> > > > >> > >>
> > > > >> > >> On Wed, Aug 12, 2020 at 7:48 AM, Jarek Potiuk <Jarek.Potiuk@polidea.com>
> > > > >> > >> wrote:
> > > > >> > >>
> > > > >> > >> Big +1. All the arguments are very appealing to me, and simplifying
> > > > >> > >> the Kubernetes Executor down to a YAML-configurable one seems like
> > > > >> > >> a no-brainer, especially if we provide some migration tools. I've
> > > > >> > >> lost countless hours debugging configuration problems simply
> > > > >> > >> because the relevant Kubernetes-related configuration was in the
> > > > >> > >> least expected place - i.e. airflow.cfg.
> > > > >> > >>
> > > > >> > >> I am also a big fan of both 1. and 2. I once implemented a POC of a
> > > > >> > >> queue-based multi-scheduler, but having it embedded as part of core
> > > > >> > >> Airflow rather than basing it on queues (which are basically a
> > > > >> > >> Celery Executor concept) is, I think, a much better approach. Both
> > > > >> > >> 1. and 2. are cool.
> > > > >> > >>
> > > > >> > >> Now - a question about timing. If we decide to go this route, my
> > > > >> > >> view is that simplifying Kubernetes should be an Airflow 2.0 task -
> > > > >> > >> alongside more comprehensive tests (which will be much easier to
> > > > >> > >> write in this case). The new features/ideas 1. and 2. for KE I
> > > > >> > >> think should come after that - when we release and stabilize 2.0.
> > > > >> > >> They sound like great candidates for 2.1 to me.
> > > > >> > >>
> > > > >> > >> J.


-- 

Jarek Potiuk
Polidea <https://www.polidea.com/> | Principal Software Engineer

M: +48 660 796 129 <+48660796129>
[image: Polidea] <https://www.polidea.com/>

Re: Simplifying the KubernetesExecutor

Posted by Kaxil Naik <ka...@gmail.com>.
+1 for queue param

> > > >>
> > > >>
> > > >> > wrote:
> > > >>
> > > >>
> > > >> >
> > > >>
> > > >>
> > > >> > > +1 for the idea, thanks Daniel! I agree that multi-executor and
> > pod
> > > >>
> > > >>
> > > >> > > templates should be 2.1.
> > > >>
> > > >>
> > > >> > >
> > > >>
> > > >>
> > > >> > > T.
> > > >>
> > > >>
> > > >> > >
> > > >>
> > > >>
> > > >> > > On Wed, Aug 12, 2020 at 5:13 PM Daniel Imberman <
> > > >>
> > > >>
> > > >> > daniel.imberman@gmail.com>
> > > >>
> > > >>
> > > >> > > wrote:
> > > >>
> > > >>
> > > >> > >
> > > >>
> > > >>
> > > >> > >> (Also funny enough we could use a lot of the existing
> > > infrastructure
> > > >> int
> > > >>
> > > >>
> > > >> > >> 1.10 to create that migration script. Would just need to take
> the
> > > >> python
> > > >>
> > > >>
> > > >> > >> dictionary and use the python yaml library)
> > > >>
> > > >>
> > > >> > >>
> > > >>
> > > >>
> > > >> > >> via Newton Mail [
> > > >>
> > > >>
> > > >> > >>
> > > >>
> > > >>
> > > >> >
> > > >>
> > >
> >
> https://cloudmagic.com/k/d/mailapp?ct=dx&cv=10.0.50&pv=10.15.5&source=email_footer_2
> > > >>
> > > >>
> > > >> > >> ]
> > > >>
> > > >>
> > > >> > >> On Wed, Aug 12, 2020 at 8:09 AM, Daniel Imberman <
> > > >>
> > > >>
> > > >> > >> daniel.imberman@gmail.com> wrote:
> > > >>
> > > >>
> > > >> > >> 100% agreed on timing. I think 2.0 should be for the breaking
> > > aspect
> > > >>
> > > >>
> > > >> > >> (losing the configurations) and then 2.1/2.2 we can start
> adding
> > on
> > > >> the
> > > >>
> > > >>
> > > >> > new
> > > >>
> > > >>
> > > >> > >> features. I also like the idea of a migration tool. We can
> make a
> > > >> script
> > > >>
> > > >>
> > > >> > >> that takes your airflow.cfg and converts that into a yaml file.
> > > >>
> > > >>
> > > >> > >>
> > > >>
> > > >>
> > > >> > >> via Newton Mail [
> > > >>
> > > >>
> > > >> > >>
> > > >>
> > > >>
> > > >> >
> > > >>
> > >
> >
> https://cloudmagic.com/k/d/mailapp?ct=dx&cv=10.0.50&pv=10.15.5&source=email_footer_2
> > > >>
> > > >>
> > > >> > >> ]
> > > >>
> > > >>
> > > >> > >> On Wed, Aug 12, 2020 at 7:48 AM, Jarek Potiuk <
> > > >> Jarek.Potiuk@polidea.com
> > > >>
> > > >>
> > > >> > >
> > > >>
> > > >>
> > > >> > >> wrote:
> > > >>
> > > >>
> > > >> > >> Big +1. All the arguments are very appealing to me and
> > simplifying
> > > >> the
> > > >>
> > > >>
> > > >> > >> Kubernetes Executor down to YAML-configurable one seems like a
> > > >>
> > > >>
> > > >> > no-brainer
> > > >>
> > > >>
> > > >> > >> especially if we provide some migration tools. I've lost
> > countless
> > > >> hours
> > > >>
> > > >>
> > > >> > >> on
> > > >>
> > > >>
> > > >> > >> debugging some configuration problems, simply because the
> > relevant
> > > >>
> > > >>
> > > >> > >> Kubernetes-related configuration was in the least expected
> place
> > -
> > > >> i.e.
> > > >>
> > > >>
> > > >> > >> airflow.cfg. YAML configuration.
> > > >>
> > > >>
> > > >> > >>
> > > >>
> > > >>
> > > >> > >> I am also a big fan of both 1. and 2. I've implemented a POC of
> > > >>
> > > >>
> > > >> > >> queue-based multi-scheduler once but having it embedded as part
> > of
> > > >> core
> > > >>
> > > >>
> > > >> > >> Airflow rather than based it on queues (which are basically a
> > > Celery
> > > >>
> > > >>
> > > >> > >> Executor concept) is I think much better approach. Both 1. and
> 2.
> > > are
> > > >>
> > > >>
> > > >> > >> cool.
> > > >>
> > > >>
> > > >> > >>
> > > >>
> > > >>
> > > >> > >> Now - question about timing. If we decide to go that route - my
> > > view
> > > >> is
> > > >>
> > > >>
> > > >> > >> that simplifying Kubernetes should be an Airflow 2.0 task -
> > > alongside
> > > >>
> > > >>
> > > >> > more
> > > >>
> > > >>
> > > >> > >> comprehensive tests (which will be much easier to write in this
> > > >> case).
> > > >>
> > > >>
> > > >> > The
> > > >>
> > > >>
> > > >> > >> new features/ideas 1. 2. for KE I think should come after that
> -
> > > >> when we
> > > >>
> > > >>
> > > >> > >> release and stabilize 2.0. Sounds like great candidates for 2.1
> > to
> > > >> me.
> > > >>
> > > >>
> > > >> > >>
> > > >>
> > > >>
> > > >> > >> J.
> > > >>
> > > >>
> > > >> > >>
> > > >>
> > > >>
> > > >> > >>
> > > >>
> > > >>
> > > >> > >> On Wed, Aug 12, 2020 at 4:24 PM Daniel Imberman <
> > > >>
> > > >>
> > > >> > >> daniel.imberman@gmail.com>
> > > >>
> > > >>
> > > >> > >> wrote:
> > > >>
> > > >>
> > > >> > >>
> > > >>
> > > >>
> > > >> > >> > Hello, fellow Airflowers! I hope you are all well in these
> > trying
> > > >>
> > > >>
> > > >> > times.
> > > >>
> > > >>
> > > >> > >> >
> > > >>
> > > >>
> > > >> > >> >
> > > >>
> > > >>
> > > >> > >> > With the recent launch of Airflow 2.0 preparation, it now
> seems
> > > >> like a
> > > >>
> > > >>
> > > >> > >> > good time to review the project's state and where we can fit
> in
> > > >> some
> > > >>
> > > >>
> > > >> > >> > breaking changes that will improve the project for the
> future.
> > > >>
> > > >>
> > > >> > >> >
> > > >>
> > > >>
> > > >> > >> >
> > > >>
> > > >>
> > > >> > >> > When we first created the KubernetesExecutor, we had two
> goals
> > in
> > > >>
> > > >>
> > > >> > mind.
> > > >>
> > > >>
> > > >> > >> > The first goal was to improve the airflow Auto scaling story.
> > > >>
> > > >>
> > > >> > >> Previously,
> > > >>
> > > >>
> > > >> > >> > airflow users would have to manually provision celery
> workers,
> > > >> which
> > > >>
> > > >>
> > > >> > >> could
> > > >>
> > > >>
> > > >> > >> > lead to wasted resources or missed SLAs. The other goal was
> to
> > > >>
> > > >>
> > > >> > >> introduce a
> > > >>
> > > >>
> > > >> > >> > community that was not yet well versed in the Kubernetes API
> to
> > > the
> > > >>
> > > >>
> > > >> > >> > Kubernetes system.
> > > >>
> > > >>
> > > >> > >> >
> > > >>
> > > >>
> > > >> > >> >
> > > >>
> > > >>
> > > >> > >> > To ease the community's transition, we abstracted many of the
> > > >>
> > > >>
> > > >> > >> complexities
> > > >>
> > > >>
> > > >> > >> > of creating a Kubernetes object. We chose to offer a limited
> > > >> number of
> > > >>
> > > >>
> > > >> > >> > configurations and keep much of the pod creation process
> > internal
> > > >> to
> > > >>
> > > >>
> > > >> > >> > airflow. In the short-term, this system lowered the barrier
> to
> > > >> entry.
> > > >>
> > > >>
> > > >> > >> Over
> > > >>
> > > >>
> > > >> > >> > time, however, this abstraction has become a nightmare of
> tech
> > > >> debt as
> > > >>
> > > >>
> > > >> > >> the
> > > >>
> > > >>
> > > >> > >> > Kubernetes API is expensive and constantly changing.
> > > >>
> > > >>
> > > >> > >> >
> > > >>
> > > >>
> > > >> > >> >
> > > >>
> > > >>
> > > >> > >> > With this in mind, I think it's time for us to consider a
> more
> > > >>
> > > >>
> > > >> > >> > straightforward approach that takes the complexity out of
> > Airflow
> > > >> and
> > > >>
> > > >>
> > > >> > >> > offers the full Kubernetes API to the airflow user.
> > > >>
> > > >>
> > > >> > >> >
> > > >>
> > > >>
> > > >> > >> >
> > > >>
> > > >>
> > > >> > >> > What I'm proposing here is pretty straightforward. We remove
> > all
> > > >>
> > > >>
> > > >> > >> > Kubernetes pod creation configurations from the airflow.cfg
> and
> > > >>
> > > >>
> > > >> > instead
> > > >>
> > > >>
> > > >> > >> > offer only one way to use the KubernetesExecutor: with a YAML
> > > file.
> > > >>
> > > >>
> > > >> > >> >
> > > >>
> > > >>
> > > >> > >> >
> > > >>
> > > >>
> > > >> > >> > We can easily supply all of the configurations to the
> > > >>
> > > >>
> > > >> > KubernetesExecutor
> > > >>
> > > >>
> > > >> > >> > by offering example YAMLs (git sync mode is just a sidecar
> and
> > an
> > > >> init
> > > >>
> > > >>
> > > >> > >> > container, DAG volumes are just an example volume and volume
> > > mount,
> > > >>
> > > >>
> > > >> > >> etc.).
> > > >>
> > > >>
> > > >> > >> >
> > > >>
> > > >>
> > > >> > >> >
> > > >>
> > > >>
> > > >> > >> > This system would simplify a user's ability to predict what a
> > pod
> > > >> will
> > > >>
> > > >>
> > > >> > >> > look like once it is launched by airflow. They will know
> it's a
> > > >> base
> > > >>
> > > >>
> > > >> > pod
> > > >>
> > > >>
> > > >> > >> > and will be able to simply modify the pod object using the
> > > executor
> > > >>
> > > >>
> > > >> > >> config
> > > >>
> > > >>
> > > >> > >> > and the pod mutation hook.
> > > >>
> > > >>
> > > >> > >> >
> > > >>
> > > >>
> > > >> > >> >
> > > >>
> > > >>
> > > >> > >> >
> > > >>
> > > >>
> > > >> > >> >
> > > >>
> > > >>
> > > >> > >> >
> > > >>
> > > >>
> > > >> > >> >
> > > >>
> > > >>
> > > >> > >> >
> > > >>
> > > >>
> > > >> > >> > This simplification could also lead to some pretty great new
> > > >> features
> > > >>
> > > >>
> > > >> > in
> > > >>
> > > >>
> > > >> > >> > the KubernetesExecutor
> > > >>
> > > >>
> > > >> > >> >
> > > >>
> > > >>
> > > >> > >> >
> > > >>
> > > >>
> > > >> > >> > Idea 1: Picking a pod_template_file per-task
> > > >>
> > > >>
> > > >> > >> > Along with the existing customization with the executor
> config,
> > > >> solely
> > > >>
> > > >>
> > > >> > >> > relying on pod files can allow users to pick the pod template
> > > file
> > > >>
> > > >>
> > > >> > that
> > > >>
> > > >>
> > > >> > >> > they want to use as their base pod on a per-task basis. An
> > > Airflow
> > > >>
> > > >>
> > > >> > >> engineer
> > > >>
> > > >>
> > > >> > >> > could supply several pre-made templates for their data
> > scientists
> > > >> to
> > > >>
> > > >>
> > > >> > >> reduce
> > > >>
> > > >>
> > > >> > >> > the amount of customization an airflow user would need to
> use.
> > > >>
> > > >>
> > > >> > >> >
> > > >>
> > > >>
> > > >> > >> >
> > > >>
> > > >>
> > > >> > >> > Idea 2: Merging the KubernetesExecutor into the
> CeleryExecutor
> > > >>
> > > >>
> > > >> > >> > One idea that we've been excited about recently has been the
> > idea
> > > >> of
> > > >>
> > > >>
> > > >> > >> > creating merged Celery and Kubernetes executor. This hybrid
> > > >> executor
> > > >>
> > > >>
> > > >> > >> would
> > > >>
> > > >>
> > > >> > >> > default to launching celery workers with KEDA and would have
> > the
> > > >>
> > > >>
> > > >> > option
> > > >>
> > > >>
> > > >> > >> to
> > > >>
> > > >>
> > > >> > >> > launch individual tasks using the Kubernetes executor when a
> > user
> > > >>
> > > >>
> > > >> > wants
> > > >>
> > > >>
> > > >> > >> > isolation or customization. Simplifying the Kubernetes
> executor
> > > >>
> > > >>
> > > >> > reduces
> > > >>
> > > >>
> > > >> > >> the
> > > >>
> > > >>
> > > >> > >> > number of fail-points that this merged executor would need to
> > > >> account
> > > >>
> > > >>
> > > >> > >> for.
> > > >>
> > > >>
> > > >> > >> >
> > > >>
> > > >>
> > > >> > >> >
> > > >>
> > > >>
> > > >> > >> >
> > > >>
> > > >>
> > > >> > >> > What would we need to do to implement this?
> > > >>
> > > >>
> > > >> > >> > The good news here is that the hard work has already been
> done!
> > > As
> > > >> of
> > > >>
> > > >>
> > > >> > >> > AIRFLOW-5413 [
> > https://issues.apache.org/jira/browse/AIRFLOW-5413
> > > ]
> > > >> by
> > > >>
> > > >>
> > > >> > >> > David Lum, airflow already has the ability to use base worker
> > > pods
> > > >> on
> > > >>
> > > >>
> > > >> > a
> > > >>
> > > >>
> > > >> > >> > template file. This would involve primarily code deletion and
> > > very
> > > >>
> > > >>
> > > >> > >> little
> > > >>
> > > >>
> > > >> > >> > new code.
> > > >>
> > > >>
> > > >> > >> >
> > > >>
> > > >>
> > > >> > >> >
> > > >>
> > > >>
> > > >> > >> > Thank you for your time and I look forward to the community’s
> > > >>
> > > >>
> > > >> > >> discussion.
> > > >>
> > > >>
> > > >> > >> >
> > > >>
> > > >>
> > > >> > >> > Daniel
> > > >>
> > > >>
> > > >> > >>
> > > >>
> > > >>
> > > >> > >>
> > > >>
> > > >>
> > > >> > >>
> > > >>
> > > >>
> > > >> > >> --
> > > >>
> > > >>
> > > >> > >>
> > > >>
> > > >>
> > > >> > >> Jarek Potiuk
> > > >>
> > > >>
> > > >> > >> Polidea <https://www.polidea.com/> | Principal Software
> Engineer
> > > >>
> > > >>
> > > >> > >>
> > > >>
> > > >>
> > > >> > >> M: +48 660 796 129 <+48660796129>
> > > >>
> > > >>
> > > >> > >> [image: Polidea] <https://www.polidea.com/>
> > > >>
> > > >>
> > > >> > >
> > > >>
> > > >>
> > > >> > >
> > > >>
> > > >>
> > > >>
> > > >>
> > > >>
> > > >>
> > > >>
> > > >>
> > > >>
> > > >>
> > > >>
> > > >> --
> > > >>
> > > >>
> > > >>
> > > >>
> > > >>
> > > >> Jarek Potiuk
> > > >>
> > > >>
> > > >> Polidea <https://www.polidea.com/> | Principal Software Engineer
> > > >>
> > > >>
> > > >>
> > > >>
> > > >>
> > > >> M: +48 660 796 129 <+48660796129>
> > > >>
> > > >>
> > > >> [image: Polidea] <https://www.polidea.com/>
> > > >>
> > > >>
> > > >>
> > > >
> > > >
> > >
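David Lum's suggestion quoted above — `pod_template` in `executor_config` replaces the base pod wholesale, while `pod_mutation` is a function applied afterwards — can be sketched as follows. This is a minimal, library-free illustration (a plain dict stands in for `k8s.V1Pod`); the `pod_template`/`pod_mutation` key names come from the proposal in the thread, not from a released Airflow API.

```python
# Sketch of the proposed semantics: a supplied pod_template replaces the
# base pod outright (no field-by-field merging); pod_mutation, if given,
# is applied last, so user-defined rules always win.

def build_worker_pod(base_pod, executor_config):
    cfg = (executor_config or {}).get("KubernetesExecutor", {})
    pod = cfg.get("pod_template", base_pod)  # full replacement, never merged
    mutate = cfg.get("pod_mutation")
    if mutate is not None:
        pod = mutate(pod)  # user-defined rules, applied after replacement
    return pod


def add_required_label(pod):
    # Daniel's "admin override" case: a label required by security policy.
    pod.setdefault("labels", {})["team"] = "data-eng"
    return pod


base = {"image": "apache/airflow:2.0", "labels": {}}
pod = build_worker_pod(
    base, {"KubernetesExecutor": {"pod_mutation": add_required_label}}
)
print(pod["labels"]["team"])  # data-eng
```

Because replacement and mutation never interleave, there is no ambiguity about which nested fields survive — the question David raises about merging `volumeMounts` simply does not arise.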
>
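The migration-tool idea discussed above (a script that takes your airflow.cfg and converts it into a YAML pod template) could start from something like the sketch below. Only two real 1.10 `[kubernetes]` options are mapped, purely for illustration; a real tool would have to cover every pod-related option (volumes, git-sync sidecar, resources, etc.), and the resulting dict would be dumped with a YAML library such as PyYAML's `yaml.safe_dump`.

```python
# Hedged sketch: read the [kubernetes] section of an airflow.cfg and build
# the equivalent base-pod structure for a pod_template_file.

import configparser


def cfg_to_pod_template(cfg_text):
    parser = configparser.ConfigParser()
    parser.read_string(cfg_text)
    k8s = parser["kubernetes"]
    # The dict mirrors a Kubernetes Pod manifest; serializing it to YAML
    # yields the template file the proposal describes.
    return {
        "apiVersion": "v1",
        "kind": "Pod",
        "metadata": {"name": "airflow-worker"},
        "spec": {
            "containers": [{
                "name": "base",
                "image": "{}:{}".format(k8s["worker_container_repository"],
                                        k8s["worker_container_tag"]),
            }],
        },
    }


pod = cfg_to_pod_template(
    "[kubernetes]\n"
    "worker_container_repository = apache/airflow\n"
    "worker_container_tag = 1.10.12\n"
)
print(pod["spec"]["containers"][0]["image"])  # apache/airflow:1.10.12
```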

Re: Simplifying the KubernetesExecutor

Posted by Ping Zhang <pi...@umich.edu>.
Hi Daniel,

Thanks for checking that. I like the idea of using the queue field too. I
will make the change and tag you in the PR.

Best wishes

Ping Zhang


On Mon, Aug 24, 2020 at 9:10 AM Daniel Imberman <da...@gmail.com>
wrote:

> Hi Ping,
>
> This looks great! The only change I might suggest is maybe we check the
> “queue” field and route to the KubernetesExecutor if
> “queue=‘kubernetes’”. What do you think?
>
> On Mon, Aug 24, 2020 at 12:12 AM, Ping Zhang <pi...@umich.edu> wrote:
> ah, did not know that. Thanks for letting me know.
>
> The image is not that important; all the important information is in the
> gist:
>
> https://gist.github.com/pingzh/cc44c97336560b658d012c225a2242cc#file-jobdispatcherexecutor-py
>
> Best wishes
>
> Ping Zhang
>
>
> On Sun, Aug 23, 2020 at 11:59 PM Deng Xiaodong <xd...@gmail.com>
> wrote:
>
> > Hi Ping,
> >
> > Inserting images in the email body doesn’t work for this mail list. So we
> > cannot actually see the pictures you shared.
> >
> > If you find them important, you may share the pictures somewhere then
> > provide the links in the mail body.
> >
> > Thanks.
> >
> > XD
> >
> > On Mon, Aug 24, 2020 at 08:56 Ping Zhang <pi...@umich.edu> wrote:
> >
> > > Great ideas.
> > >
> > > Actually, when I evaluated using KubernetesExecutor in Airbnb, I
> created
> > > *JobDispatcherExecutor*, which is similar to idea 2.
> > >
> > > We have tons of sensor tasks launched at around UTC 00:00 and we
> enabled
> > > the SmartSensor, which means each sensor task is very short-running. If
> > we
> > > choose to have one pod per task, the k8s cluster will not be able to
> > handle
> > > the bursty load.
> > >
> > > Thus, I created a JobDispatcherExecutor which takes an instance of
> > > `celery_executor` and `kubernetes_executor`.
> > > [image: image.png]
> > > And there is a router
> > > <
> >
> https://gist.github.com/pingzh/cc44c97336560b658d012c225a2242cc#file-jobdispatcherexecutor-py-L165-L168
> > >
> > > method to determine which executor to use for queue_command,
> > > [image: image.png]
> > > It is currently based on whether a task is a smart sensor task, but it
> > can
> > > be easily changed to other attributes and/or settings.policy (so that
> > infra
> > > can have control).
> > >
> > > Here is the gist:
> > >
> >
> https://gist.github.com/pingzh/cc44c97336560b658d012c225a2242cc#file-jobdispatcherexecutor-py
> > >
> > > [image: image.png]
> > >
> > >
> > > Would love to get some feedback.
> > >
> > > Best wishes
> > >
> > > Ping Zhang
> > >
> > >
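The JobDispatcherExecutor and router described above can be sketched roughly as follows. This is a runnable, library-free illustration (stub classes stand in for the real CeleryExecutor and KubernetesExecutor) that routes on the task's `queue` attribute per Daniel's later suggestion; the class and method names mirror the gist but are illustrative, not Airflow's shipped API.

```python
# Sketch: wrap two executors and route each queued task to exactly one.

class JobDispatcherExecutor:
    def __init__(self, celery_executor, kubernetes_executor):
        self.celery = celery_executor
        self.kubernetes = kubernetes_executor

    def _router(self, task):
        # Tasks that opt in via queue='kubernetes' get per-task isolation;
        # everything else (e.g. bursty short-running sensor tasks) goes to
        # the celery workers.
        if getattr(task, "queue", None) == "kubernetes":
            return self.kubernetes
        return self.celery

    def queue_command(self, task, command):
        self._router(task).queue_command(task, command)


class _StubExecutor:  # stand-in for a real executor
    def __init__(self):
        self.queued = []

    def queue_command(self, task, command):
        self.queued.append(command)


class _Task:  # stand-in for a task instance
    def __init__(self, queue=None):
        self.queue = queue


celery, k8s = _StubExecutor(), _StubExecutor()
dispatcher = JobDispatcherExecutor(celery, k8s)
dispatcher.queue_command(_Task(), ["airflow", "run", "sensor_task"])
dispatcher.queue_command(_Task(queue="kubernetes"), ["airflow", "run", "heavy_task"])
print(len(celery.queued), len(k8s.queued))  # 1 1
```

As Ping notes, the routing predicate could equally be a smart-sensor check or a `settings.policy` hook, so that infra keeps control of the routing rules.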
> > > On Thu, Aug 13, 2020 at 12:58 PM Jarek Potiuk <Jarek.Potiuk@polidea.com> wrote:
> > >
> > >> +1 for pod mutation in config. It's not a "yamly" thing - it's Python
> > >> code to run, so it's quite appropriate to have it in config.
> > >>
> > >> On Wed, Aug 12, 2020 at 9:43 PM Daniel Imberman <
> > >> daniel.imberman@gmail.com>
> > >>
> > >>
> > >> wrote:
> > >>
> > >>
> > >>
> > >>
> > >>
> > >> > I think pod_mutation is important as an “admin override”. E.g. there
> > >> might
> > >>
> > >>
> > >> > be a label that is required by security policy for all tasks
> launched
> > by
> > >>
> > >>
> > >> > airflow. I do agree that the executor_config should take in a V1Pod
> > >> object
> > >>
> > >>
> > >> >
> > >>
> > >>
> > >> > One option is that the executor_config could take in a
> > pod_mutation_hook
> > >>
> > >>
> > >> > function as well? So users can manually override the pod
> functionally
> > >>
> > >>
> > >> > rather than just giving an object and hoping it merges the way they
> > >> expect
> > >>
> > >>
> > >> > it to?
> > >>
> > >>
> > >> > via Newton Mail [
> > >>
> > >>
> > >> >
> > >>
> >
> https://cloudmagic.com/k/d/mailapp?ct=dx&cv=10.0.50&pv=10.15.5&source=email_footer_2
> > >>
> > >>
> > >> > ]
> > >>
> > >>
> > >> > On Wed, Aug 12, 2020 at 12:19 PM, David Lum <da...@gmail.com>
> > >> wrote:
> > >>
> > >>
> > >> > I'll just leave some passing thoughts here. How the pod template
> file
> > is
> > >>
> > >>
> > >> > modified by the executor config might not be very well defined.
> > >> Currently
> > >>
> > >>
> > >> > when a user passes in a k8s.V1Pod object through executor_config we
> > >> kind of
> > >>
> > >>
> > >> > arbitrarily merge the two pod definitions to what we imagine the
> > client
> > >>
> > >>
> > >> > would want, but it could be often misinterpreted. Previously the
> > >>
> > >>
> > >> > Airflow-defined Pod class was fairly flat. This made merging the two
> > >>
> > >>
> > >> > objects fairly straight forward. If the user wrote the field in
> > >>
> > >>
> > >> > executor_config, it overwrote that field in the Pod definition, but
> > the
> > >>
> > >>
> > >> > rest stayed the same. When attributes are arbitrarily nested like in
> > the
> > >>
> > >>
> > >> > k8s.V1Pod, some aspects can be mistakenly overwritten. For example,
> > >>
> > >>
> > >> > specifying the volumeMounts of a container requires also specifying
> > the
> > >>
> > >>
> > >> > rest of the V1Pod. How do we handle this? Do we just overwrite
> > >>
> > >>
> > >> > volumeMounts? The client must have specified a container, so do we
> > >>
> > >>
> > >> > overwrite that too? The fact that there is a choice means the API
> > could
> > >> be
> > >>
> > >>
> > >> > unintuitive to the user.
> > >>
> > >>
> > >> >
> > >>
> > >>
> > >> > What I suggest might be easier is that executor_config completely
> > >> replaces
> > >>
> > >>
> > >> > the pod_template_file if it is specified, but then also offer
> > something
> > >>
> > >>
> > >> > like pod_mutation_hook at the operator level. This makes it quite
> > clear
> > >> to
> > >>
> > >>
> > >> > the user, you are either replacing the pod_template, or you are
> > >> modifying
> > >>
> > >>
> > >> > the pod_template with pod_mutation_hook and you are doing it by your
> > own
> > >>
> > >>
> > >> > rules. the argument to executor_config can be something like
> > >>
> > >>
> > >> > ```
> > >>
> > >>
> > >> > executor_config: { 'KubernetesExector': {'pod_mutation': some_func,
> > >>
> > >>
> > >> > 'pod_template': k8s.V1Pod(...) }} # Pod template could also be a
> path
> > >> to a
> > >>
> > >>
> > >> > file
> > >>
> > >>
> > >> > ```
> > >>
> > >>
> > >> >
> > >>
> > >>
> > >> > If both are specified then pod_mutation is a function applied to
> > >>
> > >>
> > >> > pod_template. Thoughts?
> > >>
> > >>
> > >> >
> > >>
> > >>
> > >> > On Wed, Aug 12, 2020 at 11:24 AM Tomasz Urbaszek <
> > turbaszek@apache.org>
> > >>
> > >>
> > >> > wrote:
> > >>
> > >>
> > >> >
> > >>
> > >>
> > >> > > +1 for the idea, thanks Daniel! I agree that multi-executor and
> pod
> > >>
> > >>
> > >> > > templates should be 2.1.
> > >>
> > >>
> > >> > >
> > >>
> > >>
> > >> > > T.
> > >>
> > >>
> > >> > >
> > >>
> > >>
> > >> > > On Wed, Aug 12, 2020 at 5:13 PM Daniel Imberman <
> > >>
> > >>
> > >> > daniel.imberman@gmail.com>
> > >>
> > >>
> > >> > > wrote:
> > >>
> > >>
> > >> > >
> > >>
> > >>
> > >> > >> (Also funny enough we could use a lot of the existing
> > infrastructure
> > >> int
> > >>
> > >>
> > >> > >> 1.10 to create that migration script. Would just need to take the
> > >> python
> > >>
> > >>
> > >> > >> dictionary and use the python yaml library)
> > >>
> > >>
> > >> > >>
> > >>
> > >>
> > >> > >> via Newton Mail [
> > >>
> > >>
> > >> > >>
> > >>
> > >>
> > >> >
> > >>
> >
> https://cloudmagic.com/k/d/mailapp?ct=dx&cv=10.0.50&pv=10.15.5&source=email_footer_2
> > >>
> > >>
> > >> > >> ]
> > >>
> > >>
> > >> > >> On Wed, Aug 12, 2020 at 8:09 AM, Daniel Imberman <
> > >>
> > >>
> > >> > >> daniel.imberman@gmail.com> wrote:
> > >>
> > >>
> > >> > >> 100% agreed on timing. I think 2.0 should be for the breaking
> > aspect
> > >>
> > >>
> > >> > >> (losing the configurations) and then 2.1/2.2 we can start adding
> on
> > >> the
> > >>
> > >>
> > >> > new
> > >>
> > >>
> > >> > >> features. I also like the idea of a migration tool. We can make a
> > >> script
> > >>
> > >>
> > >> > >> that takes your airflow.cfg and converts that into a yaml file.
> > >>
> > >>
> > >> > >>
> > >>
> > >>
> > >> > >> via Newton Mail [
> > >>
> > >>
> > >> > >>
> > >>
> > >>
> > >> >
> > >>
> >
> https://cloudmagic.com/k/d/mailapp?ct=dx&cv=10.0.50&pv=10.15.5&source=email_footer_2
> > >>
> > >>
> > >> > >> ]
> > >>
> > >>
> > >> > >> On Wed, Aug 12, 2020 at 7:48 AM, Jarek Potiuk <
> > >> Jarek.Potiuk@polidea.com
> > >>
> > >>
> > >> > >
> > >>
> > >>
> > >> > >> wrote:
> > >>
> > >>
> > >> > >> Big +1. All the arguments are very appealing to me and
> simplifying
> > >> the
> > >>
> > >>
> > >> > >> Kubernetes Executor down to YAML-configurable one seems like a
> > >>
> > >>
> > >> > no-brainer
> > >>
> > >>
> > >> > >> especially if we provide some migration tools. I've lost
> countless
> > >> hours
> > >>
> > >>
> > >> > >> on
> > >>
> > >>
> > >> > >> debugging some configuration problems, simply because the
> relevant
> > >>
> > >>
> > >> > >> Kubernetes-related configuration was in the least expected place
> -
> > >> i.e.
> > >>
> > >>
> > >> > >> airflow.cfg. YAML configuration.
> > >>
> > >>
> > >> > >>
> > >>
> > >>
> > >> > >> I am also a big fan of both 1. and 2. I've implemented a POC of
> > >>
> > >>
> > >> > >> queue-based multi-scheduler once but having it embedded as part
> of
> > >> core
> > >>
> > >>
> > >> > >> Airflow rather than based it on queues (which are basically a
> > Celery
> > >>
> > >>
> > >> > >> Executor concept) is I think much better approach. Both 1. and 2.
> > are
> > >>
> > >>
> > >> > >> cool.
> > >>
> > >>
> > >> > >>
> > >>
> > >>
> > >> > >> Now - question about timing. If we decide to go that route - my
> > view
> > >> is
> > >>
> > >>
> > >> > >> that simplifying Kubernetes should be an Airflow 2.0 task -
> > alongside
> > >>
> > >>
> > >> > more
> > >>
> > >>
> > >> > >> comprehensive tests (which will be much easier to write in this
> > >> case).
> > >>
> > >>
> > >> > The
> > >>
> > >>
> > >> > >> new features/ideas 1. 2. for KE I think should come after that -
> > >> when we
> > >>
> > >>
> > >> > >> release and stabilize 2.0. Sounds like great candidates for 2.1
> to
> > >> me.
> > >>
> > >>
> > >> > >>
> > >>
> > >>
> > >> > >> J.
> > >>
> > >>
> > >> > >>
> > >>
> > >>
> > >> > >>
> > >>
> > >>
> > >> > >> On Wed, Aug 12, 2020 at 4:24 PM Daniel Imberman <
> > >>
> > >>
> > >> > >> daniel.imberman@gmail.com>
> > >>
> > >>
> > >> > >> wrote:
> > >>
> > >>
> > >> > >>
> > >>
> > >>
> > >> > >> > Hello, fellow Airflowers! I hope you are all well in these
> trying
> > >>
> > >>
> > >> > times.
> > >>
> > >>
> > >> > >> >
> > >>
> > >>
> > >> > >> >
> > >>
> > >>
> > >> > >> > With the recent launch of Airflow 2.0 preparation, it now seems
> > >> like a
> > >>
> > >>
> > >> > >> > good time to review the project's state and where we can fit in
> > >> some
> > >>
> > >>
> > >> > >> > breaking changes that will improve the project for the future.
> > >>
> > >>
> > >> > >> >
> > >>
> > >>
> > >> > >> >
> > >>
> > >>
> > >> > >> > When we first created the KubernetesExecutor, we had two goals
> in
> > >>
> > >>
> > >> > mind.
> > >>
> > >>
> > >> > >> > The first goal was to improve the airflow Auto scaling story.
> > >>
> > >>
> > >> > >> Previously,
> > >>
> > >>
> > >> > >> > airflow users would have to manually provision celery workers,
> > >> which
> > >>
> > >>
> > >> > >> could
> > >>
> > >>
> > >> > >> > lead to wasted resources or missed SLAs. The other goal was to
> > >>
> > >>
> > >> > >> introduce a
> > >>
> > >>
> > >> > >> > community that was not yet well versed in the Kubernetes API to
> > the
> > >>
> > >>
> > >> > >> > Kubernetes system.
> > >>
> > >>
> > >> > >> >
> > >>
> > >>
> > >> > >> >
> > >>
> > >>
> > >> > >> > To ease the community's transition, we abstracted many of the
> > >>
> > >>
> > >> > >> complexities
> > >>
> > >>
> > >> > >> > of creating a Kubernetes object. We chose to offer a limited
> > >> number of
> > >>
> > >>
> > >> > >> > configurations and keep much of the pod creation process
> internal
> > >> to
> > >>
> > >>
> > >> > >> > airflow. In the short-term, this system lowered the barrier to
> > >> entry.
> > >>
> > >>
> > >> > >> Over
> > >>
> > >>
> > >> > >> > time, however, this abstraction has become a nightmare of tech
> > >> debt as
> > >>
> > >>
> > >> > >> the
> > >>
> > >>
> > >> > >> > Kubernetes API is expensive and constantly changing.
> > >>
> > >>
> > >> > >> >
> > >>
> > >>
> > >> > >> >
> > >>
> > >>
> > >> > >> > With this in mind, I think it's time for us to consider a more
> > >>
> > >>
> > >> > >> > straightforward approach that takes the complexity out of
> Airflow
> > >> and
> > >>
> > >>
> > >> > >> > offers the full Kubernetes API to the airflow user.
> > >>
> > >>
> > >> > >> >
> > >>
> > >>
> > >> > >> >
> > >>
> > >>
> > >> > >> > What I'm proposing here is pretty straightforward. We remove
> all
> > >>
> > >>
> > >> > >> > Kubernetes pod creation configurations from the airflow.cfg and
> > >>
> > >>
> > >> > instead
> > >>
> > >>
> > >> > >> > offer only one way to use the KubernetesExecutor: with a YAML
> > file.
> > >>
> > >>
> > >> > >> >
> > >>
> > >>
> > >> > >> >
> > >>
> > >>
> > >> > >> > We can easily supply all of the configurations to the
> > >>
> > >>
> > >> > KubernetesExecutor
> > >>
> > >>
> > >> > >> > by offering example YAMLs (git sync mode is just a sidecar and
> an
> > >> init
> > >>
> > >>
> > >> > >> > container, DAG volumes are just an example volume and volume
> > mount,
> > >>
> > >>
> > >> > >> etc.).
> > >>
> > >>
> > >> > >> >
> > >>
> > >>
> > >> > >> >
> > >>
> > >>
> > >> > >> > This system would simplify a user's ability to predict what a
> pod
> > >> will
> > >>
> > >>
> > >> > >> > look like once it is launched by airflow. They will know it's a
> > >> base
> > >>
> > >>
> > >> > pod
> > >>
> > >>
> > >> > >> > and will be able to simply modify the pod object using the
> > executor
> > >>
> > >>
> > >> > >> config
> > >>
> > >>
> > >> > >> > and the pod mutation hook.
> > >>
> > >>
> > >> > >> >
> > >>
> > >>
> > >> > >> >
> > >>
> > >>
> > >> > >> >
> > >>
> > >>
> > >> > >> >
> > >>
> > >>
> > >> > >> >
> > >>
> > >>
> > >> > >> >
> > >>
> > >>
> > >> > >> >
> > >>
> > >>
> > >> > >> > This simplification could also lead to some pretty great new
> > >> features
> > >>
> > >>
> > >> > in
> > >>
> > >>
> > >> > >> > the KubernetesExecutor
> > >>
> > >>
> > >> > >> >
> > >>
> > >>
> > >> > >> >
> > >>
> > >>
> > >> > >> > Idea 1: Picking a pod_template_file per-task
> > >>
> > >>
> > >> > >> > Along with the existing customization with the executor config,
> > >> solely
> > >>
> > >>
> > >> > >> > relying on pod files can allow users to pick the pod template
> > file
> > >>
> > >>
> > >> > that
> > >>
> > >>
> > >> > >> > they want to use as their base pod on a per-task basis. An
> > Airflow
> > >>
> > >>
> > >> > >> engineer
> > >>
> > >>
> > >> > >> > could supply several pre-made templates for their data
> scientists
> > >> to
> > >>
> > >>
> > >> > >> reduce
> > >>
> the amount of customization an airflow user would need to use.
>
> Idea 2: Merging the KubernetesExecutor into the CeleryExecutor
> One idea that we've been excited about recently has been the idea of
> creating a merged Celery and Kubernetes executor. This hybrid executor
> would default to launching celery workers with KEDA and would have the
> option to launch individual tasks using the Kubernetes executor when a
> user wants isolation or customization. Simplifying the Kubernetes
> executor reduces the number of fail-points that this merged executor
> would need to account for.
>
> What would we need to do to implement this?
> The good news here is that the hard work has already been done! As of
> AIRFLOW-5413 [https://issues.apache.org/jira/browse/AIRFLOW-5413] by
> David Lum, airflow already has the ability to use base worker pods on a
> template file. This would involve primarily code deletion and very
> little new code.
>
> Thank you for your time and I look forward to the community’s
> discussion.
>
> Daniel
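The delegation model behind the merged-executor idea quoted above can be sketched in a few lines. This is a hypothetical illustration, not Airflow's real API: `HybridExecutor`, `StubExecutor`, the task dicts, and the `router` callable are all stand-ins invented for this sketch.

```python
class StubExecutor:
    """Records what was queued; stands in for a Celery or Kubernetes executor."""
    def __init__(self, name):
        self.name = name
        self.queued = []

    def start(self):
        pass

    def queue_command(self, task, command):
        self.queued.append((task["task_id"], command))


class HybridExecutor:
    """Delegates every task to exactly one of two backing executors."""
    def __init__(self, default_executor, k8s_executor, router):
        self.default_executor = default_executor
        self.k8s_executor = k8s_executor
        self.router = router  # callable: task -> executor

    def start(self):
        # Both backing executors share the scheduler's lifecycle.
        self.default_executor.start()
        self.k8s_executor.start()

    def queue_command(self, task, command):
        # The router decides which backing executor runs this task.
        self.router(task).queue_command(task, command)


celery = StubExecutor("celery")
k8s = StubExecutor("kubernetes")

def router(task):
    # e.g. tasks needing isolation go to Kubernetes, everything else to Celery
    return k8s if task.get("isolated") else celery

executor = HybridExecutor(celery, k8s, router)
executor.start()
executor.queue_command({"task_id": "etl", "isolated": False}, ["airflow", "run"])
executor.queue_command({"task_id": "gpu_job", "isolated": True}, ["airflow", "run"])
print([t for t, _ in celery.queued], [t for t, _ in k8s.queued])
# ['etl'] ['gpu_job']
```

The point of the sketch is that the hybrid executor itself stays thin: all fail-points live in the two backing executors, which is why simplifying the Kubernetes executor first reduces the surface this wrapper has to account for.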
> --
> Jarek Potiuk
> Polidea <https://www.polidea.com/> | Principal Software Engineer
> M: +48 660 796 129

Re: Simplifying the KubernetesExecutor

Posted by Daniel Imberman <da...@gmail.com>.
Hi Ping,

This looks great! The only change I might suggest is that we check the “queue” field and route to the KubernetesExecutor if queue == “kubernetes”. What do you think?

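The queue-based routing rule suggested here is small enough to sketch. This is an illustration only: the function name, the queue constant, and the executor names as strings are assumptions, not Airflow's actual implementation.

```python
KUBERNETES_QUEUE = "kubernetes"

def choose_executor_name(task_queue):
    """Route to the KubernetesExecutor only when the task's queue asks for it;
    everything else stays on the default (Celery) workers."""
    if task_queue == KUBERNETES_QUEUE:
        return "KubernetesExecutor"
    return "CeleryExecutor"

print(choose_executor_name("kubernetes"))  # KubernetesExecutor
print(choose_executor_name("default"))     # CeleryExecutor
```

Keying on the existing `queue` field is attractive because `queue` is already a per-task attribute, so no new operator argument would be needed.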
On Mon, Aug 24, 2020 at 12:12 AM, Ping Zhang <pi...@umich.edu> wrote:
ah, did not know that. Thanks for letting me know.

The image is not that important; all the important information is in the
gist:
https://gist.github.com/pingzh/cc44c97336560b658d012c225a2242cc#file-jobdispatcherexecutor-py

Best wishes

Ping Zhang


On Sun, Aug 23, 2020 at 11:59 PM Deng Xiaodong <xd...@gmail.com> wrote:

> Hi Ping,
>
> Inserting images in the email body doesn’t work for this mail list. So we
> cannot actually see the pictures you shared.
>
> If you find them important, you may share the pictures somewhere then
> provide the links in the mail body.
>
> Thanks.
>
> XD
>
> On Mon, Aug 24, 2020 at 08:56 Ping Zhang <pi...@umich.edu> wrote:
>
> > Great ideas.
> >
> > Actually, when I evaluated using KubernetesExecutor in Airbnb, I created
> > *JobDispatcherExecutor*, which is similar to the idea 2.
> >
> > We have tons of sensor tasks launched at around UTC 00:00 and we enabled
> > the SmartSensor, which means each sensor task is very short-running. If we
> > choose to have one pod per task, the k8s cluster will not be able to
> > handle the bursty load.
> >
> > Thus, I created a JobDispatcherExecutor which takes an instance of
> > `celery_executor` and `kubernetes_executor`.
> > [image: image.png]
> > And there is a router method
> > <https://gist.github.com/pingzh/cc44c97336560b658d012c225a2242cc#file-jobdispatcherexecutor-py-L165-L168>
> > to determine which executor to use to queue_command.
> > [image: image.png]
> > It is currently based on whether a task is a smart sensor task, but it
> > can be easily changed to other attributes and/or settings.policy (so
> > that infra can have control).
> >
> > Here is the gist:
> >
> https://gist.github.com/pingzh/cc44c97336560b658d012c225a2242cc#file-jobdispatcherexecutor-py
> >
> > [image: image.png]
> >
> >
> > Would love to get some feedback.
> >
> > Best wishes
> >
> > Ping Zhang
> >
> >
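The settings.policy variant mentioned above (routing controlled by infra rather than by DAG authors) can be sketched as a cluster policy that rewrites each task's queue at parse time. Airflow does support a `policy` hook in `airflow_local_settings`, but the `StandInTask` class, the `is_sensor` attribute, and the queue names below are illustrative assumptions, not the real task model.

```python
class StandInTask:
    """Illustrative stand-in for an Airflow operator instance."""
    def __init__(self, task_id, is_sensor):
        self.task_id = task_id
        self.is_sensor = is_sensor
        self.queue = "default"

def policy(task):
    """Infra-controlled routing: short-running sensor tasks stay on the
    Celery workers; everything else goes to a (hypothetical) kubernetes
    queue that launches one pod per task."""
    task.queue = "default" if task.is_sensor else "kubernetes"

t1 = StandInTask("wait_for_partition", is_sensor=True)
t2 = StandInTask("train_model", is_sensor=False)
for t in (t1, t2):
    policy(t)
print(t1.queue, t2.queue)  # default kubernetes
```

Because the policy runs centrally, the routing rule can be changed without touching individual DAGs, which is the "so that infra can have control" point above.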
> > On Thu, Aug 13, 2020 at 12:58 PM Jarek Potiuk <Ja...@polidea.com>
> > wrote:
> >
> >> +1 for pod mutation in config. It's not a "yamly" thing - it's Python
> >> code to run, so it's quite appropriate to have it in config.
> >>
> >> On Wed, Aug 12, 2020 at 9:43 PM Daniel Imberman <daniel.imberman@gmail.com> wrote:
> >>
> >> > I think pod_mutation is important as an “admin override”. E.g. there
> >> > might be a label that is required by security policy for all tasks
> >> > launched by airflow. I do agree that the executor_config should take
> >> > in a V1Pod object.
> >> >
> >> > One option is that the executor_config could take in a
> >> > pod_mutation_hook function as well? So users can manually override
> >> > the pod functionally rather than just giving an object and hoping it
> >> > merges the way they expect it to?
> >>
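The "admin override" role of pod_mutation_hook described above can be made concrete with a small sketch. Plain dicts stand in for kubernetes `V1Pod` objects here, and the label key is invented for illustration; neither is Airflow's real API.

```python
# A label that (in this hypothetical security policy) must be present on
# every pod Airflow launches, regardless of what the user requested.
REQUIRED_LABELS = {"security.example.com/scanned": "true"}

def pod_mutation_hook(pod):
    """Applied by the admin to every pod, after user customization, so the
    required labels always win over user-supplied values."""
    labels = pod.setdefault("metadata", {}).setdefault("labels", {})
    labels.update(REQUIRED_LABELS)
    return pod

user_pod = {
    "metadata": {"labels": {"team": "data-science"}},
    "spec": {"containers": [{"name": "base", "image": "airflow:2.0"}]},
}
mutated = pod_mutation_hook(user_pod)
print(mutated["metadata"]["labels"])
# both the user's label and the admin-required label survive
```

Running the hook last is what makes it an override: a user can set any labels they like in executor_config, but the admin's policy labels are applied on top.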
> >> >
> >> > On Wed, Aug 12, 2020 at 12:19 PM, David Lum <da...@gmail.com> wrote:
> >>
> >> > I'll just leave some passing thoughts here. How the pod template file
> >> > is modified by the executor config might not be very well defined.
> >> > Currently when a user passes in a k8s.V1Pod object through
> >> > executor_config, we kind of arbitrarily merge the two pod definitions
> >> > into what we imagine the client would want, but it could often be
> >> > misinterpreted. Previously the Airflow-defined Pod class was fairly
> >> > flat. This made merging the two objects fairly straightforward. If
> >> > the user wrote a field in executor_config, it overwrote that field in
> >> > the Pod definition, but the rest stayed the same. When attributes are
> >> > arbitrarily nested like in the k8s.V1Pod, some aspects can be
> >> > mistakenly overwritten. For example, specifying the volumeMounts of a
> >> > container requires also specifying the rest of the V1Pod. How do we
> >> > handle this? Do we just overwrite volumeMounts? The client must have
> >> > specified a container, so do we overwrite that too? The fact that
> >> > there is a choice means the API could be unintuitive to the user.
> >> >
> >> > What I suggest might be easier is that executor_config completely
> >> > replaces the pod_template_file if it is specified, but then also
> >> > offer something like pod_mutation_hook at the operator level. This
> >> > makes it quite clear to the user: you are either replacing the
> >> > pod_template, or you are modifying the pod_template with
> >> > pod_mutation_hook, and you are doing it by your own rules. The
> >> > argument to executor_config can be something like
> >> >
> >> > ```
> >> > executor_config = {'KubernetesExecutor': {'pod_mutation': some_func,
> >> >                                           'pod_template': k8s.V1Pod(...)}}
> >> > # Pod template could also be a path to a file
> >> > ```
> >> >
> >> > If both are specified then pod_mutation is a function applied to
> >> > pod_template. Thoughts?
> >>
> >>
> >> >
> >>
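The replace-then-mutate semantics proposed above can be sketched end to end. Plain dicts stand in for `k8s.V1Pod`, and `resolve_pod` is a made-up helper name; only the executor_config shape follows the snippet in the message.

```python
def resolve_pod(base_template, executor_config):
    """If executor_config supplies a pod_template, it wholesale replaces the
    base template (no ambiguous deep merge); pod_mutation, if supplied, is
    then applied to whichever template won."""
    cfg = (executor_config or {}).get("KubernetesExecutor", {})
    pod = cfg.get("pod_template", base_template)  # replace, don't merge
    mutate = cfg.get("pod_mutation")
    return mutate(pod) if mutate else pod

base = {"spec": {"containers": [{"name": "base", "image": "airflow:2.0"}]}}

def add_sidecar(pod):
    # User-defined mutation: bolt a logging sidecar onto the chosen template.
    pod["spec"]["containers"].append({"name": "logger", "image": "fluentd"})
    return pod

custom = {"spec": {"containers": [{"name": "base", "image": "my-image"}]}}
result = resolve_pod(base, {"KubernetesExecutor": {"pod_template": custom,
                                                   "pod_mutation": add_sidecar}})
print([c["name"] for c in result["spec"]["containers"]])  # ['base', 'logger']
```

This sidesteps the merge-ambiguity problem entirely: there is never a question of whether volumeMounts overwrite a container, because the user either hands over a complete pod or mutates one with ordinary code.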
> >>
> >> > On Wed, Aug 12, 2020 at 11:24 AM Tomasz Urbaszek <turbaszek@apache.org> wrote:
> >> >
> >> > > +1 for the idea, thanks Daniel! I agree that multi-executor and pod
> >> > > templates should be 2.1.
> >> > >
> >> > > T.
> >> > >
> >> > > On Wed, Aug 12, 2020 at 5:13 PM Daniel Imberman <daniel.imberman@gmail.com> wrote:
> >> > >
> >> > >> (Also funny enough, we could use a lot of the existing
> >> > >> infrastructure in 1.10 to create that migration script. We would
> >> > >> just need to take the python dictionary and use the python yaml
> >> > >> library.)
> >> > >>
> >> > >> On Wed, Aug 12, 2020 at 8:09 AM, Daniel Imberman <daniel.imberman@gmail.com> wrote:
> >> > >>
> >> > >> 100% agreed on timing. I think 2.0 should be for the breaking
> >> > >> aspect (losing the configurations), and then in 2.1/2.2 we can
> >> > >> start adding on the new features. I also like the idea of a
> >> > >> migration tool. We can make a script that takes your airflow.cfg
> >> > >> and converts it into a yaml file.
> >> > >>
> >> > >> On Wed, Aug 12, 2020 at 7:48 AM, Jarek Potiuk <Jarek.Potiuk@polidea.com> wrote:
> >>
> >>
> >> > >> Big +1. All the arguments are very appealing to me, and
> >> > >> simplifying the Kubernetes Executor down to a YAML-configurable
> >> > >> one seems like a no-brainer, especially if we provide some
> >> > >> migration tools. I've lost countless hours on debugging some
> >> > >> configuration problems, simply because the relevant
> >> > >> Kubernetes-related configuration was in the least expected place -
> >> > >> i.e. airflow.cfg.
> >> > >>
> >> > >> I am also a big fan of both 1. and 2. I've implemented a POC of a
> >> > >> queue-based multi-scheduler once, but having it embedded as part
> >> > >> of core Airflow rather than basing it on queues (which are
> >> > >> basically a Celery Executor concept) is, I think, a much better
> >> > >> approach. Both 1. and 2. are cool.
> >> > >>
> >> > >> Now - a question about timing. If we decide to go that route, my
> >> > >> view is that simplifying Kubernetes should be an Airflow 2.0 task
> >> > >> - alongside more comprehensive tests (which will be much easier to
> >> > >> write in this case). The new features/ideas 1. and 2. for KE I
> >> > >> think should come after that - when we release and stabilize 2.0.
> >> > >> They sound like great candidates for 2.1 to me.
> >> > >>
> >> > >> J.
> >> > >>
> >> > >> On Wed, Aug 12, 2020 at 4:24 PM Daniel Imberman <daniel.imberman@gmail.com> wrote:
> >> > >>
> >> > >> > This simplification could also lead to some pretty great new
> >> > >> > features in the KubernetesExecutor.
> >> > >> >
> >> > >> > Idea 1: Picking a pod_template_file per-task
> >> > >> > Along with the existing customization with the executor config,
> >> > >> > solely relying on pod files can allow users to pick the pod
> >> > >> > template file that they want to use as their base pod on a
> >> > >> > per-task basis. An Airflow engineer could supply several
> >> > >> > pre-made templates for their data scientists to reduce the
> >> > >> > amount of customization an airflow user would need to use.
> >>
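The per-task template selection in Idea 1 amounts to a lookup with a fallback. This is a sketch under assumptions: the `pod_template_file` option name matches the idea above, but the helper function, the task dicts, and the paths are all invented for illustration.

```python
# Engineer-supplied default base pod; individual tasks may opt into one of
# several pre-made templates instead (paths are hypothetical).
DEFAULT_TEMPLATE = "/opt/airflow/pod_templates/default.yaml"

def template_for(task):
    """Pick the pod template file for a task, falling back to the default
    when the task doesn't choose one."""
    return task.get("pod_template_file") or DEFAULT_TEMPLATE

print(template_for({"task_id": "etl"}))
# /opt/airflow/pod_templates/default.yaml
print(template_for({"task_id": "train",
                    "pod_template_file": "/opt/airflow/pod_templates/gpu.yaml"}))
# /opt/airflow/pod_templates/gpu.yaml
```

The appeal is that data scientists only name a template; the templates themselves (GPU nodes, extra volumes, git-sync sidecars) stay under the platform team's control.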

Re: Simplifying the KubernetesExecutor

Posted by Ping Zhang <pi...@umich.edu>.
ah, did not know that. Thanks for letting me know.

The image is not that important; all the important information is in the
gist:
https://gist.github.com/pingzh/cc44c97336560b658d012c225a2242cc#file-jobdispatcherexecutor-py

Best wishes

Ping Zhang



Re: Simplifying the KubernetesExecutor

Posted by Deng Xiaodong <xd...@gmail.com>.
Hi Ping,

Inserting images in the email body doesn’t work for this mailing list, so we
cannot actually see the pictures you shared.

If you find them important, you may share the pictures somewhere then
provide the links in the mail body.

Thanks.

XD

On Mon, Aug 24, 2020 at 08:56 Ping Zhang <pi...@umich.edu> wrote:

> Great ideas.
>
> Actually, when I evaluated using KubernetesExecutor in Airbnb, I created
> *JobDispatcherExecutor*, which is similar to idea 2.
>
> We have tons of sensor tasks launched at around UTC 00:00 and we enabled
> the SmartSensor, which means each sensor task is very short-running. If we
> choose to have one pod per task, the k8s cluster will not be able to handle
> the bursty load.
>
> Thus, I created a JobDispatcherExecutor which takes an instance of
> `celery_executor` and `kubernetes_executor`.
> [image: image.png]
> And there is a router
> <https://gist.github.com/pingzh/cc44c97336560b658d012c225a2242cc#file-jobdispatcherexecutor-py-L165-L168>
> method to determine which executor to use for queue_command,
> [image: image.png]
> It is currently based on whether a task is a smart sensor task, but it can
> be easily changed to other attributes and/or settings.policy (so that infra
> can have control).
>
> Here is the gist:
> https://gist.github.com/pingzh/cc44c97336560b658d012c225a2242cc#file-jobdispatcherexecutor-py
>
> [image: image.png]
>
>
> Would love to get some feedback.
>
> Best wishes
>
> Ping Zhang
>
>
> On Thu, Aug 13, 2020 at 12:58 PM Jarek Potiuk <Ja...@polidea.com>
> wrote:
>
>> +1 for pod mutation in config. It's not a "yamly" thing - it's python
>> code to run, so it's quite appropriate to have it in config.
>>
>> [...]
>

Re: Simplifying the KubernetesExecutor

Posted by Ping Zhang <pi...@umich.edu>.
Great ideas.

Actually, when I evaluated using KubernetesExecutor in Airbnb, I created
*JobDispatcherExecutor*, which is similar to idea 2.

We have tons of sensor tasks launched at around UTC 00:00 and we enabled
the SmartSensor, which means each sensor task is very short-running. If we
choose to have one pod per task, the k8s cluster will not be able to handle
the bursty load.

Thus, I created a JobDispatcherExecutor which takes an instance of
`celery_executor` and `kubernetes_executor`.
[image: image.png]
And there is a router
<https://gist.github.com/pingzh/cc44c97336560b658d012c225a2242cc#file-jobdispatcherexecutor-py-L165-L168>
method to determine which executor to use for queue_command,
[image: image.png]
It is currently based on whether a task is a smart sensor task, but it can
be easily changed to other attributes and/or settings.policy (so that infra
can have control).

Here is the gist:
https://gist.github.com/pingzh/cc44c97336560b658d012c225a2242cc#file-jobdispatcherexecutor-py

[image: image.png]
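The routing described above can be sketched roughly as follows. This is a stand-in illustration only (the real implementation is in the linked gist): `StubExecutor` and the `needs_isolation` flag are made up here, and the gist actually routes on whether the task is a smart-sensor task.

```python
# Stand-in sketch of a JobDispatcherExecutor: wraps two executors and
# picks one per task. The real router keys off the smart-sensor attribute
# and/or settings.policy; `needs_isolation` is an illustrative flag.

class StubExecutor:
    def __init__(self, name):
        self.name = name
        self.queued = []

    def queue_command(self, task):
        self.queued.append(task)


class JobDispatcherExecutor:
    """Sends tasks to Celery by default; to Kubernetes when a task asks
    for isolation or customization."""

    def __init__(self, celery_executor, kubernetes_executor):
        self.celery_executor = celery_executor
        self.kubernetes_executor = kubernetes_executor

    def _route(self, task):
        # Route each task to exactly one of the two wrapped executors.
        if task.get("needs_isolation"):
            return self.kubernetes_executor
        return self.celery_executor

    def queue_command(self, task):
        self._route(task).queue_command(task)


celery = StubExecutor("celery")
k8s_exec = StubExecutor("kubernetes")
dispatcher = JobDispatcherExecutor(celery, k8s_exec)
dispatcher.queue_command({"task_id": "smart_sensor_1", "needs_isolation": False})
dispatcher.queue_command({"task_id": "isolated_etl", "needs_isolation": True})
```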


Would love to get some feedback.

Best wishes

Ping Zhang


On Thu, Aug 13, 2020 at 12:58 PM Jarek Potiuk <Ja...@polidea.com>
wrote:

> +1 for pod mutation in config. It's not a "yamly" thing - it's python
> code to run, so it's quite appropriate to have it in config.
>
> [...]
>

Re: Simplifying the KubernetesExecutor

Posted by Jarek Potiuk <Ja...@polidea.com>.
+1 for pod mutation in config. It's not a "yamly" thing - it's python
code to run, so it's quite appropriate to have it in config.
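The admin-override idea discussed above can be sketched minimally like this. Plain dicts stand in for k8s.V1Pod objects; only the hook's name mirrors Airflow's real pod_mutation_hook setting, and the specific label is an invented example.

```python
# A minimal pod_mutation_hook sketch: every pod passes through an
# admin-owned function before launch, which stamps a label required by
# security policy regardless of what the user's executor_config or pod
# template specified. Dicts stand in for k8s.V1Pod.

def pod_mutation_hook(pod):
    # Admins get the last word: enforce a required label on every pod.
    labels = pod.setdefault("metadata", {}).setdefault("labels", {})
    labels["security-policy"] = "restricted"
    return pod


base_pod = {
    "metadata": {"labels": {"dag_id": "example_dag"}},
    "spec": {"containers": [{"name": "base", "image": "apache/airflow:2.0"}]},
}
mutated = pod_mutation_hook(base_pod)
```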

On Wed, Aug 12, 2020 at 9:43 PM Daniel Imberman <da...@gmail.com>
wrote:

> I think pod_mutation is important as an “admin override”. E.g. there might
> be a label that is required by security policy for all tasks launched by
> airflow. I do agree that the executor_config should take in a V1Pod object
>
> One option is that the executor_config could take in a pod_mutation_hook
> function as well? So users can manually override the pod functionally
> rather than just giving an object and hoping it merges the way they expect
> it to?
> via Newton Mail [
> https://cloudmagic.com/k/d/mailapp?ct=dx&cv=10.0.50&pv=10.15.5&source=email_footer_2
> ]
> On Wed, Aug 12, 2020 at 12:19 PM, David Lum <da...@gmail.com> wrote:
> I'll just leave some passing thoughts here. How the pod template file is
> modified by the executor config might not be very well defined. Currently
> when a user passes in a k8s.V1Pod object through executor_config we kind of
> arbitrarily merge the two pod definitions to what we imagine the client
> would want, but it can often be misinterpreted. Previously the
> Airflow-defined Pod class was fairly flat. This made merging the two
> objects fairly straightforward. If the user wrote the field in
> executor_config, it overwrote that field in the Pod definition, but the
> rest stayed the same. When attributes are arbitrarily nested like in the
> k8s.V1Pod, some aspects can be mistakenly overwritten. For example,
> specifying the volumeMounts of a container requires also specifying the
> rest of the V1Pod. How do we handle this? Do we just overwrite
> volumeMounts? The client must have specified a container, so do we
> overwrite that too? The fact that there is a choice means the API could be
> unintuitive to the user.
>
> What I suggest might be easier is that executor_config completely replaces
> the pod_template_file if it is specified, but then also offer something
> like pod_mutation_hook at the operator level. This makes it quite clear to
> the user: you are either replacing the pod_template, or you are modifying
> the pod_template with pod_mutation_hook, and you are doing it by your own
> rules. The argument to executor_config can be something like
> ```
> executor_config: {'KubernetesExecutor': {'pod_mutation': some_func,
> 'pod_template': k8s.V1Pod(...)}}  # pod_template could also be a path to a
> file
> ```
>
> If both are specified then pod_mutation is a function applied to
> pod_template. Thoughts?
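The replace-or-mutate semantics proposed above can be sketched as follows. Plain dicts stand in for k8s.V1Pod objects, and none of the function or key names here are an existing Airflow API.

```python
# Sketch of the proposed semantics: a pod_template in executor_config
# completely replaces the base template; a pod_mutation function, if
# also given, is then applied to the result.

def resolve_pod(default_template, executor_config):
    cfg = (executor_config or {}).get("KubernetesExecutor", {})
    # A user-supplied pod_template completely replaces the default.
    pod = cfg.get("pod_template", default_template)
    # If a pod_mutation function is also given, apply it afterwards.
    mutate = cfg.get("pod_mutation")
    return mutate(pod) if mutate else pod


def add_git_sync_sidecar(pod):
    # Example mutation: append a sidecar container.
    pod["spec"]["containers"].append({"name": "git-sync"})
    return pod


default = {"spec": {"containers": [{"name": "base"}]}}
pod = resolve_pod(
    default, {"KubernetesExecutor": {"pod_mutation": add_git_sync_sidecar}}
)
```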
>
> On Wed, Aug 12, 2020 at 11:24 AM Tomasz Urbaszek <tu...@apache.org>
> wrote:
>
> > +1 for the idea, thanks Daniel! I agree that multi-executor and pod
> > templates should be 2.1.
> >
> > T.
> >
> > On Wed, Aug 12, 2020 at 5:13 PM Daniel Imberman <
> daniel.imberman@gmail.com>
> > wrote:
> >
> >> (Also funny enough we could use a lot of the existing infrastructure int
> >> 1.10 to create that migration script. Would just need to take the python
> >> dictionary and use the python yaml library)
> >>
> >> via Newton Mail [
> >>
> https://cloudmagic.com/k/d/mailapp?ct=dx&cv=10.0.50&pv=10.15.5&source=email_footer_2
> >> ]
> >> On Wed, Aug 12, 2020 at 8:09 AM, Daniel Imberman <
> >> daniel.imberman@gmail.com> wrote:
> >> 100% agreed on timing. I think 2.0 should be for the breaking aspect
> >> (losing the configurations) and then 2.1/2.2 we can start adding on the
> new
> >> features. I also like the idea of a migration tool. We can make a script
> >> that takes your airflow.cfg and converts that into a yaml file.
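A toy version of that migration script might look like the following. The [kubernetes] option names shown are only a small illustrative subset, and the hand-rolled YAML emission is a placeholder; a real tool would use PyYAML and cover far more options.

```python
# Toy airflow.cfg -> pod YAML migration: read the legacy [kubernetes]
# section with configparser and emit an equivalent base-pod YAML string.
import configparser

cfg_text = """
[kubernetes]
worker_container_repository = apache/airflow
worker_container_tag = 1.10.12
namespace = airflow
"""

parser = configparser.ConfigParser()
parser.read_string(cfg_text)
k8s_cfg = parser["kubernetes"]

# Emit a minimal pod template; a real tool would serialize with PyYAML.
pod_yaml = "\n".join([
    "apiVersion: v1",
    "kind: Pod",
    "metadata:",
    f"  namespace: {k8s_cfg['namespace']}",
    "spec:",
    "  containers:",
    "    - name: base",
    f"      image: {k8s_cfg['worker_container_repository']}:{k8s_cfg['worker_container_tag']}",
])
```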
> >>
> >> via Newton Mail [
> >>
> https://cloudmagic.com/k/d/mailapp?ct=dx&cv=10.0.50&pv=10.15.5&source=email_footer_2
> >> ]
> >> On Wed, Aug 12, 2020 at 7:48 AM, Jarek Potiuk <Jarek.Potiuk@polidea.com
> >
> >> wrote:
> >> Big +1. All the arguments are very appealing to me and simplifying the
> >> Kubernetes Executor down to a YAML-configurable one seems like a
> no-brainer
> >> especially if we provide some migration tools. I've lost countless hours
> >> on
> >> debugging some configuration problems, simply because the relevant
> >> Kubernetes-related configuration was in the least expected place - i.e.
> >> airflow.cfg rather than a YAML configuration.
> >>
> >> I am also a big fan of both 1. and 2. I've implemented a POC of
> >> a queue-based multi-scheduler once, but having it embedded as part of core
> >> Airflow rather than basing it on queues (which are basically a Celery
> >> Executor concept) is, I think, a much better approach. Both 1. and 2. are
> >> cool.
> >>
> >> Now - question about timing. If we decide to go that route - my view is
> >> that simplifying Kubernetes should be an Airflow 2.0 task - alongside
> more
> >> comprehensive tests (which will be much easier to write in this case).
> The
> >> new features/ideas 1. 2. for KE I think should come after that - when we
> >> release and stabilize 2.0. Sounds like great candidates for 2.1 to me.
> >>
> >> J.
> >>
> >>
> >> On Wed, Aug 12, 2020 at 4:24 PM Daniel Imberman <
> >> daniel.imberman@gmail.com>
> >> wrote:
> >>
> >> > Hello, fellow Airflowers! I hope you are all well in these trying
> >> > times.
> >> >
> >> >
> >> > With the recent launch of Airflow 2.0 preparation, it now seems like a
> >> > good time to review the project's state and decide where we can fit in
> >> > some breaking changes that will improve the project for the future.
> >> >
> >> >
> >> > When we first created the KubernetesExecutor, we had two goals in
> >> > mind. The first goal was to improve the Airflow autoscaling story.
> >> > Previously, Airflow users had to manually provision Celery workers,
> >> > which could lead to wasted resources or missed SLAs. The other goal
> >> > was to introduce the Kubernetes system to a community that was not yet
> >> > well versed in the Kubernetes API.
> >> >
> >> >
> >> > To ease the community's transition, we abstracted away many of the
> >> > complexities of creating a Kubernetes object. We chose to offer a
> >> > limited number of configurations and keep much of the pod creation
> >> > process internal to Airflow. In the short term, this system lowered
> >> > the barrier to entry. Over time, however, this abstraction has become
> >> > a nightmare of tech debt, as the Kubernetes API is expansive and
> >> > constantly changing.
> >> >
> >> >
> >> > With this in mind, I think it's time for us to consider a more
> >> > straightforward approach that takes the complexity out of Airflow and
> >> > offers the full Kubernetes API to the Airflow user.
> >> >
> >> >
> >> > What I'm proposing here is pretty straightforward. We remove all
> >> > Kubernetes pod creation configurations from airflow.cfg and instead
> >> > offer only one way to configure the KubernetesExecutor: with a YAML
> >> > file.
> >> >
> >> >
> >> > We can easily supply all of the configurations to the
> >> > KubernetesExecutor by offering example YAMLs (git sync mode is just a
> >> > sidecar and an init container, DAG volumes are just an example volume
> >> > and volume mount, etc.).
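To make the parenthetical concrete, here is a minimal sketch of what such a base pod might contain, written as a plain Python dict mirroring a Pod manifest. The image names, tags, and paths are placeholders for illustration, not Airflow defaults:

```python
# Hypothetical pod_template_file contents as a plain dict. Field names
# follow the Kubernetes Pod spec; images and paths are placeholders.
base_pod = {
    "apiVersion": "v1",
    "kind": "Pod",
    "metadata": {"name": "airflow-worker"},
    "spec": {
        # "git sync mode" is just an init container that clones DAGs
        # into a shared volume before the worker starts.
        "initContainers": [
            {
                "name": "git-sync",
                "image": "example.com/git-sync:latest",
                "volumeMounts": [{"name": "dags", "mountPath": "/git"}],
            }
        ],
        "containers": [
            {
                "name": "base",
                "image": "apache/airflow:2.0",
                "volumeMounts": [
                    {"name": "dags", "mountPath": "/opt/airflow/dags"}
                ],
            }
        ],
        # ...and a "DAG volume" is just an ordinary volume definition.
        "volumes": [{"name": "dags", "emptyDir": {}}],
    },
}
```

Everything the old airflow.cfg options expressed becomes an ordinary, inspectable section of this one document.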
> >> >
> >> >
> >> > This system would make it much easier for users to predict what a pod
> >> > will look like once it is launched by Airflow. They will know it's the
> >> > base pod and will be able to simply modify the pod object using the
> >> > executor config and the pod mutation hook.
> >> >
> >> >
> >> >
> >> >
> >> >
> >> >
> >> >
> >> > This simplification could also lead to some pretty great new features
> >> > in the KubernetesExecutor.
> >> >
> >> >
> >> > Idea 1: Picking a pod_template_file per-task
> >> > Along with the existing executor config customization, relying solely
> >> > on pod files would allow users to pick the pod template file they want
> >> > to use as their base pod on a per-task basis. An Airflow engineer
> >> > could supply several pre-made templates for their data scientists,
> >> > reducing the amount of customization an Airflow user would need to do.
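A rough sketch of how per-task selection could work. The `pod_template_file` key and the template paths here are hypothetical illustrations, not an existing Airflow API:

```python
# Hypothetical per-task template resolution. An Airflow engineer maintains
# a small library of templates; each task may name one in executor_config.
DEFAULT_TEMPLATE = "/templates/base_pod.yaml"

def resolve_template(executor_config) -> str:
    """Pick the base pod template for a task, falling back to the default."""
    ke_config = (executor_config or {}).get("KubernetesExecutor", {})
    return ke_config.get("pod_template_file", DEFAULT_TEMPLATE)

# A data scientist just points at a pre-made template:
gpu_task_config = {
    "KubernetesExecutor": {"pod_template_file": "/templates/gpu_pod.yaml"}
}
resolve_template(gpu_task_config)  # picks the GPU template
resolve_template(None)             # falls back to DEFAULT_TEMPLATE
```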
> >> >
> >> >
> >> > Idea 2: Merging the KubernetesExecutor into the CeleryExecutor
> >> > One idea that we've been excited about recently is creating a merged
> >> > Celery and Kubernetes executor. This hybrid executor would default to
> >> > launching Celery workers with KEDA and would have the option to launch
> >> > individual tasks with the Kubernetes executor when a user wants
> >> > isolation or customization. Simplifying the Kubernetes executor
> >> > reduces the number of failure points that this merged executor would
> >> > need to account for.
> >> >
> >> >
> >> >
> >> > What would we need to do to implement this?
> >> > The good news here is that the hard work has already been done! As of
> >> > AIRFLOW-5413 [https://issues.apache.org/jira/browse/AIRFLOW-5413] by
> >> > David Lum, Airflow already has the ability to build base worker pods
> >> > from a template file. This would involve primarily code deletion and
> >> > very little new code.
> >> >
> >> >
> >> > Thank you for your time, and I look forward to the community's
> >> > discussion.
> >> >
> >> > Daniel
> >>
> >>
> >>
> >> --
> >>
> >> Jarek Potiuk
> >> Polidea <https://www.polidea.com/> | Principal Software Engineer
> >>
> >> M: +48 660 796 129 <+48660796129>
> >> [image: Polidea] <https://www.polidea.com/>
> >
> >



-- 

Jarek Potiuk
Polidea <https://www.polidea.com/> | Principal Software Engineer

M: +48 660 796 129 <+48660796129>
[image: Polidea] <https://www.polidea.com/>

Re: Simplifying the KubernetesExecutor

Posted by Daniel Imberman <da...@gmail.com>.
I think pod_mutation is important as an “admin override”. E.g. there might be a label that is required by security policy for all tasks launched by Airflow. I do agree that the executor_config should take in a V1Pod object.

One option is that the executor_config could take in a pod_mutation_hook function as well, so users can modify the pod functionally rather than just supplying an object and hoping it merges the way they expect it to.
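A minimal sketch of that option, with plain dicts standing in for k8s.V1Pod objects and all key names hypothetical rather than an existing Airflow API:

```python
# Sketch: executor_config may supply a base pod and/or a mutation function.
# Plain dicts stand in for k8s.V1Pod; key names are hypothetical.
def build_pod(executor_config: dict, base_pod: dict) -> dict:
    ke = executor_config.get("KubernetesExecutor", {})
    pod = ke.get("pod_template", base_pod)  # replace the template, or keep it
    mutate = ke.get("pod_mutation")
    if mutate is not None:
        pod = mutate(pod)  # user modifies the pod functionally - no merging
    return pod

def add_security_label(pod: dict) -> dict:
    # The "admin override" case: e.g. a label required by security policy.
    pod.setdefault("metadata", {}).setdefault("labels", {})["team"] = "data-eng"
    return pod

base = {"metadata": {"name": "base-pod"}}
pod = build_pod({"KubernetesExecutor": {"pod_mutation": add_security_label}}, base)
# pod now carries the required label, with no merge guesswork
```

The appeal is that the mutation function's behavior is exactly what the user wrote, rather than whatever merge semantics the executor happens to implement.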

Re: Simplifying the KubernetesExecutor

Posted by David Lum <da...@gmail.com>.
I'll just leave some passing thoughts here. How the pod template file is
modified by the executor config might not be very well defined. Currently,
when a user passes in a k8s.V1Pod object through executor_config, we
somewhat arbitrarily merge the two pod definitions into what we imagine the
client would want, but the result can often be misinterpreted. Previously,
the Airflow-defined Pod class was fairly flat. This made merging the two
objects fairly straightforward: if the user wrote a field in
executor_config, it overwrote that field in the Pod definition, but the
rest stayed the same. When attributes are arbitrarily nested, as in
k8s.V1Pod, some aspects can be mistakenly overwritten. For example,
specifying the volumeMounts of a container requires also specifying the
rest of the V1Pod. How do we handle this? Do we just overwrite
volumeMounts? The client must have specified a container, so do we
overwrite that too? The fact that there is a choice means the API could be
unintuitive to the user.
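The problem can be shown concretely with plain dicts standing in for V1Pod objects: a naive "override wins" merge clobbers list-valued fields like containers, dropping everything else the base container defined.

```python
# Illustration of the merge ambiguity, with dicts in place of k8s.V1Pod.
base_pod = {
    "spec": {
        "containers": [
            {"name": "base", "image": "apache/airflow:2.0",
             "volumeMounts": [{"name": "dags", "mountPath": "/opt/airflow/dags"}]}
        ]
    }
}

# The user only wanted to add a mount, but expressing it requires wrapping
# it in a container, which must be wrapped in a pod spec:
override = {
    "spec": {
        "containers": [
            {"volumeMounts": [{"name": "secrets", "mountPath": "/etc/secrets"}]}
        ]
    }
}

def naive_merge(base: dict, override: dict) -> dict:
    """Recursive dict merge where any non-dict value in override wins."""
    merged = dict(base)
    for key, value in override.items():
        if isinstance(value, dict) and isinstance(merged.get(key), dict):
            merged[key] = naive_merge(merged[key], value)
        else:
            merged[key] = value  # lists (like "containers") are clobbered
    return merged

merged = naive_merge(base_pod, override)
# The base container's name, image, and dags mount are silently gone -
# exactly the surprise described above.
```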

What I suggest might be easier is that executor_config completely replaces
the pod_template_file if it is specified, but then also offer something
like pod_mutation_hook at the operator level. This makes it quite clear to
the user: you are either replacing the pod_template, or you are modifying
the pod_template with pod_mutation_hook, and you are doing it by your own
rules. The argument to executor_config could be something like:
```
executor_config = {
    'KubernetesExecutor': {
        'pod_mutation': some_func,
        'pod_template': k8s.V1Pod(...),  # could also be a path to a file
    }
}
```

If both are specified, then pod_mutation is a function applied to
pod_template. Thoughts?


Re: Simplifying the KubernetesExecutor

Posted by Tomasz Urbaszek <tu...@apache.org>.
+1 for the idea, thanks Daniel! I agree that multi-executor and pod
templates should be 2.1.

T.


Re: Simplifying the KubernetesExecutor

Posted by Daniel Imberman <da...@gmail.com>.
(Also, funny enough, we could use a lot of the existing infrastructure in 1.10 to create that migration script. We would just need to take the Python dictionary and use the Python yaml library.)
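A sketch of that migration script under stated assumptions: the [kubernetes] option names below are examples of 1.10-era settings rather than an exact or exhaustive list, and json.dumps serves as the emitter here since JSON is a valid subset of YAML.

```python
# Sketch: read old [kubernetes] options out of airflow.cfg and emit a pod
# template. Option names are illustrative 1.10-era settings.
import configparser
import json

OLD_CFG = """
[kubernetes]
worker_container_repository = apache/airflow
worker_container_tag = 1.10.12
namespace = airflow
"""

def cfg_to_pod_template(cfg_text: str) -> dict:
    cfg = configparser.ConfigParser()
    cfg.read_string(cfg_text)
    k8s = cfg["kubernetes"]
    image = f"{k8s['worker_container_repository']}:{k8s['worker_container_tag']}"
    return {
        "apiVersion": "v1",
        "kind": "Pod",
        "metadata": {"namespace": k8s["namespace"]},
        "spec": {"containers": [{"name": "base", "image": image}]},
    }

# Emit the template; JSON output is itself loadable as YAML.
print(json.dumps(cfg_to_pod_template(OLD_CFG), indent=2))
```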


Re: Simplifying the KubernetesExecutor

Posted by Daniel Imberman <da...@gmail.com>.
100% agreed on timing. I think 2.0 should be for the breaking aspect (losing the configurations), and then in 2.1/2.2 we can start adding the new features. I also like the idea of a migration tool. We can make a script that takes your airflow.cfg and converts it into a YAML file.




--

Jarek Potiuk
Polidea <https://www.polidea.com/> | Principal Software Engineer

M: +48 660 796 129 <+48660796129>
[image: Polidea] <https://www.polidea.com/>

Re: Simplifying the KubernetesExecutor

Posted by Jarek Potiuk <Ja...@polidea.com>.
Big +1. All the arguments are very appealing to me, and simplifying the
Kubernetes Executor down to a YAML-configurable one seems like a no-brainer,
especially if we provide some migration tools. I've lost countless hours
debugging configuration problems simply because the relevant
Kubernetes-related configuration was in the least expected place - i.e.
airflow.cfg rather than a YAML configuration.

I am also a big fan of both 1. and 2. I've implemented a POC of a
queue-based multi-scheduler once, but having this embedded as part of core
Airflow, rather than basing it on queues (which are basically a Celery
Executor concept), is I think a much better approach. Both 1. and 2. are cool.

Now, a question about timing. If we decide to go that route, my view is
that simplifying the Kubernetes Executor should be an Airflow 2.0 task - alongside
more comprehensive tests (which will be much easier to write in this case). The
new features/ideas 1. and 2. for KE should, I think, come after that - when we
release and stabilize 2.0. They sound like great candidates for 2.1 to me.

J.




