Posted to dev@airflow.apache.org by Javier Llorente Mañas <ja...@gmail.com> on 2020/03/26 21:22:18 UTC

Airflow Kubernetes pod Operator debug library

Hi all! It's Javier.

I am a data engineer, and I have been working with Airflow and its
Kubernetes integration for almost a year and a half. It's great, and it has
helped me and the data engineering team I work with a lot.

However, my colleagues and I kept running into a recurring issue when
creating new DAGs with the KubernetesPodOperator: sometimes the pod that was
created was not what we expected. For example, the container arguments were
not parsed as we expected, or Secrets or Kubernetes resources were not
referenced the way we wanted.

I just created this library
https://github.com/Javier162380/AirflowKuberentesDebugger.

The idea is a simple interface that generates k8s pod YAML files before
DAGs are deployed to a production environment, so we can check whether a DAG
will generate all the k8s pods as we want or whether something is wrong. The
goal is to have a kind of helm debugger for Airflow. It can also be really
useful for recovering historical data for recurring DAGs, just by changing
the pod resources and the container arguments or entry points.
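
As a rough illustration of the idea (this is not the library's actual code),
the sketch below builds a pod manifest from the kind of parameters a
KubernetesPodOperator task takes and prints it for review. The helper name
render_pod_manifest, its parameters, and the sample values are hypothetical.
It emits JSON, which is also valid YAML, to stay within the standard library.

```python
import json


def render_pod_manifest(name, image, cmds=None, arguments=None,
                        namespace="default", resources=None):
    """Build a Kubernetes Pod manifest as a plain dict (hypothetical helper)."""
    container = {"name": "base", "image": image}
    if cmds:
        container["command"] = list(cmds)
    if arguments:
        container["args"] = list(arguments)
    if resources:
        container["resources"] = resources
    return {
        "apiVersion": "v1",
        "kind": "Pod",
        "metadata": {"name": name, "namespace": namespace},
        "spec": {"containers": [container], "restartPolicy": "Never"},
    }


# Sample values are made up for illustration.
manifest = render_pod_manifest(
    name="load-events",
    image="python:3.8",
    cmds=["python", "-m", "etl"],
    arguments=["--date", "2020-03-26"],
    resources={"limits": {"memory": "512Mi"}},
)
# Print the manifest so the command, args and resources can be reviewed
# before anything is deployed.
print(json.dumps(manifest, indent=2))
```

Reviewing the printed manifest is where a wrongly split argument list or a
missing resource limit would show up, without touching a cluster.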

All your feedback is appreciated.

Cheers,

Javier

Re: Airflow Kubernetes pod Operator debug library

Posted by Jarek Potiuk <Ja...@polidea.com>.
Yeah. This seems like a really good feature (I saw that you added it
to the Awesome Apache Airflow list
<https://github.com/jghoman/awesome-apache-airflow> :).
I will also be happy to help with getting it into the CLI.

It's a rather simple task (especially after some of the refactorings we've
done in the CLI area). Happy to help with guiding you as well. It will
mainly be about adding your code there and converting the parameter parsing
to the one used by Airflow :).

-- 

Jarek Potiuk
Polidea <https://www.polidea.com/> | Principal Software Engineer


Re: Airflow Kubernetes pod Operator debug library

Posted by Javier Llorente Mañas <ja...@gmail.com>.
Hi,

Thank you for the detailed explanation!

I will read everything and create a ticket describing my expectations.

Cheers,

Javier

Re: Airflow Kubernetes pod Operator debug library

Posted by Kamil Breguła <ka...@polidea.com>.
Hello,

Can you create a ticket on GitHub and describe your expectations? This
will make it easier to discuss this topic.
https://github.com/apache/airflow/issues/new/choose

First of all, you should read the contribution guide, which contains a
very detailed description of how to propose changes to the project.
https://github.com/apache/airflow/blob/master/CONTRIBUTING.rst

Next, you should look at the airflow.cli package, which
contains all the code related to the CLI.
https://github.com/apache/airflow/tree/master/airflow/cli
You will probably need to create a new root-level command and
subcommand, e.g. `airflow kubernetes preview` or something similar.
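
To make the shape of such a command concrete, here is a minimal sketch using
plain argparse. It does not use the airflow.cli package or its conventions;
the dag_id argument and the --output-path flag are hypothetical and only show
how a root-level command with a subcommand nests.

```python
import argparse


def build_parser():
    """Return a parser with a hypothetical `kubernetes preview` command."""
    parser = argparse.ArgumentParser(prog="airflow")
    commands = parser.add_subparsers(dest="command")

    # Root-level command: `airflow kubernetes ...`
    kubernetes = commands.add_parser("kubernetes", help="Kubernetes tools")
    actions = kubernetes.add_subparsers(dest="action")

    # Subcommand: `airflow kubernetes preview <dag_id>`
    preview = actions.add_parser("preview", help="Render pod templates")
    preview.add_argument("dag_id", help="DAG whose pods should be rendered")
    preview.add_argument("--output-path", default=".",
                         help="Directory for the generated files")
    return parser


args = build_parser().parse_args(
    ["kubernetes", "preview", "example_dag", "--output-path", "/tmp/pods"]
)
print(args.command, args.action, args.dag_id, args.output_path)
```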

Best regards,
Kamil

Re: Airflow Kubernetes pod Operator debug library

Posted by Javier Llorente Mañas <ja...@gmail.com>.
Hi,

Yes, I missed this feature in the Airflow CLI. For me, the ideal scenario
is to have something similar to the helm install --debug --dry-run command (
https://helm.sh/docs/chart_template_guide/debugging/). Just as a helm chart
can generate multiple k8s templates, an Airflow DAG could generate multiple
k8s pod templates if the same DAG uses multiple KubernetesPodOperator tasks,
all of them following some naming convention, maybe the name of the DAG +
the name of the task + _debug. It could also be interesting to generate the
templates for all the DAGs present in a path.
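
A minimal sketch of that naming convention, assuming the DAG and task ids are
already known. The function names, the .yaml extension, and the sample ids are
hypothetical; the sketch only plans file names and does not render anything.

```python
from pathlib import PurePosixPath


def debug_template_name(dag_id, task_id):
    """File name following the dag + task + _debug convention."""
    return f"{dag_id}_{task_id}_debug.yaml"


def plan_templates(dags, output_path="."):
    """Map each (dag_id, task_id) pair to the file its template would go to."""
    return {
        (dag_id, task_id): str(
            PurePosixPath(output_path) / debug_template_name(dag_id, task_id)
        )
        for dag_id, task_ids in dags.items()
        for task_id in task_ids
    }


# One DAG with two KubernetesPodOperator tasks, another DAG with one.
plan = plan_templates(
    {"daily_etl": ["extract", "load"], "weekly_report": ["render"]},
    output_path="out",
)
for key, path in plan.items():
    print(key, "->", path)
```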

I was thinking of adding this "library" to the Airflow CLI directly, but I
don't know the code internals that well. I would like to contribute to the
project, but some help would be appreciated.

On the other hand, I found an issue. To replicate the templates 100%, you
need to set the same variables in your local environment as in your
production cluster; at least, this happened in my case when testing with my
company's DAGs. So whenever you run this command you may need to run some
extra airflow commands to set variables. I created this small CLI to run it
locally with these arguments:
https://github.com/Javier162380/AirflowKuberentesDebugger/blob/master/airflow_k8s_operator/cli.py
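
One way to spot this mismatch before rendering is to compare the two sets of
variables. The sketch below assumes both sets are available as plain dicts
(for example, loaded from exported JSON files); the function name and sample
values are hypothetical, and it does not call any Airflow API.

```python
def missing_variables(production, local):
    """Return keys that are absent locally or differ from production, sorted."""
    return sorted(
        key for key, value in production.items()
        if key not in local or local[key] != value
    )


# Sample values are made up for illustration.
production_vars = {"env": "prod", "bucket": "events", "image_tag": "1.4"}
local_vars = {"bucket": "events", "image_tag": "1.3"}

# Variables listed here would have to be set locally before the rendered
# templates can match what production generates.
print(missing_variables(production_vars, local_vars))
```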

What do you think?

Cheers

Javier




Re: Airflow Kubernetes pod Operator debug library

Posted by Kamil Breguła <ka...@polidea.com>.
Hello,

The idea is fantastic. I like it very much, and it will make working with
Kubernetes easier. I'm just afraid that this tool will not be
available when I need it. Have you thought about adding this tool to the
Airflow CLI? That is the best place to share useful tools.

When I missed a DAG preview in the CLI, I added it to the CLI.
https://airflow.readthedocs.io/en/latest/usage-cli.html#exporting-dags-structure-to-images

Now I'm working on previewing the status of tasks after the dag execution.
https://github.com/apache/airflow/pull/7776

Best regards,
Kamil
