Posted to dev@airflow.apache.org by "Ferruzzi, Dennis" <fe...@amazon.com.INVALID> on 2023/01/04 00:28:01 UTC

Re: [PROPOSAL] Switching our CI runners to K8S controller open-sourced for Apache Arrow

Sounds like a new ASF project: K8S-CI-AAS? :P   Joking aside, I'm all for consolidating effort if they have a solution that works for us and are willing to share the fruits of their labour.


________________________________
From: Jarek Potiuk <ja...@potiuk.com>
Sent: Sunday, December 18, 2022 8:03 AM
To: dev@airflow.apache.org
Subject: [EXTERNAL] [PROPOSAL] Switching our CI runners to K8S controller open-sourced for Apache Arrow



Hello everyone,

TL;DR: I wanted to make a proposal to move our CI runners from our own "custom" implementation developed mostly by Ash and based on VMs to a newly released Auto-scaling K8S controller that was developed for Apache Arrow by Voltron Data.

I was in contact with Jacob Wujciak, who led the effort in Arrow, and we also discussed it at the latest ASF build meeting (BTW, Jacob was just approved as an Arrow committer). I think they have a solid and proven solution: it is very well documented and works together with the ASF GitHub application that was implemented to distribute the ephemeral tokens needed to run the runners. We would likely keep using Ash's runner for security, but this can be easily done in the solution from Voltron Data.

Why would we want to do it?

We have wanted to switch away from our implementation for quite some time, as what we have is somewhat brittle and rather complex - it includes multiple AWS-specific technologies (and it is our own code that we have to maintain in https://github.com/apache/airflow-ci-infra). In fact, our use of AWS-specific technologies was one of the reasons we could not easily use Google Cloud Platform credits for CI, even when they were offered to us in the past.

I am afraid only Ash knows most of the ins and outs of the scaling code (though both Kaxil and I were able to fix some things, and I added a lot to the Packer-based installation).

While the current solution is very stable, we sometimes get "job not started" problems and sometimes we have to manually "push" auto-scaling to work. A K8S-based auto-scaling controller is as good as it gets, and we have a good relationship with the Arrow team and Jacob, so we can expect decent help and cooperation - they will also implement it in a setup very similar to ours (with ASF tokens), so our use case will be handled well. Also, choosing a K8S controller makes it easy to move between clouds, or even to run on multiple clouds.

The discussion on the Arrow devlist about it:

https://lists.apache.org/thread/mskpqwpdq65t1wpj4f5klfq9217ljodw

If this seems like a good idea, I will likely work on it around the end of the year. If anyone would like to help with it, I will be more than happy for others to join me - volunteers are most welcome - so that we will have more hands and eyes knowledgeable about the setup.

J.