Posted to dev@airflow.apache.org by Gerardo Curiel <ge...@gerar.do> on 2019/01/02 04:10:45 UTC

AIP-7 Simplified development workflow

Hi folks,

I've created an AIP for simplifying Airflow's development workflow:
https://cwiki.apache.org/confluence/display/AIRFLOW/AIP-7+Simplified+development+workflow

The goal of this proposal is to outline the work needed to make local
testing significantly easier and standardise the best practices to
contribute to the Airflow project.

Any input on it would be greatly appreciated.

Cheers,

--
Gerardo Curiel // https://gerar.do

Re: AIP-7 Simplified development workflow

Posted by Gerardo Curiel <ge...@gerar.do>.
Hi Fokko,

On 2 January 2019 at 6:46:00 pm, Driesprong, Fokko (fokko@driesprong.frl)
wrote:

Hi Gerardo,

Very valid points. I'm fully in favor of your proposal. To simplify the
stack, I strongly believe we should also strip out tox and fully rely on
Docker. Using tox will add another layer that doesn't add a lot of value
from my perspective. Also, we should bake all the *.sh bootstrap scripts
<https://github.com/apache/incubator-airflow/tree/master/scripts/ci> in the
Docker container, instead of having to set this up before running the
tests.


Sounds great! I'll add this to the proposal.

Cheers,

--
Gerardo Curiel // https://gerar.do

Re: AIP-7 Simplified development workflow

Posted by Gerardo Curiel <ge...@gerar.do>.
I'll start adding some comments and updates to the proposal and
sub-projects.

On 3 January 2019 at 8:57:41 am, Daniel Imberman (daniel.imberman@gmail.com)
wrote:

Hi guys, I've set up a few sub-projects for this. @gerardo @fokko Lemme
know what you guys think

https://cwiki.apache.org/confluence/display/AIRFLOW/Optimizing+Docker+Image+Workflow
https://cwiki.apache.org/confluence/display/AIRFLOW/Kubernetes+Testing%3A+Using+GKE+instead+of+Minikube

On Tue, Jan 1, 2019 at 11:45 PM Driesprong, Fokko <fo...@driesprong.frl>
wrote:

> Hi Gerardo,
>
> Very valid points. I'm fully in favor of your proposal. To simplify the
> stack, I strongly believe we should also strip out tox and fully rely on
> Docker. Using tox will add another layer that doesn't add a lot of value
> from my perspective. Also, we should bake all the *.sh bootstrap scripts
> <https://github.com/apache/incubator-airflow/tree/master/scripts/ci> in
> the Docker container, instead of having to set this up before running the
> tests.
>
> In the upcoming months, I might have a bit more time to spend on Airflow,
> I'm happy to assist you on this one.
>
> Cheers, Fokko
>
> On Wed, 2 Jan 2019 at 06:51, Daniel Imberman <daniel.imberman@gmail.com>
> wrote:
>
> > @gerardo thank you for setting this up.
> >
> > I've also been extremely interested in this as well. I've been messing
> > with GCP VM instances in the past few weeks to try to simplify my local
> > build as well. Would definitely be interested in helping with the AIP +
> > implementation.
> >
> > One thing I believe we should do is set up the ci base-image with all of
> > the pip dependencies pre-loaded. A lot of time is wasted pip installing
> > dependencies. We can auto-generate new images whenever a PR is submitted
> > to this repository and then specify the tag in the .travis.yml when
> > building.
> >
> > On the k8s side, I think we need to move away from minikube for k8s
> > testing. I discussed in a previous email setting travis to work with GKE.
> > I'd be careful about coupling k8s stuff too tightly with a docker
> > infrastructure. That can get pretty dicey. I think as long as we're
> > using a separate k8s cluster the k8s executor tests only need to gather
> > the IP addresses + have access to the kubeconfig.
> >
> >
> > On Tue, Jan 1, 2019 at 8:10 PM Gerardo Curiel <ge...@gerar.do> wrote:
> >
> > > Hi folks,
> > >
> > > I've created an AIP for simplifying Airflow's development workflow:
> > > https://cwiki.apache.org/confluence/display/AIRFLOW/AIP-7+Simplified+development+workflow
> > >
> > > The goal of this proposal is to outline the work needed to make local
> > > testing significantly easier and standardise the best practices to
> > > contribute to the Airflow project.
> > >
> > > Any input on it would be greatly appreciated.
> > >
> > > Cheers,
> > >
> > > --
> > > Gerardo Curiel // https://gerar.do
> > >
> >
>

--
Gerardo Curiel // https://gerar.do

Re: AIP-7 Simplified development workflow

Posted by Gerardo Curiel <ge...@gerar.do>.
Hi Jarek,

On 3 January 2019 at 9:14:29 pm, Jarek Potiuk (jarek.potiuk@polidea.com)
wrote:

You can find our environment here:
https://github.com/PolideaInternal/airflow-breeze - we call it '*Airflow
breeze*' like in *"it's a breeze to work with Airflow and GCP"*. It's
targeted to make our work easier for Google Cloud Platform operators
development but it has many things implemented that you are talking about:


Thanks for making this available. It should be interesting to see how you
guys tackled some of these problems. I'll have a look.

I would be super happy if we can contribute what we've done there.
Currently we have a very small commit that we cherry-pick in our
branches to be able to use Automated Cloud Build (namely cloudbuild.yaml
file - similar to .travis.yml) but if we can modify it and make it part of
the main Apache project - we would be more than happy to do it!


I'd prefer if we could add changes to what we have right now,
progressively. It makes it easier for committers to review and we can get
small gains pretty quickly. I'll add this to the proposal.

Cheers,

--
Gerardo Curiel // https://gerar.do

Re: AIP-7 Simplified development workflow

Posted by Jarek Potiuk <Ja...@polidea.com>.
One more comment: I also updated the AIP-4 proposal, Support for System
Tests for external systems
<https://cwiki.apache.org/confluence/display/AIRFLOW/AIP-4+Support+for+System+Tests+for+external+systems>,
that you referred to, to better reflect what we have done so far for GCP
operators.

Side comment: We have been using the automated System Tests (a name we
found better than Integration Tests) for quite some time now for our GCP
development and we found them super useful for detecting some obscure errors
(usually related to Python version incompatibilities) before they hit
someone else. Here are some example bugs we detected and mostly fixed
thanks to that:
AIRFLOW-3615 <https://issues.apache.org/jira/browse/AIRFLOW-3615>,
AIRFLOW-3527 <https://issues.apache.org/jira/browse/AIRFLOW-3527>,
AIRFLOW-3416 <https://issues.apache.org/jira/browse/AIRFLOW-3416>,
AIRFLOW-3263 <https://issues.apache.org/jira/browse/AIRFLOW-3263> (but
there were many more that never made it to JIRA because we detected them
before the code was merged)

J.

On Thu, Jan 3, 2019 at 11:14 AM Jarek Potiuk <Ja...@polidea.com>
wrote:

> Hello everyone,
>
> I am really, really happy to help with that as it has been the focus of my
> attention for the last couple of months in our team at Polidea.
> Maybe we can use what we have done for our own development environment for
> Airflow for Google Cloud Platform.
>
> We are ready to share what we have done and contribute to Apache in
> whatever form is appropriate. Either incorporating parts of what we've done
> or (possibly) using what we've done as starting point and adding what's
> missing from the current TravisCI setup. I think the latter will be far
> easier and faster - but it's just my opinion as I know it very well now :).
>
> Over the last few months at Polidea (my company) we developed (and
> contributed to Airflow's contrib) more than 30 Google Cloud Platform related
> operators and a number of bugfixes to the core Airflow. We worked as a team
> (3 people) and we created a pretty complete, sophisticated and very well
> documented development environment to be more productive and to work as a
> team. We are going to add 40 more operators and add new team members in the
> coming months so we had to be productive :).
>
> You can find our environment here:
> https://github.com/PolideaInternal/airflow-breeze - we call it '*Airflow
> breeze*' like in *"it's a breeze to work with Airflow and GCP"*. It's
> targeted to make our work easier for Google Cloud Platform operators
> development but it has many things implemented that you are talking about:
>
> *Supported features:*
>
>    - A simplified, nicely layered Dockerfile
>    <https://github.com/PolideaInternal/airflow-breeze/blob/master/Dockerfile>,
>    optimised for build speed (especially the cassandra driver), that
>    supports three python versions - 2.7, 3.5 (used in Google Composer) and
>    3.6.  Note that there are many problems with compatibility between 3.5
>    and 3.6 so we introduced all three versions.
>    - Google Cloud Build CI scripts are already part of the image (similar
>    to what was suggested for the Travis CI ones).
>    - We dropped *tox* support in favour of Google Cloud Build parallel
>    builds with separate docker containers.
>    - We have built-in support for unique naming of resources so that
>    multiple builds
>    - We have automation of the local environment (virtualenvs) for running
>    some unit and system tests locally - not only via docker container (which
>    makes it far easier for debugging) - for example using a local IDE
>    - Documentation on how to work with unit tests
>    <https://github.com/PolideaInternal/airflow-breeze/blob/master/README.unittests.md>
>    and system tests
>    <https://github.com/PolideaInternal/airflow-breeze/blob/master/README.unittests.md>
>    (see below for system tests description) - including a description of how
>    to integrate with IntelliJ/Pycharm and work efficiently with debugging -
>    including remote debugging of the environment (includes some screenshots).
>    - Support for automated Cloud Build and system tests
>    - A nice, documented ./run_environment.sh
>    <https://github.com/PolideaInternal/airflow-breeze#appendix-current-run_environment-flags>
>    script that supports image building/upload/download from registry,
>    choosing GCP project id and Service account keys, and support for
>    multiple workspaces
>    - Prerequisites, setting up and bootstrapping the local project from
>    scratch
>    <https://github.com/PolideaInternal/airflow-breeze/blob/master/README.setup.md> -
>    documentation + automation of checkout of the project and shared team
>    configuration - that includes documentation on how to configure your local
>    virtualenvs and manage the docker image and the whole environment
>    - The Dockerfile and ./run_environment.sh are built in such a way that
>    local sources are shared with the Docker container so you can edit your
>    sources while running the tests in the container. Super helpful for a fast
>    development cycle.
>    - A number of nice small development features - such as bash history
>    support in docker, automated setting of common configuration variables
>    shared between the team etc.
>
> *What's missing:*
>
>    - What is missing compared to the current Travis CI is docker compose
>    to support external dependencies (mysql etc.) - this does not play well
>    with Google Cloud Build with their docker-in-docker approach but if we run
>    in Travis CI it should be perfectly fine to run the airflow-breeze image
>    there through docker compose, or it might turn out easier to install mysql
>    within the image itself rather than use docker compose - it will make it
>    much easier to multiply docker instances and run them in parallel. In our
>    environment we start Postgres DB in docker and run all system tests using
>    local executor + Postgres and it's super easy to run tests on multiple
>    environments this way even running them on the same machine (this will be
>    more complex with docker compose)
>    - Also Breeze is closely tied with Google Cloud for Cloud Build - but
>    we can fairly easily make it an optional component. We also have not
>    focused on Kubernetes workers but as I understand we want to go to GKE -
>    which would make it even better as we will need Google Cloud Platform
>    integration baked in - and we already have it and we could use the same
>    mechanisms. We can also leverage our contacts with the Google team and
>    maybe we can ask Google to donate some recurring credits to make a shared
>    Google Cloud Platform project so that we can have a shared Airflow GCP
>    project to integrate everything there.
>
>
> *Some more information about Airflow Breeze's Cloud Build support and
> System Tests. *
>
> We have a design doc
> <https://docs.google.com/document/d/15hdqL4bWU0646nAvxsEjIEr0gHOhMu6OByDWI1oiE7w/edit?usp=drive_web&ouid=112320280470690058978>
> that describes the whole environment. A number of things there are GCP
> related - we have integration with Google Cloud services (Cloud Build,
> Functions, PubSub, Repositories) to run our automated System Tests. One
> interesting feature of Airflow Breeze is the ability to easily configure and
> run System Tests with Google Cloud Platform (
> https://github.com/PolideaInternal/airflow-breeze/blob/master/README.systemtests.md).
> We also have really nice Slack notifications
> <https://github.com/PolideaInternal/airflow-breeze/blob/master/images/slack_notification.png>
> after the build is complete + an automated summary
> <https://storage.googleapis.com/polidea-airflow-builds/6ed0e876-2fe3-41b4-90d0-4fa839901085/index.html>
> showing the result of automated system tests + automatically generated
> documentation
> <https://storage.googleapis.com/polidea-airflow-builds/6ed0e876-2fe3-41b4-90d0-4fa839901085/docs/index.html>
> + logs from system tests
> <https://console.cloud.google.com/storage/browser/polidea-airflow-builds/6ed0e876-2fe3-41b4-90d0-4fa839901085/logs?project=polidea-airflow>.
> We do not aim for it to replace Travis CI (which we also run) - it's
> complementary to Travis and it runs only relevant GCP unit tests and System
> Tests with the real GCP project of ours.
>
> I initially described our intentions in AIP-4
> <https://cwiki.apache.org/confluence/display/AIRFLOW/AIP-4+Support+for+Integration+Tests>
> (which is also mentioned in AIP-7) - but I will soon change the AIP-4
> description to match what we've actually developed for our own usage - which
> is GCP-specific and not aimed to replace the Travis CI testing.
>
> *Few more words about Google Cloud Platform integration of Airflow Breeze*
>
> Currently it is implemented in such a way that each team can have its own
> Google Project ID to work on (or even several projects because we support
> multiple workspaces) and we have a way to easily bootstrap the GCP project
> from scratch - that includes automated setup of all the required
> permissions, service accounts, service APIs, creating and filling test
> buckets, preparing Google Cloud Build triggers and so on - so literally in
> 20 minutes you can have a new GCP project up and running - ready to run your
> system tests.
>
> I would be super happy if we can contribute what we've done there.
> Currently we have a very small commit that we cherry-pick in our
> branches to be able to use Automated Cloud Build (namely cloudbuild.yaml
> file - similar to .travis.yml) but if we can modify it and make it part of
> the main Apache project - we would be more than happy to do it!
>
> Let me know what you think!
>
> J.
>
>
> On Wed, Jan 2, 2019 at 10:57 PM Daniel Imberman <da...@gmail.com>
> wrote:
>
>> Hi guys, I've set up a few sub-projects for this. @gerardo @fokko Lemme
>> know what you guys think
>>
>>
>> https://cwiki.apache.org/confluence/display/AIRFLOW/Optimizing+Docker+Image+Workflow
>>
>> https://cwiki.apache.org/confluence/display/AIRFLOW/Kubernetes+Testing%3A+Using+GKE+instead+of+Minikube
>>
>> On Tue, Jan 1, 2019 at 11:45 PM Driesprong, Fokko <fo...@driesprong.frl>
>> wrote:
>>
>> > Hi Gerardo,
>> >
>> > Very valid points. I'm fully in favor of your proposal. To simplify the
>> > stack, I strongly believe we should also strip out tox and fully rely on
>> > Docker. Using tox will add another layer that doesn't add a lot of value
>> > from my perspective. Also, we should bake all the *.sh bootstrap scripts
>> > <https://github.com/apache/incubator-airflow/tree/master/scripts/ci> in
>> > the Docker container, instead of having to set this up before running the
>> > tests.
>> >
>> > In the upcoming months, I might have a bit more time to spend on Airflow,
>> > I'm happy to assist you on this one.
>> >
>> > Cheers, Fokko
>> >
>> > On Wed, 2 Jan 2019 at 06:51, Daniel Imberman <daniel.imberman@gmail.com>
>> > wrote:
>> >
>> > > @gerardo thank you for setting this up.
>> > >
>> > > I've also been extremely interested in this as well. I've been messing
>> > > with GCP VM instances in the past few weeks to try to simplify my local
>> > > build as well. Would definitely be interested in helping with the AIP +
>> > > implementation.
>> > >
>> > > One thing I believe we should do is set up the ci base-image with all of
>> > > the pip dependencies pre-loaded. A lot of time is wasted pip installing
>> > > dependencies. We can auto-generate new images whenever a PR is submitted
>> > > to this repository and then specify the tag in the .travis.yml when
>> > > building.
>> > >
>> > > On the k8s side, I think we need to move away from minikube for k8s
>> > > testing. I discussed in a previous email setting travis to work with GKE.
>> > > I'd be careful about coupling k8s stuff too tightly with a docker
>> > > infrastructure. That can get pretty dicey. I think as long as we're
>> > > using a separate k8s cluster the k8s executor tests only need to gather
>> > > the IP addresses + have access to the kubeconfig.
>> > >
>> > >
>> > > On Tue, Jan 1, 2019 at 8:10 PM Gerardo Curiel <ge...@gerar.do> wrote:
>> > >
>> > > > Hi folks,
>> > > >
>> > > > I've created an AIP for simplifying Airflow's development workflow:
>> > > >
>> > > > https://cwiki.apache.org/confluence/display/AIRFLOW/AIP-7+Simplified+development+workflow
>> > > >
>> > > > The goal of this proposal is to outline the work needed to make local
>> > > > testing significantly easier and standardise the best practices to
>> > > > contribute to the Airflow project.
>> > > >
>> > > > Any input on it would be greatly appreciated.
>> > > >
>> > > > Cheers,
>> > > >
>> > > > --
>> > > > Gerardo Curiel // https://gerar.do
>> > > >
>> > >
>> >
>>
>
>
> --
>
> Jarek Potiuk
> Polidea <https://www.polidea.com/> | Principal Software Engineer
>
> M: +48 660 796 129 <+48660796129>
> E: jarek.potiuk@polidea.com
> [image: Polidea] <https://www.polidea.com/>
>
> We create human & business stories through technology.
> Check out our projects! <https://www.polidea.com/our-work>
> [image: Github] <https://github.com/Polidea> [image: Facebook]
> <https://www.facebook.com/Polidea.Software> [image: Twitter]
> <https://twitter.com/polidea> [image: Linkedin]
> <https://www.linkedin.com/company/polidea> [image: Instagram]
> <https://instagram.com/polidea> [image: Behance]
> <https://www.behance.net/polidea>
>


-- 

Jarek Potiuk
Polidea <https://www.polidea.com/> | Principal Software Engineer

M: +48 660 796 129 <+48660796129>
E: jarek.potiuk@polidea.com

Re: AIP-7 Simplified development workflow

Posted by Jarek Potiuk <Ja...@polidea.com>.
Hello everyone,

I am really, really happy to help with that as it has been the focus of my
attention for the last couple of months in our team at Polidea.
Maybe we can use what we have done for our own development environment for
Airflow for Google Cloud Platform.

We are ready to share what we have done and contribute to Apache in
whatever form is appropriate. Either incorporating parts of what we've done
or (possibly) using what we've done as starting point and adding what's
missing from the current TravisCI setup. I think the latter will be far
easier and faster - but it's just my opinion as I know it very well now :).

Over the last few months at Polidea (my company) we developed (and
contributed to Airflow's contrib) more than 30 Google Cloud Platform related
operators and a number of bugfixes to the core Airflow. We worked as a team
(3 people) and we created a pretty complete, sophisticated and very well
documented development environment to be more productive and to work as a
team. We are going to add 40 more operators and add new team members in the
coming months so we had to be productive :).

You can find our environment here:
https://github.com/PolideaInternal/airflow-breeze - we call it '*Airflow
breeze*' like in *"it's a breeze to work with Airflow and GCP"*. It's
targeted to make our work easier for Google Cloud Platform operators
development but it has many things implemented that you are talking about:

*Supported features:*

   - A simplified, nicely layered Dockerfile
   <https://github.com/PolideaInternal/airflow-breeze/blob/master/Dockerfile>,
   optimised for build speed (especially the cassandra driver), that
   supports three python versions - 2.7, 3.5 (used in Google Composer) and
   3.6.  Note that there are many problems with compatibility between 3.5
   and 3.6 so we introduced all three versions.
   - Google Cloud Build CI scripts are already part of the image (similar
   to what was suggested for the Travis CI ones).
   - We dropped *tox* support in favour of Google Cloud Build parallel
   builds with separate docker containers.
   - We have built-in support for unique naming of resources so that
   multiple builds
   - We have automation of the local environment (virtualenvs) for running
   some unit and system tests locally - not only via docker container (which
   makes it far easier for debugging) - for example using a local IDE
   - Documentation on how to work with unit tests
   <https://github.com/PolideaInternal/airflow-breeze/blob/master/README.unittests.md>
   and system tests
   <https://github.com/PolideaInternal/airflow-breeze/blob/master/README.unittests.md>
   (see below for system tests description) - including a description of how
   to integrate with IntelliJ/Pycharm and work efficiently with debugging -
   including remote debugging of the environment (includes some screenshots).
   - Support for automated Cloud Build and system tests
   - A nice, documented ./run_environment.sh
   <https://github.com/PolideaInternal/airflow-breeze#appendix-current-run_environment-flags>
   script that supports image building/upload/download from registry,
   choosing GCP project id and Service account keys, and support for
   multiple workspaces
   - Prerequisites, setting up and bootstrapping the local project from
   scratch
   <https://github.com/PolideaInternal/airflow-breeze/blob/master/README.setup.md> -
   documentation + automation of checkout of the project and shared team
   configuration - that includes documentation on how to configure your local
   virtualenvs and manage the docker image and the whole environment
   - The Dockerfile and ./run_environment.sh are built in such a way that
   local sources are shared with the Docker container so you can edit your
   sources while running the tests in the container. Super helpful for a fast
   development cycle.
   - A number of nice small development features - such as bash history
   support in docker, automated setting of common configuration variables
   shared between the team etc.
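
As an illustration of the source-sharing point in the list above, here is a
minimal sketch in Python. It is not taken from airflow-breeze: the image
naming scheme, the /workspace mount point and the test command are all
assumptions made for the example. It only builds the `docker run` argument
list (using the standard `-v HOST:CONTAINER` bind mount and `-w` working
directory flags) for each of the three Python versions mentioned, so that
edits to local sources would be visible inside the container.

```python
import os
import shlex

# Python versions named in the message; everything else below is hypothetical.
PYTHON_VERSIONS = ("2.7", "3.5", "3.6")
IMAGE_TEMPLATE = "airflow-breeze:python{version}"  # assumed tag scheme
CONTAINER_SOURCE_DIR = "/workspace"  # assumed mount point inside the container
DEFAULT_TEST_COMMAND = ("python", "-m", "unittest", "discover")  # placeholder


def docker_test_command(source_dir, python_version,
                        test_command=DEFAULT_TEST_COMMAND):
    """Return a `docker run` argv that bind-mounts local sources."""
    if python_version not in PYTHON_VERSIONS:
        raise ValueError("unsupported Python version: {}".format(python_version))
    host_path = os.path.abspath(source_dir)
    return [
        "docker", "run", "--rm",
        # Bind mount: edits made on the host show up in the container.
        "-v", "{}:{}".format(host_path, CONTAINER_SOURCE_DIR),
        "-w", CONTAINER_SOURCE_DIR,
        IMAGE_TEMPLATE.format(version=python_version),
    ] + list(test_command)


if __name__ == "__main__":
    # Print one command line per supported Python version.
    for version in PYTHON_VERSIONS:
        argv = docker_test_command(".", version)
        print(" ".join(shlex.quote(arg) for arg in argv))
```

Building the argument list separately from executing it keeps the sketch
runnable without Docker installed; the list could be passed to
`subprocess.run` to actually start a container.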

*What's missing:*

   - What is missing compared to the current Travis CI is docker compose
   to support external dependencies (mysql etc.) - this does not play well
   with Google Cloud Build with their docker-in-docker approach but if we run
   in Travis CI it should be perfectly fine to run the airflow-breeze image
   there through docker compose, or it might turn out easier to install mysql
   within the image itself rather than use docker compose - it will make it
   much easier to multiply docker instances and run them in parallel. In our
   environment we start Postgres DB in docker and run all system tests using
   local executor + Postgres and it's super easy to run tests on multiple
   environments this way even running them on the same machine (this will be
   more complex with docker compose)
   - Also Breeze is closely tied with Google Cloud for Cloud Build - but we
   can fairly easily make it an optional component. We also have not focused
   on Kubernetes workers but as I understand we want to go to GKE - which
   would make it even better as we will need Google Cloud Platform integration
   baked in - and we already have it and we could use the same mechanisms. We
   can also leverage our contacts with the Google team and maybe we can ask
   Google to donate some recurring credits to make a shared Google Cloud
   Platform project so that we can have a shared Airflow GCP project to
   integrate everything there.
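
The idea of running several Postgres-backed test environments side by side
on one machine could be sketched roughly as follows. This is an assumption
about how such a setup might look, not the airflow-breeze implementation:
the container name prefix, the base port, the password and the
`postgres:10` image tag are hypothetical. Each environment gets a unique
container name and its own host port, which is what lets the instances
coexist.

```python
import uuid


def postgres_container_command(build_id, host_port, image="postgres:10"):
    """Return (container_name, argv) for one isolated Postgres instance."""
    name = "airflow-test-pg-{}".format(build_id)  # hypothetical naming scheme
    argv = [
        "docker", "run", "-d", "--rm",
        "--name", name,                       # unique name per build
        "-p", "{}:5432".format(host_port),    # distinct host port per build
        "-e", "POSTGRES_PASSWORD=airflow",    # placeholder credential
        image,
    ]
    return name, argv


def plan_parallel_databases(count, base_port=15432):
    """Plan `count` Postgres containers that can run on the same machine."""
    plans = []
    for offset in range(count):
        build_id = uuid.uuid4().hex[:8]  # short random suffix avoids clashes
        plans.append(postgres_container_command(build_id, base_port + offset))
    return plans


if __name__ == "__main__":
    for name, argv in plan_parallel_databases(3):
        print(name, "->", " ".join(argv))
```

The same unique suffix could also be reused for other per-build resources,
which is one plausible reading of the "unique naming of resources" item in
the feature list.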


*Some more information about Airflow Breeze's Cloud Build support and
System Tests. *

We have a design doc
<https://docs.google.com/document/d/15hdqL4bWU0646nAvxsEjIEr0gHOhMu6OByDWI1oiE7w/edit?usp=drive_web&ouid=112320280470690058978>
that describes the whole environment. A number of things there are GCP
related - we have integration with Google Cloud services (Cloud Build,
Functions, PubSub, Repositories) to run our automated System Tests. One
interesting feature of Airflow Breeze is the ability to easily configure and
run System Tests with Google Cloud Platform (
https://github.com/PolideaInternal/airflow-breeze/blob/master/README.systemtests.md).
We also have really nice Slack notifications
<https://github.com/PolideaInternal/airflow-breeze/blob/master/images/slack_notification.png>
after the build is complete + an automated summary
<https://storage.googleapis.com/polidea-airflow-builds/6ed0e876-2fe3-41b4-90d0-4fa839901085/index.html>
showing the result of automated system tests + automatically generated
documentation
<https://storage.googleapis.com/polidea-airflow-builds/6ed0e876-2fe3-41b4-90d0-4fa839901085/docs/index.html>
+ logs from system tests
<https://console.cloud.google.com/storage/browser/polidea-airflow-builds/6ed0e876-2fe3-41b4-90d0-4fa839901085/logs?project=polidea-airflow>.
We do not aim for it to replace Travis CI (which we also run) - it's
complementary to Travis and it runs only relevant GCP unit tests and System
Tests with the real GCP project of ours.

I initially described our intentions in AIP-4
<https://cwiki.apache.org/confluence/display/AIRFLOW/AIP-4+Support+for+Integration+Tests>
(which is also mentioned in AIP-7) - but I will soon change the AIP-4
description to match what we've actually developed for our own usage - which
is GCP-specific and not aimed to replace the Travis CI testing.

*Few more words about Google Cloud Platform integration of Airflow Breeze*

Currently it is implemented in such a way that each team can have its own
Google Project ID to work on (or even several projects because we support
multiple workspaces) and we have a way to easily bootstrap the GCP project
from scratch - that includes automated setup of all the required
permissions, service accounts, service APIs, creating and filling test
buckets, preparing Google Cloud Build triggers and so on - so literally in
20 minutes you can have a new GCP project up and running - ready to run your
system tests.

I would be super happy if we can contribute what we've done there.
Currently we have a very small commit that we cherry-pick in our
branches to be able to use Automated Cloud Build (namely cloudbuild.yaml
file - similar to .travis.yml) but if we can modify it and make it part of
the main Apache project - we would be more than happy to do it!

Let me know what you think!

J.


On Wed, Jan 2, 2019 at 10:57 PM Daniel Imberman <da...@gmail.com>
wrote:

> Hi guys, I've set up a few sub-projects for this. @gerardo @fokko Lemme
> know what you guys think
>
>
> https://cwiki.apache.org/confluence/display/AIRFLOW/Optimizing+Docker+Image+Workflow
>
> https://cwiki.apache.org/confluence/display/AIRFLOW/Kubernetes+Testing%3A+Using+GKE+instead+of+Minikube
>
> On Tue, Jan 1, 2019 at 11:45 PM Driesprong, Fokko <fo...@driesprong.frl>
> wrote:
>
> > Hi Gerardo,
> >
> > Very valid points. I'm fully in favor of your proposal. To simplify the
> > stack, I strongly believe we should also strip out tox and fully rely on
> > Docker. Using tox will add another layer that doesn't add a lot of value
> > from my perspective. Also, we should bake all the *.sh bootstrap scripts
> > <https://github.com/apache/incubator-airflow/tree/master/scripts/ci> in
> > the Docker container, instead of having to set this up before running the
> > tests.
> >
> > In the upcoming months, I might have a bit more time to spend on Airflow,
> > I'm happy to assist you on this one.
> >
> > Cheers, Fokko
> >
> > On Wed, 2 Jan 2019 at 06:51, Daniel Imberman <daniel.imberman@gmail.com>
> > wrote:
> >
> > > @gerardo thank you for setting this up.
> > >
> > > I've also been extremely interested in this as well. I've been messing
> > > with GCP VM instances in the past few weeks to try to simplify my local
> > > build as well. Would definitely be interested in helping with the AIP +
> > > implementation.
> > >
> > > One thing I believe we should do is set up the ci base-image with all of
> > > the pip dependencies pre-loaded. A lot of time is wasted pip installing
> > > dependencies. We can auto-generate new images whenever a PR is submitted
> > > to this repository and then specify the tag in the .travis.yml when
> > > building.
> > >
> > > On the k8s side, I think we need to move away from minikube for k8s
> > > testing. I discussed in a previous email setting travis to work with GKE.
> > > I'd be careful about coupling k8s stuff too tightly with a docker
> > > infrastructure. That can get pretty dicey. I think as long as we're
> > > using a separate k8s cluster the k8s executor tests only need to gather
> > > the IP addresses + have access to the kubeconfig.
> > >
> > >
> > > On Tue, Jan 1, 2019 at 8:10 PM Gerardo Curiel <ge...@gerar.do> wrote:
> > >
> > > > Hi folks,
> > > >
> > > > I've created an AIP for simplifying Airflow's development workflow:
> > > >
> > > > https://cwiki.apache.org/confluence/display/AIRFLOW/AIP-7+Simplified+development+workflow
> > > >
> > > > The goal of this proposal is to outline the work needed to make local
> > > > testing significantly easier and standardise the best practices to
> > > > contribute to the Airflow project.
> > > >
> > > > Any input on it would be greatly appreciated.
> > > >
> > > > Cheers,
> > > >
> > > > --
> > > > Gerardo Curiel // https://gerar.do
> > >
> > >
> >
>


-- 

Jarek Potiuk
Polidea <https://www.polidea.com/> | Principal Software Engineer

M: +48 660 796 129 <+48660796129>
E: jarek.potiuk@polidea.com

Re: AIP-7 Simplified development workflow

Posted by Daniel Imberman <da...@gmail.com>.
Hi guys, I've set up a few sub-projects for this. @gerardo @fokko Lemme
know what you guys think

https://cwiki.apache.org/confluence/display/AIRFLOW/Optimizing+Docker+Image+Workflow
https://cwiki.apache.org/confluence/display/AIRFLOW/Kubernetes+Testing%3A+Using+GKE+instead+of+Minikube

On Tue, Jan 1, 2019 at 11:45 PM Driesprong, Fokko <fo...@driesprong.frl>
wrote:

> Hi Gerardo,
>
> Very valid points. I'm fully in favor of your proposal. To simplify the
> stack, I strongly believe we should also strip out tox and fully rely on
> Docker. Using tox will add another layer that doesn't add a lot of value
> from my perspective. Also, we should bake all the *.sh bootstrap scripts
> <https://github.com/apache/incubator-airflow/tree/master/scripts/ci> in
> the Docker container, instead of having to set this up before running the
> tests.
>
> In the upcoming months, I might have a bit more time to spend on Airflow,
> I'm happy to assist you on this one.
>
> Cheers, Fokko
>
> On Wed, 2 Jan 2019 at 06:51, Daniel Imberman <daniel.imberman@gmail.com> wrote:
>
> > @gerardo thank you for setting this up.
> >
> > I've also been extremely interested in this as well. I've been messing
> > with GCP VM instances in the past few weeks to try to simplify my local
> > build as well. Would definitely be interested in helping with the AIP +
> > implementation.
> >
> > One thing I believe we should do is set up the ci base-image with all of
> > the pip dependencies pre-loaded. A lot of time is wasted pip installing
> > dependencies. We can auto-generate new images whenever a PR is submitted
> > to this repository and then specify the tag in the .travis.yml when
> > building.
> >
> > On the k8s side, I think we need to move away from minikube for k8s
> > testing. I discussed in a previous email setting travis to work with GKE.
> > I'd be careful about coupling k8s stuff too tightly with a docker
> > infrastructure. That can get pretty dicey. I think as long as we're
> > using a separate k8s cluster the k8s executor tests only need to gather
> > the IP addresses + have access to the kubeconfig.
> >
> >
> > On Tue, Jan 1, 2019 at 8:10 PM Gerardo Curiel <ge...@gerar.do> wrote:
> >
> > > Hi folks,
> > >
> > > I've created an AIP for simplifying Airflow's development workflow:
> > >
> > > https://cwiki.apache.org/confluence/display/AIRFLOW/AIP-7+Simplified+development+workflow
> > >
> > > The goal of this proposal is to outline the work needed to make local
> > > testing significantly easier and standardise the best practices to
> > > contribute to the Airflow project.
> > >
> > > Any input on it would be greatly appreciated.
> > >
> > > Cheers,
> > >
> > > --
> > > Gerardo Curiel // https://gerar.do
> >
> >
>

Re: AIP-7 Simplified development workflow

Posted by "Driesprong, Fokko" <fo...@driesprong.frl>.
Hi Gerardo,

Very valid points. I'm fully in favor of your proposal. To simplify the
stack, I strongly believe we should also strip out tox and fully rely on
Docker. Using tox will add another layer that doesn't add a lot of value
from my perspective. Also, we should bake all the *.sh bootstrap scripts
<https://github.com/apache/incubator-airflow/tree/master/scripts/ci> in the
Docker container, instead of having to set this up before running the tests.

In the upcoming months, I might have a bit more time to spend on Airflow,
I'm happy to assist you on this one.

Cheers, Fokko

On Wed, 2 Jan 2019 at 06:51, Daniel Imberman <daniel.imberman@gmail.com> wrote:

> @gerardo thank you for setting this up.
>
> I've also been extremely interested in this as well. I've been messing with
> GCP VM instances in the past few weeks to try to simplify my local build as
> well. Would definitely be interested in helping with the AIP +
> implementation.
>
> One thing I believe we should do is set up the ci base-image with all of
> the pip dependencies pre-loaded. A lot of time is wasted pip installing
> dependencies. We can auto-generate new images whenever a PR is submitted to
> this repository and then specify the tag in the .travis.yml when building.
>
> On the k8s side, I think we need to move away from minikube for k8s
> testing. I discussed in a previous email setting travis to work with GKE.
> I'd be careful about coupling k8s stuff too tightly with a docker
> infrastructure. That can get pretty dicey. I think as long as we're using a
> separate k8s cluster the k8s executor tests only need to gather the IP
> addresses + have access to the kubeconfig.
>
>
> On Tue, Jan 1, 2019 at 8:10 PM Gerardo Curiel <ge...@gerar.do> wrote:
>
> > Hi folks,
> >
> > I've created an AIP for simplifying Airflow's development workflow:
> >
> > https://cwiki.apache.org/confluence/display/AIRFLOW/AIP-7+Simplified+development+workflow
> >
> > The goal of this proposal is to outline the work needed to make local
> > testing significantly easier and standardise the best practices to
> > contribute to the Airflow project.
> >
> > Any input on it would be greatly appreciated.
> >
> > Cheers,
> >
> > --
> > Gerardo Curiel // https://gerar.do
>
>

Re: AIP-7 Simplified development workflow

Posted by Gerardo Curiel <ge...@gerar.do>.
Hi Daniel,

On 2 January 2019 at 4:51:17 pm, Daniel Imberman (daniel.imberman@gmail.com)
wrote:

I've also been extremely interested in this as well. I've been messing with
GCP VM instances in the past few weeks to try to simplify my local build as
well. Would definitely be interested in helping with the AIP +
implementation.


That's good to hear :)

One thing I believe we should do is set up the ci base-image with all of
the pip dependencies pre-loaded. A lot of time is wasted pip installing
dependencies. We can auto-generate new images whenever a PR is submitted to
this repository and then specify the tag in the .travis.yml when building.


Sounds like a good idea. I've seen this pattern before, where there's an
`app` Dockerfile that depends on an `app-dependencies` image.
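
One way such a dependencies image could be tagged is by hashing the files that
declare the dependencies, so a new image is only needed when those files
change. The sketch below is illustrative only: the function name
`dependency_image_tag`, the 12-character tag length, and the idea of hashing
the dependency files (rather than tagging per PR, as suggested above) are
assumptions, not something agreed in this thread.

```python
import hashlib
from pathlib import Path
from typing import Iterable


def dependency_image_tag(files: Iterable[Path], length: int = 12) -> str:
    """Return a short, deterministic tag derived from the dependency files.

    The tag changes only when the name or content of one of the files
    changes, so an image carrying that tag can be reused until then.
    (Hypothetical helper, not part of Airflow.)
    """
    digest = hashlib.sha256()
    # Sort so the result does not depend on the order the files are passed in.
    for path in sorted(files, key=lambda p: p.as_posix()):
        digest.update(path.name.encode("utf-8"))
        digest.update(b"\0")
        digest.update(path.read_bytes())
        digest.update(b"\0")
    return digest.hexdigest()[:length]
```

A CI script could call `dependency_image_tag([Path("setup.py")])`, assuming
`setup.py` holds the dependency list, and build and push the dependencies
image only if no image with that tag exists yet.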

On the k8s side, I think we need to move away from minikube for k8s
testing. I discussed in a previous email setting travis to work with GKE.
I'd be careful about coupling k8s stuff too tightly with a docker
infrastructure. That can get pretty dicey. I think as long as we're using a
separate k8s cluster the k8s executor tests only need to gather the IP
addresses + have access to the kubeconfig.

I think we should mimic whatever the k8s community does regarding testing.
And if this means moving away from minikube and towards actual cloud
infrastructure, so be it. We just need to document this decision, and how
local testing would work for non-committers, who won't necessarily have
access to an actual k8s cluster. We might need to manage expectations here.
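
As a rough illustration of how local runs could behave without a cluster, the
k8s tests could be skipped when no kubeconfig is found. This is only a sketch
under assumptions: `find_kubeconfig` and `KubernetesExecutorSmokeTest` are
hypothetical names, and the lookup just follows kubectl's convention of the
`KUBECONFIG` environment variable with `~/.kube/config` as the fallback.

```python
import os
import unittest
from pathlib import Path
from typing import Mapping, Optional


def find_kubeconfig(env: Mapping[str, str] = os.environ) -> Optional[Path]:
    """Return the first existing kubeconfig file, or None if there is none.

    KUBECONFIG may list several paths separated by os.pathsep; when it is
    unset or empty, fall back to ~/.kube/config.
    """
    raw = env.get("KUBECONFIG", "")
    candidates = [Path(p) for p in raw.split(os.pathsep) if p]
    if not candidates:
        candidates = [Path.home() / ".kube" / "config"]
    for candidate in candidates:
        if candidate.is_file():
            return candidate
    return None


# Hypothetical test case: skipped entirely on machines without a kubeconfig.
@unittest.skipUnless(find_kubeconfig(), "no kubeconfig found; skipping k8s tests")
class KubernetesExecutorSmokeTest(unittest.TestCase):
    def test_kubeconfig_is_readable(self):
        path = find_kubeconfig()
        self.assertTrue(os.access(path, os.R_OK))
```

With something like this, contributors without cluster access would see the
k8s tests reported as skipped rather than failed.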

--
Gerardo Curiel // https://gerar.do

Re: AIP-7 Simplified development workflow

Posted by Daniel Imberman <da...@gmail.com>.
@gerardo thank you for setting this up.

I've also been extremely interested in this as well. I've been messing with
GCP VM instances in the past few weeks to try to simplify my local build as
well. Would definitely be interested in helping with the AIP +
implementation.

One thing I believe we should do is set up the ci base-image with all of
the pip dependencies pre-loaded. A lot of time is wasted pip installing
dependencies. We can auto-generate new images whenever a PR is submitted to
this repository and then specify the tag in the .travis.yml when building.

On the k8s side, I think we need to move away from minikube for k8s
testing. I discussed in a previous email setting travis to work with GKE.
I'd be careful about coupling k8s stuff too tightly with a docker
infrastructure. That can get pretty dicey. I think as long as we're using a
separate k8s cluster the k8s executor tests only need to gather the IP
addresses + have access to the kubeconfig.


On Tue, Jan 1, 2019 at 8:10 PM Gerardo Curiel <ge...@gerar.do> wrote:

> Hi folks,
>
> I've created an AIP for simplifying Airflow's development workflow:
>
> https://cwiki.apache.org/confluence/display/AIRFLOW/AIP-7+Simplified+development+workflow
>
> The goal of this proposal is to outline the work needed to make local
> testing significantly easier and standardise the best practices to
> contribute to the Airflow project.
>
> Any input on it would be greatly appreciated.
>
> Cheers,
>
> --
> Gerardo Curiel // https://gerar.do

