Posted to dev@airflow.apache.org by Jarek Potiuk <Ja...@polidea.com> on 2020/10/25 20:17:36 UTC

Re: Separate Repo vs MonoRepo for Dockerfile & Helm Chart

Hello Everyone,

I would like to come back to the discussion as I have *JUST* implemented
a solution (very simple but 100% working) to this monorepo vs. separate
repos question.

You can take a look at this repo of mine:
https://github.com/potiuk/airflow-docker. It is very simple and works like
a charm. I implemented it to solve the issue
https://github.com/apache/airflow/issues/11740

This is a separate repo that people can use to have a separate "read-only"
repository that **only** keeps our Dockerfile-related stuff - including the
full history of changes related (and only those), full traceability, and
incremental, automated synchronization from our "airflow" repo.

I can - any time - set it up as "apache/airflow-docker" and get it to
synchronize every day or every hour.

Here is how it works:

* The "master" and "v1-10-stable" branches are filtered to only contain
files that are needed to build Prod Docker image
* We keep history of all relevant commits in those branches
* In the "main" branch we only keep the "scheduled" GitHub Actions workflow
that does the synchronization and README.md which explains what needs to be
done to build the docker image
* I am using the excellent "git-filter-repo" tool which does the job really
well and fast. Git-filter-repo is recommended by Git maintainers over the
old, slow and much worse built-in git-filter-branch:
https://git-scm.com/docs/git-filter-branch#_warning
* the job to synchronize the repo takes 1m30s to run - it is rather fast
despite analyzing 13500 commits :)
* it runs incrementally - just adding new commits when they appear
* it is very simple: a script of a few lines + a few steps in a GitHub Action
to checkout/push the right branches
* we keep all the commit mapping in the repo as well, so we have 1-1
relationship between the commits in the "docker repo" and the original ones
in Airflow repo
* synchronization is 1-way - airflow -> airflow-docker
* we can use a very similar approach for synchronizing:
    * Helm chart
    * Open API clients
    * other stuff
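
To illustrate the filtering and commit-mapping steps above, here is a minimal
Python sketch. It is not the actual script from the repo linked above. The
path list and the function names are hypothetical, chosen only for
illustration, and the "old new" SHA-pair format is assumed to match the
commit-map file that git-filter-repo writes under .git/filter-repo/.

```python
import re
from typing import Dict, Iterable, List

# Hypothetical list: the real set of files needed for the prod image
# is not spelled out in this thread.
DOCKER_PATHS = ["Dockerfile", "scripts/docker"]

_SHA = re.compile(r"^[0-9a-f]{40}$")


def build_filter_command(paths: Iterable[str], branch: str) -> List[str]:
    """Build a git-filter-repo call that keeps only `paths` on `branch`."""
    # --force is needed because git-filter-repo refuses to rewrite
    # anything that does not look like a fresh clone.
    cmd = ["git", "filter-repo", "--force"]
    for path in paths:
        cmd += ["--path", path]
    cmd += ["--refs", branch]
    return cmd


def parse_commit_map(text: str) -> Dict[str, str]:
    """Parse 'old-sha new-sha' pairs, skipping header or malformed lines."""
    mapping: Dict[str, str] = {}
    for line in text.splitlines():
        parts = line.split()
        if len(parts) == 2 and all(_SHA.match(p) for p in parts):
            mapping[parts[0]] = parts[1]
    return mapping


if __name__ == "__main__":
    # Print the commands instead of running them, so the sketch is safe
    # to execute outside of a git clone.
    for branch in ("master", "v1-10-stable"):
        print(" ".join(build_filter_command(DOCKER_PATHS, branch)))
```

The commands are only built and printed here; a real job would run them in a
fresh clone (for example with subprocess.run) and then push the filtered
branches. The parsed mapping is what gives the 1-1 relationship between
original and filtered commits.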

It also follows our source release strategy - it has the same "properties"
as our main repo - so it is merely a "convenience" way of accessing the
Docker customization options, but the same functionality is available in
our officially released sources.

Do you think we should turn it into the "apache/airflow-docker" repo?

J.



On Sun, Jul 5, 2020 at 8:12 PM Daniel Imberman <da...@gmail.com>
wrote:

> Worth noting that git has the ability to cherry-pick only specific
> directories. If we keep all of helm + tests in one directory, docker +
> tests in another, and core + tests in a third directory it would be pretty
> simple to automate splitting them.
>
>
> https://stackoverflow.com/questions/19821749/git-cherry-pick-or-merge-specific-directory-from-another-branch
>
> via Newton Mail [
> https://cloudmagic.com/k/d/mailapp?ct=dx&cv=10.0.50&pv=10.14.6&source=email_footer_2
> ]
> On Sun, Jul 5, 2020 at 9:57 AM, Daniel Imberman <da...@gmail.com>
> wrote:
> I can’t agree with this enough :). I think writing a few bots to separate
> out sections will be MUCH easier in the long run than maintaining multiple
> repos. Will also prevent the difficulty of setting up a proper dev
> environment for new contributors.
> On Sun, Jul 5, 2020 at 9:53 AM, Jarek Potiuk <Ja...@polidea.com>
> wrote:
> Yeah. I think that the "monorepo" is the only way for now - until (or if)
> we reach the size (and maturity) that different teams take care of the
> different projects. Which might even not happen.
>
> But I would love to try the separate repos to publish/release still (maybe
> not immediately, but it is a nice concept). I think it should be rather
> easy (I will try it on my own repo first). Also, I think it has another
> advantage - those separate repos might actually run other kinds of tests -
> for example, to test if there is "everything" in that repo to release it
> (for example build helm chart) and whether there are no accidental use of
> stuff from outside of those dirs.
>
> I already thought about how to do it - it should be rather easy. Of course
> - like most of the time - there is a ready-to-use git command doing it for
> us. We simply need a bot running for that repo executing a variant of this
> command:
>
> https://docs.github.com/en/github/using-git/splitting-a-subfolder-out-into-a-new-repository
> (it should only take commits from the commit merged last time). So the level
> of automation here is rather minimal.
>
> And if we have those repos and at some point in time we decide to split
> eventually - we will already have repos with all history as a starting
> point.
>
> J.
>
>
> On Sun, Jul 5, 2020 at 4:42 PM Kaxil Naik <ka...@gmail.com> wrote:
>
> > Hmm.. I agree the git-sync would have been a difficult one to solve if we
> > had separate repositories.
> >
> > Well, in that case, the mono repo approach (like we have now) indeed
> makes
> > more sense.
> >
> > Regarding the Kubernetes approach, I feel the ones in staging (
> > https://github.com/kubernetes/kubernetes/tree/master/staging) are part
> of
> > the actual product itself but in our case we were discussing between Helm
> > chart and Dockerfile which are not actually part of the product. And we
> > will need a good deal of automation if we go down that route.
> > I think the plain mono-repo approach is better than that one.
> >
> > Regards,
> > Kaxil
> >
> >
> > On Sun, Jul 5, 2020 at 9:19 AM Jarek Potiuk <Ja...@polidea.com>
> > wrote:
> >
> > > And one more perfect illustration of what I am talking about.
> > >
> > > A very good thing just happened. I was running the PR while writing the
> > > > email (long time as you might imagine) and the new K8S tests with
> > > > 1.10.11 just failed. https://github.com/apache/airflow/pull/9663
> > >
> > > > If we had released the helm chart before, we would have a clear (small)
> > > > incompatibility here. And by seeing the test failing we could make a
> > > > decision what to do:
> > >
> > > 1) fix it differently
> > > > 2) document it as a breaking Helm change, "1.10.12+ image" and make
> > > > the test work in both cases
> > > 3) revert ...
> > >
> > > > But at least we have an early warning that something is wrong. This is
> > > > the clear value of running the tests at every commit.
> > >
> > > J.
> > >
> > > On Sun, Jul 5, 2020 at 10:08 AM Jarek Potiuk <Jarek.Potiuk@polidea.com
> >
> > > wrote:
> > >
> > > > I just have another example of a case where splitting the repos and
> > using
> > > > only "released versions" across repositories might be a complete
> > overkill
> > > > when it comes to development complexity.
> > > >
> > > > We have this change from Aneesh:
> > > > https://github.com/apache/airflow/pull/9371 about adding a git-sync
> > > > option to the helm chart.
> > > >
> > > > > That's a new feature, but we would like to test both 1.10 and the
> > > > > master version of KubernetesExecutor with that. It should work for both
> > > > > of them - there is no coupling/dependency in the "airflow" code for it.
> > > >
> > > > > However, there is a strong coupling in the tests. We have the
> > > > > "kubernetes_tests" running tests using all three: chart, production
> > > > > docker, and Airflow. Those tests will likely have to be adapted to work
> > > > > with the new git-sync option. They were disabled previously as we had
> > > > > problems with them before the helm chart was used for tests, but we can
> > > > > turn them back on now that git-sync is added to the helm chart. Those
> > > > > tests are part of the airflow test suite and we discussed with Daniel that
> > > > > they should stay there - those tests are importing airflow code, they are
> > > > > using the latest example dags which are also in the airflow code.
> > > >
> > > > So we have two ways how we can develop this -
> > > > A) monorepo (current)
> > > > B) separate repos.
> > > >
> > > > > Just to remind - the goal is that our change is tested against:
> > > >
> > > > 1) Released Airflow version (say 1.10.11).
> > > > 2) Development airflow version (master - soon possibly development)
> > > > 3) Development docker image built with either "development" or
> > "1.10.11"
> > > > (we can release the Docker image for 1.10.11 independently from the
> > > current
> > > > development HEAD). The docker image is supposed to work with any
> > version
> > > of
> > > > airflow
> > > >
> > > > In the case of A) Monorepo we have all that as a given.
> > > >
> > > > I just sent this really small PR that should do the job:
> > > > https://github.com/apache/airflow/pull/9663. What it does, it takes
> > the
> > > > latest "development" docker image, "development" chart, bakes in the
> > > latest
> > > > "example dags" from "development branch". The image uses either
> > > > "development" or released (from PyPI) "1.10.11" Airflow version - and
> > run
> > > > the "development" tests against it. This is exactly what we want. If
> we
> > > add
> > > > new features to the helm chart, the Kubernetes tests will have to be
> > > > updated to include that - and this will happen in the airflow
> > > "development"
> > > > branch. The REALLY good thing in it - since we are running those
> tests
> > in
> > > > CI build of airflow development branch - we prevent anyone from
> making
> > > > breaking changes. It is a given that both - the "development" of
> > airflow
> > > > and the "1.10.11" version of airflow will continue to work with the
> > image
> > > > and chart.
> > > >
> > > >
> > > > In the case of B) where we split the repos:
> > > >
> > > > We have to decide where to keep the "kubernetes_tests" - should they
> be
> > > in
> > > > "Airflow" or in "Helm". They are testing BOTH so we can choose either
> > > way.
> > > > Together with Daniel we plan to expand those tests to cover all the
> > > > different options we have in the Chart - testing all of it -
> Kubernetes
> > > > Executor, Celery Executor running on Kubernetes, MySQL (once we add
> > it),
> > > > etc. etc. So we want to make sure we have a matrix of tests covering
> a
> > > > number of deployment options. Those tests do not exist yet, and they
> > will
> > > > have to be written. In principle - they can be moved to the "Helm"
> > > > repository. That's where they conceptually belong. However - there
> is a
> > > > Huge value in running the tests in airflow "development" - the value
> is
> > > > that no-one will be able to break the "development" airflow, because
> > > those
> > > > tests are run with every PR. I think we have no choice but to run
> those
> > > > tests always in development. Otherwise, people maintaining the helm
> > chart
> > > > will have to fix the problems introduced by people changing Airflow
> > > code. I
> > > > think this is a pretty bad idea to allow that. So if we move those
> > tests
> > > to
> > > > Helm Chart repo we have to figure out how to run those "kubernetes"
> > tests
> > > > in CI for every build. This is quite possible - by getting the latest
> > > > master from helm chart and running the build, but it has several
> > > problems:
> > > >
> > > > 1) The test code for CI will have to continue to stay in Airflow (to
> > run
> > > > CI builds) - this means that we already have coupling and some code
> > > related
> > > > to the execution of the helm tests has to be any way in Airflow.
> > > >
> > > > 2) Bigger problem. What happens if as "Airflow developer" you DO
> > > introduce
> > > > a change that breaks the helm chart? You will see a CI error and.....
> > You
> > > > will not know what to do. Do you involve people who maintain the helm
> > > chart
> > > > and wait for them? I think not. You should be able to reproduce the
> > > problem
> > > > locally and fix it yourself (maybe with the help of others - but you
> > > should
> > > > be able to fix your own commit). We would have to teach people how to
> > > bring
> > > > the docker image and helm chart code from the latest version and run
> > the
> > > > tests. We could do it automatically with Breeze (similarly as we do
> > with
> > > > other integrations - where we bring in Kerberos, Mongo, and a
> multitude
> > > of
> > > > others) without them even knowing it, but this might be fairly
> complex
> > > and
> > > > prone to errors. In Monorepo - we already have a simple way of
> > > reproducing
> > > > and running the tests locally and everything is in one place.
> > > >
> > > > 3) There is a chance that someone makes a change in Helm in parallel
> > to a
> > > > change in Airflow that breaks it. This could easily happen in the
> > > "git-sync
> > > > case" or when we add "MySQL" for example in the future. And there is
> no
> > > way
> > > > to prevent it.
> > > >
> > > > 4) If we only test against "released" Helm and Airflow (that was one
> of
> > > > the suggestions), the problem is even bigger. How do you know that
> you
> > do
> > > > not break the currently "developed" helm chart? Or how do you know
> that
> > > the
> > > > currently "developed" helm chart works with latest Airflow release?
> If
> > > you
> > > > do not do those checks at the "commit" time, then you defer this to
> > > > "release time" and only then you might find out that decisions you
> made
> > > > during development have to be reverted. This is a very, very bad idea
> > > IMHO
> > > > again leading to the case that the release manager will have to fix
> > > > problems introduced by others.
> > > >
> > > > J,
> > > >
> > > >
> > > >
> > > > On Fri, Jul 3, 2020 at 10:28 PM Ash Berlin-Taylor <as...@apache.org>
> > > wrote:
> > > >
> > > >> Monorepo FTW.
> > > >>
> > > >> Yes, it gets a little bit messier around release, but the approach
> of
> > > >> automatically extracting out the commits (or parts of commits) to a
> > > >> separate repo for releasing may be the solution to that problem
> > > >>
> > > >>
> > > >> -ash
> > > >>
> > > >> On Jul 3 2020, at 7:51 pm, Kaxil Naik <ka...@gmail.com> wrote:
> > > >>
> > > >> > I will take a look at the Kubernetes approach and get back to this
> > > >> thread.
> > > >> >
> > > >> > We had a discussion with Daniel yesterday and we are both
> concerned
> > > >> about
> > > >> >> all the overhead for people like us who work on all three
> > "entities"
> > > >> >> at the
> > > >> >> same time. Even just explaining how to work with Pull Requests
> and
> > in
> > > >> what
> > > >> >> sequence those PRs would have to be opened and merged in case of
> > > >> changes
> > > >> >> that are spanning across several "entities" - was a challenge. I
> > was
> > > >> unable
> > > >> >> to clearly explain the sequence and way of reviewing/merging the
> > PRs
> > > >> that
> > > >> >> will have to be made if we have submodules. This is a bad sign
> as I
> > > was
> > > >> >> using submodules in the past and know how it works but I was
> unable
> > > to
> > > >> >> explain it clearly.
> > > >> >
> > > >> >
> > > > >> > We don't even need submodules tbh. We can just use a Bash script that
> > > > >> > pulls a pinned Helm Chart version.
> > > > >> > We only need the Helm chart to run integration tests for k8s (at least
> > > > >> > for now).
> > > > >> > We already use tons of Bash scripts.
> > > > >> >
> > > > >> > One of the important benefits of separation is that changes in one
> > > > >> > component should not need a change in the other component, at least
> > > > >> > not immediately.
> > > >> >
> > > >> > Changes in Helm chart and Docker file should never need changes in
> > > >> Airflow
> > > >> > Changes in Airflow should only ever need a change in Dockerfile
> and
> > > Helm
> > > >> > Chart after a new version is released.
> > > >> >
> > > >> > I just had a talk with Daniel too and still didn't find a good
> > enough
> > > >> > reason to have them in the same repo.
> > > >> >
> > > >> > I will definitely look at the Kubernetes approach (maybe it is
> > better)
> > > >> and
> > > >> > get back to this thread. But as of now I don't see any major PROs
> > > >> > for having them in the same repo.
> > > >> >
> > > >> > Regards,
> > > >> > Kaxil
> > > >> >
> > > >> >
> > > >> >
> > > >> > On Fri, Jul 3, 2020 at 5:00 PM Jarek Potiuk <
> > Jarek.Potiuk@polidea.com
> > > >
> > > >> > wrote:
> > > >> >
> > > >> >> I think Ry's point is an important one - I thought about writing
> a
> > > >> longer
> > > >> >> post but I looked at the Kubernetes structure and I really like
> it
> > so
> > > >> just
> > > >> >> wanted to comment on this last one.
> > > >> >>
> > > >> >> Seems that it is simply one "authoritative" (or source of truth)
> > repo
> > > >> where
> > > >> >> everything is developed in monorepo fashion but then there is a
> bot
> > > >> >> that moves every commit related to subdirectories to those
> > > "split-out"
> > > >> >> repos. There are never direct commits of people or PRs in the
> > > >> "split-out"
> > > >> >> repositories. This is very similar to my original proposal to
> have
> > > >> >> dedicated repos used for releases - but with an automated way of
> > > >> publishing
> > > >> >> the commits to the "separated" repos at the moment, they are
> merged
> > > to
> > > >> >> master in the main repo. I love it.
> > > >> >>
> > > >> >> I think it's really good and "pragmatic" solution. The code is
> > > >> >> available in
> > > >> >> separate repos, including the history of commits related to each
> > > >> "entity"
> > > >> >> (so only chart-related commits in chart repo). Issues for
> > particular
> > > >> >> "entities" are in those separate repos as well (something that
> > Kaxil
> > > >> >> mentioned). Users (not developers!) who are interested only in
> > > >> Dockerfile
> > > >> >> or Helm Chart have separate repos they can look at - with only
> > > relevant
> > > >> >> changes and history of releases for that particular entity. They
> > can
> > > >> raise
> > > >> >> issues there (and in GitHub, we can easily refer to those issues
> > from
> > > >> the
> > > >> >> main "airflow" repo). All the discussion from "user issues" are
> > kept
> > > >> >> in the
> > > >> >> relevant repositories. Still - comments about development changes
> > > (and
> > > >> >> related issues) might still be kept in the main "airflow" repo -
> > next
> > > >> to
> > > >> >> other "development" changes.
> > > >> >>
> > > >> >> We can run separate releases from those linked repositories and
> > even
> > > >> >> publish sources directly from those repositories rather than from
> > the
> > > >> main
> > > >> >> one. At the same time - we avoid all the hassle of submodules.
> > > >> >>
> > > >> >> We had a discussion with Daniel yesterday and we are both
> concerned
> > > >> about
> > > >> >> all the overhead for people like us who work on all three
> > "entities"
> > > >> >> at the
> > > >> >> same time. Even just explaining how to work with Pull Requests
> and
> > in
> > > >> what
> > > >> >> sequence those PRs would have to be opened and merged in case of
> > > >> changes
> > > >> >> that are spanning across several "entities" - was a challenge. I
> > was
> > > >> unable
> > > >> >> to clearly explain the sequence and way of reviewing/merging the
> > PRs
> > > >> that
> > > >> >> will have to be made if we have submodules. This is a bad sign
> as I
> > > was
> > > >> >> using submodules in the past and know how it works but I was
> unable
> > > to
> > > >> >> explain it clearly.
> > > >> >>
> > > >> >> I really, really like Kubernetes approach - seems that it's one
> of
> > > the
> > > >> >> cases where we can "eat cake and have it too".
> > > >> >>
> > > >> >> J.
> > > >> >>
> > > >> >>
> > > >> >> On Thu, Jul 2, 2020 at 5:59 PM Ry Walker <ry...@rywalker.com>
> wrote:
> > > >> >>
> > > >> >> > One reason to have a monorepo is for project branding, and end
> > user
> > > >> >> > experience. But for component development experience, it's nice
> > to
> > > >> >> have a
> > > >> >> > small, dedicated repo.
> > > >> >> >
> > > >> >> > I think the git submodule approach is technically sound, but is
> > at
> > > >> odds
> > > >> >> > with making the project easy to consume/understand from the end
> > > user
> > > >> >> > perspective, especially if we expand the use of subprojects.
> And
> > > >> >> the main
> > > >> >> > Airflow commit graph would appear to be slowing down which is
> bad
> > > for
> > > >> >> > Airflow brand perception.
> > > >> >> >
> > > >> >> > Kubernetes has many sub-repos that are integrated into the main
> > > >> >> repo -
> > > >> >> > which I think could be the best of both worlds:
> > > >> >> > Example:
> > > >> https://github.com/kubernetes/kubernetes/tree/master/staging
> > > >> >> >
> > > >> >> > I haven't dug in very deeply, and I won't pretend to understand
> > how
> > > >> >> > challenging it may be to maintain this structure, but I'd
> support
> > > >> >> breaking
> > > >> >> > more components out of the main Airflow repo for dev purposes
> > (for
> > > >> >> example,
> > > >> >> > in the future, it'd be nice to have airflow-cli, airflow-api,
> > > >> >> > airflow-scheduler, individual provider repos that are cleanly
> > > >> separated)
> > > >> >> as
> > > >> >> > long as we bring the commits/contributions back into the
> monorepo
> > > >> with
> > > >> >> > automation.
> > > >> >> >
> > > >> >> > Maybe we could dive a little deeper into how K8s is operating,
> > > before
> > > >> >> going
> > > >> >> > with submodules?
> > > >> >> >
> > > >> >> > -Ry
> > > >> >> >
> > > >> >> >
> > > >> >> >
> > > >> >> >
> > > >> >> > On Thu, Jul 2, 2020 at 11:24 AM Kaxil Naik <
> kaxilnaik@gmail.com>
> > > >> wrote:
> > > >> >> >
> > > >> >> > > Let's come to a consensus first before we do anything :-)
> > > >> >> > >
> > > >> >> > > Is everyone happy with separate repo approach? Let's wait for
> > 72
> > > >> hours
> > > >> >> to
> > > >> >> > > hear from all and then have a plan on how we do it? WDYT?
> > > >> >> > >
> > > > >> >> > > But indeed the git submodules approach sounds good. We do it for
> > > > >> >> > > *Airflow Site* (
> > > > >> >> > > https://github.com/apache/airflow-site/tree/master/landing-pages/site/themes
> > > > >> >> > > ) too.
> > > >> >> > >
> > > >> >> > > Regards,
> > > >> >> > > Kaxil
> > > >> >> > >
> > > >> >> > > On Thu, Jul 2, 2020 at 4:15 PM Jarek Potiuk <
> > > >> Jarek.Potiuk@polidea.com>
> > > >> >> > > wrote:
> > > >> >> > >
> > > >> >> > > > Absolutely - I am happy to add "best practices" and short
> > > >> >> "howto do
> > > >> >> > stuff
> > > >> >> > > > with git submodules" - and this knowledge will only be
> > needed
> > > >> for
> > > >> >> > > > interacting with prod image/helmchart/running kubernetes
> > tests.
> > > >> For
> > > >> >> all
> > > >> >> > > the
> > > >> >> > > > other purposes it should be "business as usual".
> > > >> >> > > >
> > > >> >> > > > On Thu, Jul 2, 2020 at 4:53 PM Daniel Imberman <
> > > >> >> > > daniel.imberman@gmail.com>
> > > >> >> > > > wrote:
> > > >> >> > > >
> > > >> >> > > > > I think git submodules sounds like a great idea. We would
> > > >> >> need to
> > > >> >> > write
> > > >> >> > > > > this into the CONTRIBUTING.md to let people know how to
> do
> > it
> > > >> but
> > > >> >> > It’s
> > > >> >> > > a
> > > >> >> > > > > “teach once” situation.
> > > >> >> > > > >
> > > >> >> > > > > On Thu, Jul 2, 2020 at 2:44 AM, Tomasz Urbaszek <
> > > >> >> > turbaszek@apache.org>
> > > >> >> > > > > wrote:
> > > >> >> > > > > I support the idea of separate repos. The git submodules
> > > >> mentioned
> > > >> >> by
> > > >> >> > > > > Jarek sounds like an interesting solution. It may add
> some
> > > >> >> complexity
> > > >> >> > > > > for new contributors but it's not rocket science. If we
> > agree
> > > >> on
> > > >> >> > using
> > > >> >> > > > > this we should add small how-to in contributing.rst I
> think
> > > >> (i.e.
> > > >> >> do
> > > >> >> > I
> > > >> >> > > > > have to have fork of each repo?).
> > > >> >> > > > >
> > > >> >> > > > > As stressed previously if we go this route we should make
> > > >> >> sure we
> > > >> >> > have
> > > >> >> > > > > nice testing of all those three components. Regarding the
> > > >> >> versioning,
> > > >> >> > > > > I have no strong opinion but I fully support using
> separate
> > > >> issues
> > > >> >> > for
> > > >> >> > > > > airflow, docker, and helm.
> > > >> >> > > > >
> > > >> >> > > > > Tomek
> > > >> >> > > > >
> > > >> >> > > > >
> > > >> >> > > > > On Thu, Jul 2, 2020 at 9:26 AM Jarek Potiuk <
> > > >> >> > Jarek.Potiuk@polidea.com>
> > > >> >> > > > > wrote:
> > > >> >> > > > > >
> > > >> >> > > > > > On Thu, Jul 2, 2020 at 3:16 AM Daniel Imberman <
> > > >> >> > > > > daniel.imberman@gmail.com>
> > > >> >> > > > > > wrote:
> > > >> >> > > > > >
> > > >> >> > > > > > I’m fine with keeping it as three separate repos but
> > > merging
> > > >> >> > testing
> > > >> >> > > > > > > somehow (e.g. the source code chart would pull the
> > > >> helm/docker
> > > >> >> > > chart
> > > >> >> > > > > into
> > > >> >> > > > > > > .build) but we need to do it in a way that doesn’t
> make
> > > >> testing
> > > >> >> > too
> > > >> >> > > > > > > difficult.
> > > >> >> > > > > > >
> > > >> >> > > > > > > So for example: How do I test/integration test a
> change
> > > >> that
> > > >> >> > > > involves a
> > > >> >> > > > > > > change to all three and has to be done at the same
> > time?
> > > >> >> Perhaps
> > > >> >> > a
> > > >> >> > > > > user can
> > > >> >> > > > > > > “register” a branch of helm and docker when they
> start
> > up
> > > >> >> breeze?
> > > >> >> > > Or
> > > >> >> > > > > > > perhaps we create a “parent” integration test that
> uses
> > > the
> > > >> >> three
> > > >> >> > > > > together?
> > > >> >> > > > > > >
> > > >> >> > > > > >
> > > >> >> > > > > > Yes, those are exactly my concerns when splitting the
> > > repos.
> > > >> >> > > > > >
> > > >> >> > > > > > I think testing for development should remain in the
> > > >> "airflow"
> > > >> >> > repo.
> > > >> >> > > It
> > > >> >> > > > > is
> > > >> >> > > > > > the "central one" in fact. I slept it over and I think
> > > using
> > > >> >> > > "released"
> > > >> >> > > > > > versions for development testing will suffer from this
> > "we
> > > >> >> need a
> > > >> >> > > > change
> > > >> >> > > > > in
> > > >> >> > > > > > all three of those".
> > > >> >> > > > > >
> > > >> >> > > > > > But we have an easy solution I think.
> > > >> >> > > > > >
> > > > >> >> > > > > > I think that simply setting up submodules properly should do the job:
> > > > >> >> > > > > > https://git-scm.com/book/en/v2/Git-Tools-Submodules. They seem to be
> > > > >> >> > > > > > perfect for our case.
> > > >> >> > > > > >
> > > >> >> > > > > > For those who have not used it - in short - submodules
> > work
> > > >> in
> > > >> >> the
> > > >> >> > > way
> > > >> >> > > > > that
> > > >> >> > > > > > they register the "linked repos" and store related
> "hash"
> > > >> >> of the
> > > >> >> > > commit
> > > >> >> > > > > > from that linked repo. For example, the "chart" folder
> > will
> > > >> >> be a
> > > >> >> > link
> > > >> >> > > > to
> > > >> >> > > > > > "apache/airflow-helm-chart". We can also move the prod
> > > >> Dockerfile
> > > >> >> > to
> > > >> >> > > a
> > > > >> >> > > > > > subfolder and link it to the separate repo. Git submodule has a
> > > > >> >> > > > > > built-in mechanism to a) update to the latest version of the repo,
> > > > >> >> > > > > > b) commit your changes to the linked repo from there, which is all
> > > > >> >> > > > > > we need. I used those a few times - I never liked submodules for
> > > > >> >> > > > > > sharing "library" code, but for sharing helm/Docker it seems perfect.
> > > >> >> > > > > >
> > > >> >> > > > > > From the "regular" developer point of view - you do not
> > > >> >> need to
> > > >> >> > > > > get/update
> > > >> >> > > > > > submodules if you do not need to use them - so for all
> > the
> > > >> >> > > development
> > > >> >> > > > > > purposes if you only change the "airflow" code, you
> would
> > > not
> > > >> >> even
> > > >> >> > > need
> > > >> >> > > > > to
> > > >> >> > > > > > sync chart or Dockerfile. You do "git checkout" as
> usual
> > > >> >> and it
> > > >> >> > > should
> > > >> >> > > > > > work. So basically - no change for "regular" airflow
> > > >> development.
> > > >> >> > > > > >
> > > > >> >> > > > > > However, if you do need to work on helm + Docker + code, then you
> > > > >> >> > > > > > simply do "git submodule update", go to the linked "helm" or "docker"
> > > > >> >> > > > > > folder, checkout the "master" version and you start making changes.
> > > > >> >> > > > > > The only thing to remember when you want to push your changes is to do
> > > > >> >> > > > > > `git push --recurse-submodules=check` and it will make sure that all
> > > > >> >> > > > > > the repos are updated. It is a bit involved, but the latest git
> > > > >> >> > > > > > versions have very good support and it must only be used by people who
> > > > >> >> > > > > > work on airflow + docker + helm - all the others are unaffected.
> > > >> >> > > > > >
> > > >> >> > > > > > From the CI perspective also nothing changes - when we
> > > >> checkout
> > > >> >> the
> > > >> >> > > > code
> > > >> >> > > > > we
> > > >> >> > > > > > will include submodules and our test harness will be
> > > largely
> > > >> >> > > unchanged.
> > > >> >> > > > > > Submodule provides us with the right mechanism for
> cross
> > > >> >> dependency
> > > >> >> > > > even
> > > >> >> > > > > if
> > > >> >> > > > > > we use branches.
> > > >> >> > > > > >
> > > > >> >> > > > > > If everyone is ok with that - I am happy to set it up. With
> > > > >> >> > > > > > submodules - we can switch to separate repos even without releasing
> > > > >> >> > > > > > helm and Prod chart "officially".
> > > >> >> > > > > >
> > > >> >> > > > > > J.
> > > >> >> > > > > >
> > > >> >> > > > > >
> > > >> >> > > > > >
> > > >> >> > > > > > >
> > > >> >> > > > > > > On Wed, Jul 1, 2020 at 3:20 PM, Jarek Potiuk <
> > > >> >> > > > Jarek.Potiuk@polidea.com
> > > >> >> > > > > >
> > > >> >> > > > > > > wrote:
> > > >> >> > > > > > > Sure. We can work with such an approach. There will be some
> > > >> >> > > > > > > dependencies that we might find problematic, but if we all see that
> > > >> >> > > > > > > it's worth trying, there is a clear benefit: it makes for a "clean"
> > > >> >> > > > > > > split between those different "entities". And possibly once we
> > > >> >> > > > > > > release the first versions of both image and chart, such problems
> > > >> >> > > > > > > will be rare and easy to fix.
> > > >> >> > > > > > >
> > > >> >> > > > > > > I personally think such a split is inevitable eventually; it's just
> > > >> >> > > > > > > a matter of when to do it. If we decide to make this happen soon - I
> > > >> >> > > > > > > am more than happy to work on making the split a reality.
> > > >> >> > > > > > >
> > > >> >> > > > > > > One prerequisite to that is that all those - Helm Chart, Prod Image
> > > >> >> > > > > > > and Airflow - are released in stable versions separately
> > > >> >> > > > > > > "officially" - from the current sources (otherwise there will be no
> > > >> >> > > > > > > way to test cross-repo).
> > > >> >> > > > > > >
> > > >> >> > > > > > > I think for that we will need to agree on the versioning scheme and
> > > >> >> > > > > > > cadence for the Image and Helm Chart, then copy sources from airflow
> > > >> >> > > > > > > and release them as a "baseline", including setting up the tests for
> > > >> >> > > > > > > all of those - then we can remove both Helm and Dockerfile from the
> > > >> >> > > > > > > airflow repo. Happy to help with that if that's the direction we
> > > >> >> > > > > > > choose as a community. It is important though that we keep the
> > > >> >> > > > > > > cross-repo testing working. We have it working as of yesterday, so
> > > >> >> > > > > > > now the matter is - whatever we do, we keep it running and have the
> > > >> >> > > > > > > development environment support easy development and testing of any
> > > >> >> > > > > > > of the three (including CI testing cross-repos). That's the only
> > > >> >> > > > > > > really important thing to me - the rest is more of a technicality of
> > > >> >> > > > > > > how we link the repos, but the principle remains.
> > > >> >> > > > > > >
> > > >> >> > > > > > > Do we have an idea for the versioning scheme that we would like to
> > > >> >> > > > > > > use for the Helm Chart and prod image?
> > > >> >> > > > > > >
> > > >> >> > > > > > > Should we make it CalVer <https://calver.org/overview.html> or
> > > >> >> > > > > > > SemVer <https://semver.org/> (or some other scheme)? And how should
> > > >> >> > > > > > > we treat the combinations with Airflow?
> > > >> >> > > > > > >
> > > >> >> > > > > > > My thoughts (but I have no strong opinions as long as someone
> > > >> >> > > > > > > proposes more sensible versioning schemes):
> > > >> >> > > > > > >
> > > >> >> > > > > > > 1) Airflow code - we continue the release scheme we have (with
> > > >> >> > > > > > > deciding on a 2.* scheme for the release). I expect in the future we
> > > >> >> > > > > > > might decide on doing branches or patches, so for 2.* I'd opt for
> > > >> >> > > > > > > going with a full SemVer approach and patches released from branches.
> > > >> >> > > > > > >
> > > >> >> > > > > > > 2) I believe that the Helm Chart can be versioned with its own
> > > >> >> > > > > > > version (then you specify the image version as a helm parameter).
> > > >> >> > > > > > > For the Helm Chart I think CalVer might be OK as I do not expect any
> > > >> >> > > > > > > branching/patches in the future - I'd expect that there will be a
> > > >> >> > > > > > > single stream of releases.
> > > >> >> > > > > > >
> > > >> >> > > > > > > 3) Dockerfile (+ related files such as .dockerignore, empty dir,
> > > >> >> > > > > > > entrypoints etc). I do not imagine a lot of branching for those - we
> > > >> >> > > > > > > should be able to release a new version of a Dockerfile (+ related
> > > >> >> > > > > > > files) working with nearly any earlier Airflow release, so CalVer
> > > >> >> > > > > > > seems like a good choice.
> > > >> >> > > > > > >
> > > >> >> > > > > > > 4) Image versioning becomes a bit more complex because the image tag
> > > >> >> > > > > > > is always a combination of:
> > > >> >> > > > > > > * Dockerfile (+ related files) version
> > > >> >> > > > > > > * Airflow Version
> > > >> >> > > > > > > * Python Version
> > > >> >> > > > > > >
> > > >> >> > > > > > > An example versioning I can imagine:
> > > >> >> > > > > > >
> > > >> >> > > > > > > *Airflow*: 1.10.11, 1.10.12, 2.0.0, 2.1.0, 2.1.1 - patch level (if
> > > >> >> > > > > > > we decide to have patches).
> > > >> >> > > > > > > *Dockerfile*: 2020.07.12, 2020.08.20 ... -> depending on when we
> > > >> >> > > > > > > release them
> > > >> >> > > > > > > *Helm Chart*: 2020.07.10, 2020.08.09 ... Each Helm Chart has a
> > > >> >> > > > > > > minimum version of both Dockerfile and Airflow versions it works
> > > >> >> > > > > > > with.
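
A minimal sketch of the "minimum version" rule above, assuming CalVer strings of the form YYYY.MM.DD for the Dockerfile and Helm Chart and plain dotted versions for Airflow. The names `ChartRelease`, `parse_version` and `is_compatible`, and the minimum values used, are hypothetical illustrations and not part of any Airflow tooling.

```python
from dataclasses import dataclass
from typing import Tuple


def parse_version(version: str) -> Tuple[int, ...]:
    """Turn '2020.07.12' or '1.10.11' into a tuple of ints that compares correctly."""
    return tuple(int(part) for part in version.split("."))


@dataclass(frozen=True)
class ChartRelease:
    """A Helm Chart release plus the minimum versions it works with (assumed shape)."""
    version: str         # CalVer of the chart itself, e.g. "2020.08.09"
    min_dockerfile: str  # CalVer of the oldest supported Dockerfile
    min_airflow: str     # version of the oldest supported Airflow


def is_compatible(chart: ChartRelease, dockerfile: str, airflow: str) -> bool:
    """True when both versions are at or above the chart's declared minimums."""
    return (
        parse_version(dockerfile) >= parse_version(chart.min_dockerfile)
        and parse_version(airflow) >= parse_version(chart.min_airflow)
    )


# Hypothetical minimums, chosen only to exercise the check.
chart = ChartRelease(version="2020.08.09", min_dockerfile="2020.07.12", min_airflow="1.10.11")
print(is_compatible(chart, dockerfile="2020.08.20", airflow="1.10.12"))  # True
print(is_compatible(chart, dockerfile="2020.07.12", airflow="1.10.10"))  # False
```

Comparing integer tuples rather than raw strings keeps "1.10.10" ordered after "1.10.9", which a plain string comparison would get wrong.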
> > > >> >> > > > > > >
> > > >> >> > > > > > > *Example Docker Image tags:*
> > > >> >> > > > > > >
> > > >> >> > > > > > > apache/airflow:dockerfile2020.07.10-airflow1.10.10-python3.6
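
The tag layout in the example can be sketched as a pair of helpers, one composing the tag from its three version components and one splitting it back. This is only an illustration of the proposed scheme: `image_tag` and `parse_image_tag` are hypothetical names, and the pattern assumes every component is a dotted number.

```python
import re
from typing import Dict

# Assumed layout: <repository>:dockerfile<CalVer>-airflow<version>-python<version>
TAG_PATTERN = re.compile(
    r"^(?P<repository>[^:]+):dockerfile(?P<dockerfile>[\d.]+)"
    r"-airflow(?P<airflow>[\d.]+)-python(?P<python>[\d.]+)$"
)


def image_tag(dockerfile: str, airflow: str, python: str,
              repository: str = "apache/airflow") -> str:
    """Compose an image tag from the Dockerfile, Airflow and Python versions."""
    return f"{repository}:dockerfile{dockerfile}-airflow{airflow}-python{python}"


def parse_image_tag(tag: str) -> Dict[str, str]:
    """Split a tag back into its components; raise ValueError on another layout."""
    match = TAG_PATTERN.match(tag)
    if match is None:
        raise ValueError(f"not a dockerfile/airflow/python tag: {tag}")
    return match.groupdict()


tag = image_tag("2020.07.10", "1.10.10", "3.6")
print(tag)  # apache/airflow:dockerfile2020.07.10-airflow1.10.10-python3.6
print(parse_image_tag(tag)["airflow"])  # 1.10.10
```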
> > > >> >> > > > > > >
> > > >> >> > > > > > > WDYT?
> > > >> >> > > > > > >
> > > >> >> > > > > > > J,
> > > >> >> > > > > > >
> > > >> >> > > > > > >
> > > >> >> > > > > > > On Wed, Jul 1, 2020 at 11:12 PM Kaxil Naik <kaxilnaik@gmail.com> wrote:
> > > >> >> > > > > > >
> > > >> >> > > > > > > > I think we should have "separate repos for development" too.
> > > >> >> > > > > > > >
> > > >> >> > > > > > > > 3 Repos in total:
> > > >> >> > > > > > > >
> > > >> >> > > > > > > > 1) apache/airflow
> > > >> >> > > > > > > > 2) apache/airflow-docker-image
> > > >> >> > > > > > > > 3) apache/airflow-helm-chart
> > > >> >> > > > > > > >
> > > >> >> > > > > > > >
> > > >> >> > > > > > > > (1) *apache/airflow* should use a pinned stable version of Airflow
> > > >> >> > > > > > > > Helm chart to run Kubernetes tests
> > > >> >> > > > > > > > (2) *apache/airflow* already has *Dockerfile.ci* file which it can
> > > >> >> > > > > > > > use to run airflow tests on docker images.
> > > >> >> > > > > > > > (3) *apache/airflow-docker-image* should use the latest available
> > > >> >> > > > > > > > stable version of airflow
> > > >> >> > > > > > > > (4) *apache/airflow-helm-chart* should use the latest available
> > > >> >> > > > > > > > stable version of airflow
> > > >> >> > > > > > > >
> > > >> >> > > > > > > > > Having such split also makes some updates more difficult - for
> > > >> >> > > > > > > > > example if we add new "extra" to Airflow that will require to
> > > >> >> > > > > > > > > install "apt" dependency in Dockerfile, we will have to split it
> > > >> >> > > > > > > > > into first adding the dependency to Dockerfile, and once it is
> > > >> >> > > > > > > > > merged, we can add the extra to airflow with setup.py.
> > > >> >> > > > > > > >
> > > >> >> > > > > > > >
> > > >> >> > > > > > > > Adding a new extra to setup.py would not (and should not) impact
> > > >> >> > > > > > > > the development of *apache/airflow-docker-image*.
> > > >> >> > > > > > > > Once an RC is cut for apache/airflow or after a new version is
> > > >> >> > > > > > > > released for apache/airflow, we can work on supporting the new
> > > >> >> > > > > > > > airflow version in the Production Docker Image.
> > > >> >> > > > > > > > While doing that we can add all the libraries that are needed by
> > > >> >> > > > > > > > the new Airflow Version and we will have a clean commit history
> > > >> >> > > > > > > > and changelog for the Docker image.
> > > >> >> > > > > > > >
> > > >> >> > > > > > > > We definitely do not need to work in parallel on both repos. By
> > > >> >> > > > > > > > doing development in a separate repo we keep consistent "source"
> > > >> >> > > > > > > > files and we can release each artifact with a separate cadence.
> > > >> >> > > > > > > > If someone discovers a bug in a newly released Docker image, we
> > > >> >> > > > > > > > should easily be able to cut a new release with the patch without
> > > >> >> > > > > > > > worrying about how development is going in the apache/airflow
> > > >> >> > > > > > > > repo.
> > > >> >> > > > > > > >
> > > >> >> > > > > > > >
> > > >> >> > > > > > > > *Apache Flink & Apache CouchDB* do it in a similar manner:
> > > >> >> > > > > > > >
> > > >> >> > > > > > > > https://github.com/apache/flink & https://github.com/apache/flink-docker
> > > >> >> > > > > > > > https://github.com/apache/couchdb & https://github.com/apache/couchdb-docker
> > > >> >> > > > > > > >
> > > >> >> > > > > > > > Regards,
> > > >> >> > > > > > > > Kaxil
> > > >> >> > > > > > > >
> > > >> >> > > > > > > >
> > > >> >> > > > > > > >
> > > >> >> > > > > > > >
> > > >> >> > > > > > > >
> > > >> >> > > > > > > >
> > > >> >> > > > > > > > On Wed, Jul 1, 2020 at 9:50 PM Jarek Potiuk <Jarek.Potiuk@polidea.com> wrote:
> > > >> >> > > > > > > >
> > > >> >> > > > > > > > > I do not think it's only the question of Mono/Multi repos. While
> > > >> >> > > > > > > > > I clearly see the benefit of separate repos I also see some
> > > >> >> > > > > > > > > drawbacks.
> > > >> >> > > > > > > > >
> > > >> >> > > > > > > > > And if it bothers others, I am happy to follow the majority. If
> > > >> >> > > > > > > > > we think that a bit more complexity in testing justifies
> > > >> >> > > > > > > > > separating those three completely and having a more "clean"
> > > >> >> > > > > > > > > setup - it's also workable, but IMHO it introduces a certain
> > > >> >> > > > > > > > > complexity in development.
> > > >> >> > > > > > > > >
> > > >> >> > > > > > > > > However, I think this is not 0/1 - a kind of Hybrid approach
> > > >> >> > > > > > > > > might, in my opinion, be the best of both worlds - development
> > > >> >> > > > > > > > > and releases.
> > > >> >> > > > > > > > >
> > > >> >> > > > > > > > > Let me explain what I mean by "Hybrid":
> > > >> >> > > > > > > > >
> > > >> >> > > > > > > > > I think we definitely should have separate repositories to
> > > >> >> > > > > > > > > release those artifacts and I think there is no doubt about it:
> > > >> >> > > > > > > > >
> > > >> >> > > > > > > > > * airflow (apache/airflow)
> > > >> >> > > > > > > > > * prod docker image (apache/airflow-docker)
> > > >> >> > > > > > > > > * helm chart (apache/airflow-helm)
> > > >> >> > > > > > > > > * api clients (we already have separate repos for those) (apache/airflow-client-*)
> > > >> >> > > > > > > > >
> > > >> >> > > > > > > > > I think the only question is where we develop all those (develop
> > > >> >> > > > > > > > > != release). There are certain benefits of having a single
> > > >> >> > > > > > > > > "master" (let's call it "development" further) for all those
> > > >> >> > > > > > > > > artifacts. Currently the "development" version for all of those
> > > >> >> > > > > > > > > is in one repo - and while developing one depends on the other,
> > > >> >> > > > > > > > > we also test all of those together and this means that the
> > > >> >> > > > > > > > > "current best" set of airflow sources (including dependencies in
> > > >> >> > > > > > > > > setup.py), Dockerfile and Helm chart works. This means for
> > > >> >> > > > > > > > > example that you will not be able to break the Helm Chart by
> > > >> >> > > > > > > > > changing anything that the helm chart depends on in airflow. For
> > > >> >> > > > > > > > > example if you change "airflow webserver" into "airflow server"
> > > >> >> > > > > > > > > the current helm chart will break. Similarly if you change
> > > >> >> > > > > > > > > entrypoint.sh in the Docker image in a way that is not
> > > >> >> > > > > > > > > compatible with the Helm chart, we will not let that happen -
> > > >> >> > > > > > > > > the CI tests will break if either of those changes in an
> > > >> >> > > > > > > > > incompatible way. And we can have dependencies in any direction
> > > >> >> > > > > > > > > between those three. When we see a commit break any of the
> > > >> >> > > > > > > > > three - we can make a decision about what to do - either accept
> > > >> >> > > > > > > > > and document the incompatibility or fix it.
> > > >> >> > > > > > > > >
> > > >> >> > > > > > > > > Of course keeping that property (testing it all together) is
> > > >> >> > > > > > > > > also possible if they are in completely separate repos. There
> > > >> >> > > > > > > > > are several cross-dependencies - Docker image building depends
> > > >> >> > > > > > > > > on dependencies in setup.py for example; you cannot build the
> > > >> >> > > > > > > > > Docker image from only the Dockerfile without the sources of
> > > >> >> > > > > > > > > airflow, nor build and test helm charts without the image (and
> > > >> >> > > > > > > > > sources - because that's where the current kubernetes tests
> > > >> >> > > > > > > > > are). If we want to continue doing it for both Helm and
> > > >> >> > > > > > > > > Dockerfile, we would have to basically check out the latest
> > > >> >> > > > > > > > > sources of Airflow and run the CI tests before merging any
> > > >> >> > > > > > > > > Docker or Helm Chart changes, and the opposite - we will have to
> > > >> >> > > > > > > > > download the Dockerfile/Helm chart and build the image/install
> > > >> >> > > > > > > > > the Helm chart when we are running CI tests for Airflow. This is
> > > >> >> > > > > > > > > possible and we could do it, but it adds complexity to the
> > > >> >> > > > > > > > > build/CI process.
> > > >> >> > > > > > > > >
> > > >> >> > > > > > > > > Having such a split also makes some updates more difficult - for
> > > >> >> > > > > > > > > example if we add a new "extra" to Airflow that requires
> > > >> >> > > > > > > > > installing an "apt" dependency in the Dockerfile, we will have
> > > >> >> > > > > > > > > to split it into first adding the dependency to the Dockerfile,
> > > >> >> > > > > > > > > and once it is merged, we can add the extra to airflow with
> > > >> >> > > > > > > > > setup.py. This makes it quite difficult to test it together
> > > >> >> > > > > > > > > though (the Dockerfile change can only be tested fully after
> > > >> >> > > > > > > > > merging it to master). Not to mention the complexity of managing
> > > >> >> > > > > > > > > different versions - your local development Dockerfile version
> > > >> >> > > > > > > > > vs sources of Airflow for example. Imagine switching between
> > > >> >> > > > > > > > > branches where you add two different apt dependencies to the
> > > >> >> > > > > > > > > Dockerfile. There are more similar scenarios I can imagine -
> > > >> >> > > > > > > > > especially for parallel changes in those repos.
> > > >> >> > > > > > > > >
> > > >> >> > > > > > > > > It is of course doable to keep them separate, but it is quite a
> > > >> >> > > > > > > > > bit more complex to set up (especially for a consistent
> > > >> >> > > > > > > > > development environment) when you have separate repos, and
> > > >> >> > > > > > > > > preventing cross-breaking changes might be more difficult.
> > > >> >> > > > > > > > >
> > > >> >> > > > > > > > > I believe that the best way is to continue developing airflow +
> > > >> >> > > > > > > > > image + chart in one repo - airflow, but release them from those
> > > >> >> > > > > > > > > separate repos.
> > > >> >> > > > > > > > >
> > > >> >> > > > > > > > > The Airflow source release does not have to contain either the
> > > >> >> > > > > > > > > chart or the image. And even if it contains sources for those,
> > > >> >> > > > > > > > > they are not the final "artifacts" (installable image and
> > > >> >> > > > > > > > > installable helm chart). Whenever we decide to release either of
> > > >> >> > > > > > > > > them - we test it in "development". Then only when it is tested,
> > > >> >> > > > > > > > > we copy the sources to those separate repos and release them.
> > > >> >> > > > > > > > >
> > > >> >> > > > > > > > > With git - we can even do it very easily while preserving the
> > > >> >> > > > > > > > > history of commits (been there, done that). And then we could
> > > >> >> > > > > > > > > release the Helm chart and Docker image separately based on the
> > > >> >> > > > > > > > > commits and tags in those separate repositories.
> > > >> >> > > > > > > > >
> > > >> >> > > > > > > > > I agree that separate repos are a more "clean" approach. But I
> > > >> >> > > > > > > > > think it is less convenient for development consistency.
> > > >> >> > > > > > > > >
> > > >> >> > > > > > > > > J,
> > > >> >> > > > > > > > >
> > > >> >> > > > > > > > >
> > > >> >> > > > > > > > >
> > > >> >> > > > > > > > > On Wed, Jul 1, 2020 at 9:35 PM Kaxil Naik <kaxilnaik@gmail.com> wrote:
> > > >> >> > > > > > > > >
> > > >> >> > > > > > > > > > Forgot to mention, having them in separate repos also helps in
> > > >> >> > > > > > > > > > better managing each individual artifact.
> > > >> >> > > > > > > > > >
> > > >> >> > > > > > > > > > Each repo would have separate GitHub Issues where we can track
> > > >> >> > > > > > > > > > the issues specific to the Helm chart or Dockerfile.
> > > >> >> > > > > > > > > >
> > > >> >> > > > > > > > > > Regards,
> > > >> >> > > > > > > > > > Kaxil
> > > >> >> > > > > > > > > >
> > > >> >> > > > > > > > > > On Wed, Jul 1, 2020 at 8:30 PM Kaxil Naik <kaxilnaik@gmail.com> wrote:
> > > >> >> > > > > > > > > >
> > > >> >> > > > > > > > > > > The PMC also needs to agree if we want separate VOTING for
> > > >> >> > > > > > > > > > > the Docker Image and Helm chart. I think we do.
> > > >> >> > > > > > > > > > >
> > > >> >> > > > > > > > > > > Regards,
> > > >> >> > > > > > > > > > > Kaxil
> > > >> >> > > > > > > > > > >
> > > >> >> > > > > > > > > > > On Wed, Jul 1, 2020 at 8:06 PM Kaxil Naik <kaxilnaik@gmail.com> wrote:
> > > >> >> > > > > > > > > > >
> > > >> >> > > > > > > > > > >> Hi all,
> > > >> >> > > > > > > > > > >>
> > > >> >> > > > > > > > > > >> What do you all think about having Dockerfile and Helm chart
> > > >> >> > > > > > > > > > >> in the same "Airflow" Repo vs separate?
> > > >> >> > > > > > > > > > >>
> > > >> >> > > > > > > > > > >> I feel having a separate repo for the Airflow Dockerfile and
> > > >> >> > > > > > > > > > >> Helm chart has more benefits: changes are easy to track (via
> > > >> >> > > > > > > > > > >> Changelog), it is easier for new contributors, and each can
> > > >> >> > > > > > > > > > >> have a separate release cadence.
> > > >> >> > > > > > > > > > >>
> > > >> >> > > > > > > > > > >> Currently, the Dockerfile and Helm Chart are inside the same
> > > >> >> > > > > > > > > > >> repo and when we release a changelog for a new Airflow
> > > >> >> > > > > > > > > > >> version, it would include all changes (Airflow + Dockerfile +
> > > >> >> > > > > > > > > > >> Helm chart), which I think is not that great.
> > > >> >> > > > > > > > > > >>
> > > >> >> > > > > > > > > > >> Also having them all inside a single repo means changes in
> > > >> >> > > > > > > > > > >> Helm Chart and Dockerfile can block Airflow release. We could
> > > >> >> > > > > > > > > > >> use stable Helm Chart version and Dockerfile version to test
> > > >> >> > > > > > > > > > >> Airflow so that they are blockers to release too.
> > > >> >> > > > > > > > > > >>
> > > >> >> > > > > > > > > > >> Happy to hear the thoughts from the community.
> > > >> >> > > > > > > > > > >>
> > > >> >> > > > > > > > > > >> Regards,
> > > >> >> > > > > > > > > > >> Kaxil
> > > >> >> > > > > > > > > > >>
> > > >> >> > > > > > > > > > >
> > > >> >> > > > > > > > > >
> > > >> >> > > > > > > > >
> > > >> >> > > > > > > > >
> > > >> >> > > > > > > > > --
> > > >> >> > > > > > > > >
> > > >> >> > > > > > > > > Jarek Potiuk
> > > >> >> > > > > > > > > Polidea <https://www.polidea.com/> | Principal Software Engineer
> > > >> >> > > > > > > > >
> > > >> >> > > > > > > > > M: +48 660 796 129 <+48660796129>
> > > >> >> > > > > > > > > [image: Polidea] <https://www.polidea.com/>
> > > >> >> > > > > > > > >



-- 

Jarek Potiuk
Polidea <https://www.polidea.com/> | Principal Software Engineer

M: +48 660 796 129 <+48660796129>
[image: Polidea] <https://www.polidea.com/>

Re: Separate Repo vs MonoRepo for Dockerfile & Helm Chart

Posted by Jarek Potiuk <Ja...@polidea.com>.
Yep. Prod only and I already looked at integrating it slightly differently.
The only reason it is main is that I wanted to keep it cleanly separated
from "master", and main was the default one (and you need those scripts in
the main branch in order to have scheduled workflows run). But I believe we
can easily have the scripts in the same branch as the one being
synchronized, so the "main" branch will eventually contain both the
workflows to synchronize (in the ".github" folder) and the "main" Prod
docker file + scripts.

On Wed, Nov 11, 2020 at 8:37 PM Ash Berlin-Taylor <as...@apache.org> wrote:

> Will this just have the prod dockerfile? That's what it looks like from
> your example repo. (I like that, just want to make sure)
>
> +1
>
> One point: your "main" branch has the merge scripts etc -- but we should
> leave main free for if/when we rename master to main on Airflow.
>
> -ash
> On Nov 11 2020, at 6:23 pm, Jarek Potiuk <Ja...@polidea.com> wrote:
>
> Calling for Lazy consensus here. Unless someone objects in 72 hours
> (roughly end of this weekend) I will create an "airflow-docker" repo.
>
> For now, I want to focus only on building the docker image. Any other
> stuff (docker-compose, helm chart) might be a separate discussion after
> that.
>
> J.
>
> On Mon, Oct 26, 2020 at 6:26 AM Daniel Imberman <da...@gmail.com>
> wrote:
>
> I am all for this. This is how kubernetes does it and it has worked out
> really well for them.
>
> On Sun, Oct 25, 2020 at 10:23 PM, Jarek Potiuk <Ja...@polidea.com>
> wrote:
>
> Yep, that would be nice. Agree that it is not obvious where some files
> come from.
>
> Agree this could be done if everyone thinks it's a good idea. This would
> be perfectly doable; we could even make it work with the whole history
> maintained (we'd just need to include historical paths in the script).
>
> And if we make it in time before 1.10.13, we could even release it within
> 1.10.13.
>
> J
>
>
> On Sun, Oct 25, 2020 at 10:03 PM Kamil Breguła <ka...@polidea.com>
> wrote:
>
> I took a quick look and I like the overall concept, but I'm just wondering
> if it will be clear enough for users. Currently, these scripts copy
> different files from different directories and the mapping of the source to
> the destination is written in the scripts. This will make it difficult to
> contribute to this "sub-project". In my opinion, if we want to create new
> repositories from some files, we should only do it for one directory. If
> this directory has dependencies, we should try to break them down. The
> end-user should not get the impression that they are in contact with the
> copied repository at the first glance. Otherwise, we will not achieve our
> primary goal - to facilitate end-user use.
>
> In this case, it means that we should create a new directory in
> apache/airflow named "prod-docker-image" or similar and move to it the
> necessary Dockerfiles, documentation, scripts, and all other assets. In
> particular, this directory should contain README.md which actually
> describes the contents of that directory.
>
> A good example is the /chart directory. It has only one dependency that is
> not in the "/chart" directory - the "Contributing" section in README.md
> refers to a file in the root directory of the repository. This link will
> stop working if we create a new repository from the entire directory. It
> will be trivial to fix.
>
> On Sun, Oct 25, 2020 at 9:18 PM Jarek Potiuk <Ja...@polidea.com>
> wrote:
>
> Hello Everyone,
>
> I would like to come back to the discussion as I have *JUST* implemented
> the solution (very simple but 100% working) to this monorepo vs. separate
> repos.
>
> You can take a look at this repo of mine:
> https://github.com/potiuk/airflow-docker. It is very simple and works
> like a charm. I implemented it to solve the issue
> https://github.com/apache/airflow/issues/11740
>
> This is a separate repo that people can use to have a separate "read-only"
> repository that **only** keeps our Dockerfile-related stuff - including the
> full history of changes related (and only those), full traceability, and
> incremental, automated synchronization from our "airflow" repo.
>
> I can - any time - set it up as "apache/airflow-docker" and get it to
> synchronize every day or every hour.
>
> Here is how it works:
>
> * The "master" and "v1-10-stable" branches are filtered to only contain
> files that are needed to build Prod Docker image
> * We keep history of all relevant commits in those branches
> * In the "main" branch we only keep the "scheduled" Github Actions
> workflow that does the synchronization and README.md which explains what
> needs to be done to build the docker image
> * I am using the excellent "git-filter-repo" tool which does the job
> really well and fast. Git-filter-repo is recommended by Git maintainers
> over the old, slow and much worse built-in git-filter-branch:
> https://git-scm.com/docs/git-filter-branch#_warning
> * the job to synchronize the repo takes 1m30s to run - it is rather fast
> despite analyzing 13500 commits :)
> * it runs incrementally - just adding new commits when they appear
> * it is very simple: a few-line script + a few steps in a GitHub Action to
> checkout/push the right branches
> * we keep all the commit mapping in the repo as well, so we have 1-1
> relationship between the commits in the "docker repo" and the original ones
> in Airflow repo
> * synchronization is 1-way - airflow -> airflow-docker
> * we can use a very similar approach for synchronizing:
> * Helm chart
> * Open API clients
> * other stuff
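
The filtering step in the list above can be sketched roughly as follows. This is not the actual script from the airflow-docker repo: the paths are hypothetical placeholders, and the commit-map layout (a header line followed by "old new" hash pairs, written by git-filter-repo under .git/filter-repo/commit-map) is stated here as an assumption. Only the `--path`, `--refs` and `--force` options of git-filter-repo are used.

```python
def filter_repo_command(paths, branches):
    """Build a git-filter-repo call that keeps only `paths` on `branches`."""
    # --force lets filter-repo run on a clone that is not freshly made.
    cmd = ["git", "filter-repo", "--force"]
    for path in paths:
        cmd += ["--path", path]
    # --refs accepts several values, so it goes last.
    cmd += ["--refs", *branches]
    return cmd


def parse_commit_map(text):
    """Parse 'old new' hash pairs, skipping the assumed header line."""
    mapping = {}
    for line in text.splitlines():
        parts = line.split()
        if len(parts) != 2 or parts[0] == "old":
            continue
        mapping[parts[0]] = parts[1]
    return mapping
```

The returned list can be passed to `subprocess.run(cmd, check=True)` inside a clone of the source repo; the parsed map gives the 1-1 relationship between original and filtered commits.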
>
> It also follows our source release strategy - it has the same "properties"
> as our main repo - so it is merely a "convenience" way of accessing the
> Docker customization options, but the same functionality is available in
> our officially released sources.
>
> Do you think we should turn it into the "apache/airflow-docker" repo?
>
> J.
>
>
>
> On Sun, Jul 5, 2020 at 8:12 PM Daniel Imberman <da...@gmail.com>
> wrote:
>
> Worth noting that git has the ability to cherry-pick only specific
> directories. If we keep all of helm + tests in one directory, docker +
> tests in another, and core + tests in a third directory it would be pretty
> simple to automate splitting them.
>
>
> https://stackoverflow.com/questions/19821749/git-cherry-pick-or-merge-specific-directory-from-another-branch
>
> On Sun, Jul 5, 2020 at 9:57 AM, Daniel Imberman <da...@gmail.com>
> wrote:
> I can’t agree with this enough :). I think writing a few bots to separate
> out sections will be MUCH easier in the long run than maintaining multiple
> repos. Will also prevent the difficulty of setting up a proper dev
> environment for new contributors.
> On Sun, Jul 5, 2020 at 9:53 AM, Jarek Potiuk <Ja...@polidea.com>
> wrote:
> Yeah. I think that the "monorepo" is the only way for now - until (or if)
> we reach the size (and maturity) that different teams take care of the
> different projects. Which might even not happen.
>
> But I would love to try the separate repos to publish/release still (maybe
> not immediately, but it is a nice concept). I think it should be rather
> easy (I will try it on my own repo first). Also, I think it has another
> advantage - those separate repos might actually run other kinds of tests -
> for example, to test if there is "everything" in that repo to release it
> (for example build helm chart) and whether there is no accidental use of
> stuff from outside of those dirs.
>
> I already thought about how to do it - it should be rather easy. Of course
> - like most of the time - there is a ready-to-use git command doing it for
> us. We simply need a bot running for that repo executing a variant of this
> command:
>
> https://docs.github.com/en/github/using-git/splitting-a-subfolder-out-into-a-new-repository
> (it
> should only take commits from the commit merged last time). So level of
> automation here is rather minimal.
>
> And if we have those repos and at some point in time we decide to split
> eventually - we will have already repos with all history as a starting
> point.
>
> J.
>
>
> On Sun, Jul 5, 2020 at 4:42 PM Kaxil Naik <ka...@gmail.com> wrote:
>
> > Hmm.. I agree the git-sync would have been a difficult one to solve if we
> > had separate repositories.
> >
> > Well, in that case, the mono repo approach (like we have now) indeed
> makes
> > more sense.
> >
> > Regarding the Kubernetes approach, I feel the ones in staging (
> > https://github.com/kubernetes/kubernetes/tree/master/staging) are part
> of
> > the actual product itself but in our case we were discussing between Helm
> > chart and Dockerfile which are not actually part of the product. And we
> > will need a good deal of automation if we go down that route.
> > I think the plain mono-repo approach is better than that one.
> >
> > Regards,
> > Kaxil
> >
> >
> > On Sun, Jul 5, 2020 at 9:19 AM Jarek Potiuk <Ja...@polidea.com>
> > wrote:
> >
> > > And one more perfect illustration of what I am talking about.
> > >
> > > A very good thing just happened. I was running the PR while writing the
> > > email (long time as you might imagine) and the new K8S tests with
> 1.10.11
> > > just failed. https://github.com/apache/airflow/pull/9663
> > >
> > > If we had released the helm chart before, we would've had a clear (small)
> > > incompatibility here. And by seeing the test failing we could make
> > decision
> > > what to do:
> > >
> > > 1) fix it differently
> > > 2) document it as a breaking Helm change, "1.10.12+ image" and make
> test
> > > work in both cases
> > > 3) revert ...
> > >
> > > But at least we have an early warning that something is wrong. This is
> > the
> > > clear value of running the tests at every commit.
> > >
> > > J.
> > >
> > > On Sun, Jul 5, 2020 at 10:08 AM Jarek Potiuk <Jarek.Potiuk@polidea.com
> >
> > > wrote:
> > >
> > > > I just have another example of a case where splitting the repos and
> > using
> > > > only "released versions" across repositories might be a complete
> > overkill
> > > > when it comes to development complexity.
> > > >
> > > > We have this change from Aneesh:
> > > > https://github.com/apache/airflow/pull/9371 about adding a git-sync
> > > > option to the helm chart.
> > > >
> > > > That's a new feature, but we would like to test both 1.10 and the
> > master
> > > > version of KubernetesExecutor with that. It should work for both of
> > them
> > > -
> > > > > there is no coupling/dependency in the "airflow" code for it.
> > > >
> > > > However, there is a strong coupling in the tests. We have the
> > > > "kubernetes_tests" running tests using all three: chart, production
> > > docker,
> > > > > and Airflow. Those tests will likely have to be adapted to work with
> > the
> > > > new git-sync option. They were disabled previously as we had problems
> > > with
> > > > them before the helm chart was used for tests but we can turn them
> back
> > > on
> > > > now when git-sync is added to the helm chart. Those tests are part of
> > > > airflow test suite and we discussed with Daniel that they should stay
> > > there
> > > > - those tests are importing airflow code, they are using latest
> example
> > > > dags which are also in the airflow code.
> > > >
> > > > So we have two ways how we can develop this -
> > > > A) monorepo (current)
> > > > B) separate repos.
> > > >
> > > > > Just to remind - the goal is that our change is tested against:
> > > >
> > > > 1) Released Airflow version (say 1.10.11).
> > > > 2) Development airflow version (master - soon possibly development)
> > > > 3) Development docker image built with either "development" or
> > "1.10.11"
> > > > (we can release the Docker image for 1.10.11 independently from the
> > > current
> > > > development HEAD). The docker image is supposed to work with any
> > version
> > > of
> > > > airflow
> > > >
> > > > In the case of A) Monorepo we have all that as a given.
> > > >
> > > > I just sent this really small PR that should do the job:
> > > > https://github.com/apache/airflow/pull/9663. What it does, it takes
> > the
> > > > latest "development" docker image, "development" chart, bakes in the
> > > latest
> > > > "example dags" from "development branch". The image uses either
> > > > "development" or released (from PyPI) "1.10.11" Airflow version - and
> > run
> > > > the "development" tests against it. This is exactly what we want. If
> we
> > > add
> > > > new features to the helm chart, the Kubernetes tests will have to be
> > > > updated to include that - and this will happen in the airflow
> > > "development"
> > > > branch. The REALLY good thing in it - since we are running those
> tests
> > in
> > > > CI build of airflow development branch - we prevent anyone from
> making
> > > > breaking changes. It is a given that both - the "development" of
> > airflow
> > > > and the "1.10.11" version of airflow will continue to work with the
> > image
> > > > and chart.
> > > >
> > > >
> > > > In the case of B) where we split the repos:
> > > >
> > > > We have to decide where to keep the "kubernetes_tests" - should they
> be
> > > in
> > > > "Airflow" or in "Helm". They are testing BOTH so we can choose either
> > > way.
> > > > Together with Daniel we plan to expand those tests to cover all the
> > > > different options we have in the Chart - testing all of it -
> Kubernetes
> > > > Executor, Celery Executor running on Kubernetes, MySQL (once we add
> > it),
> > > > etc. etc. So we want to make sure we have a matrix of tests covering
> a
> > > > number of deployment options. Those tests do not exist yet, and they
> > will
> > > > have to be written. In principle - they can be moved to the "Helm"
> > > > repository. That's where they conceptually belong. However - there
> is a
> > > > Huge value in running the tests in airflow "development" - the value
> is
> > > > that no-one will be able to break the "development" airflow, because
> > > those
> > > > tests are run with every PR. I think we have no choice but to run
> those
> > > > tests always in development. Otherwise, people maintaining the helm
> > chart
> > > > will have to fix the problems introduced by people changing Airflow
> > > code. I
> > > > think this is a pretty bad idea to allow that. So if we move those
> > tests
> > > to
> > > > Helm Chart repo we have to figure out how to run those "kubernetes"
> > tests
> > > > in CI for every build. This is quite possible - by getting the latest
> > > > master from helm chart and running the build, but it has several
> > > problems:
> > > >
> > > > 1) The test code for CI will have to continue to stay in Airflow (to
> > run
> > > > CI builds) - this means that we already have coupling and some code
> > > related
> > > > to the execution of the helm tests has to be any way in Airflow.
> > > >
> > > > 2) Bigger problem. What happens if as "Airflow developer" you DO
> > > introduce
> > > > a change that breaks the helm chart? You will see a CI error and.....
> > You
> > > > will not know what to do. Do you involve people who maintain the helm
> > > chart
> > > > and wait for them? I think not. You should be able to reproduce the
> > > problem
> > > > locally and fix it yourself (maybe with the help of others - but you
> > > should
> > > > be able to fix your own commit). We would have to teach people how to
> > > bring
> > > > the docker image and helm chart code from the latest version and run
> > the
> > > > tests. We could do it automatically with Breeze (similarly as we do
> > with
> > > > other integrations - where we bring in Kerberos, Mongo, and a
> multitude
> > > of
> > > > others) without them even knowing it, but this might be fairly
> complex
> > > and
> > > > prone to errors. In Monorepo - we already have a simple way of
> > > reproducing
> > > > and running the tests locally and everything is in one place.
> > > >
> > > > 3) There is a chance that someone makes a change in Helm in parallel
> > to a
> > > > change in Airflow that breaks it. This could easily happen in the
> > > "git-sync
> > > > case" or when we add "MySQL" for example in the future. And there is
> no
> > > way
> > > > to prevent it.
> > > >
> > > > 4) If we only test against "released" Helm and Airflow (that was one
> of
> > > > the suggestions), the problem is even bigger. How do you know that
> you
> > do
> > > > not break the currently "developed" helm chart? Or how do you know
> that
> > > the
> > > > currently "developed" helm chart works with latest Airflow release?
> If
> > > you
> > > > do not do those checks at the "commit" time, then you defer this to
> > > > "release time" and only then you might find out that decisions you
> made
> > > > during development have to be reverted. This is a very, very bad idea
> > > IMHO
> > > > again leading to the case that the release manager will have to fix
> > > > problems introduced by others.
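
The test matrix mentioned above (Airflow version crossed with deployment option) can be enumerated with a few lines. This is only a sketch; the labels in the usage note come from this thread, and the function name and dictionary keys are illustrative.

```python
from itertools import product


def build_test_matrix(airflow_versions, executors):
    """Return one entry per (Airflow version, executor) combination."""
    return [
        {"airflow": version, "executor": executor}
        for version, executor in product(airflow_versions, executors)
    ]
```

Calling it with `["1.10.11", "development"]` and `["KubernetesExecutor", "CeleryExecutor"]` gives four combinations, each of which would be run at commit time rather than deferred to release time.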
> > > >
> > > > J,
> > > >
> > > >
> > > >
> > > > On Fri, Jul 3, 2020 at 10:28 PM Ash Berlin-Taylor <as...@apache.org>
> > > wrote:
> > > >
> > > >> Monorepo FTW.
> > > >>
> > > >> Yes, it gets a little bit messier around release, but the approach
> of
> > > >> automatically extracting out the commits (or parts of commits) to a
> > > >> separate repo for releasing may be the solution to that problem
> > > >>
> > > >>
> > > >> -ash
> > > >>
> > > >> On Jul 3 2020, at 7:51 pm, Kaxil Naik <ka...@gmail.com> wrote:
> > > >>
> > > >> > I will take a look at the Kubernetes approach and get back to this
> > > >> thread.
> > > >> >
> > > >> > We had a discussion with Daniel yesterday and we are both
> concerned
> > > >> about
> > > >> >> all the overhead for people like us who work on all three
> > "entities"
> > > >> >> at the
> > > >> >> same time. Even just explaining how to work with Pull Requests
> and
> > in
> > > >> what
> > > >> >> sequence those PRs would have to be opened and merged in case of
> > > >> changes
> > > >> >> that are spanning across several "entities" - was a challenge. I
> > was
> > > >> unable
> > > >> >> to clearly explain the sequence and way of reviewing/merging the
> > PRs
> > > >> that
> > > >> >> will have to be made if we have submodules. This is a bad sign
> as I
> > > was
> > > >> >> using submodules in the past and know how it works but I was
> unable
> > > to
> > > >> >> explain it clearly.
> > > >> >
> > > >> >
> > > >> > We don't even need submodules tbh. We can just use Bash Script
> that
> > > >> > pulls a
> > > >> > pinned Helm Chart version.
> > > > >> > We only need Helm chart to run integration test for k8s (at least
> for
> > > >> now).
> > > >> > We already use tons of Bash scripts.
> > > >> >
> > > > >> > One of the important benefits of separation is that changes in one
> > > >> component
> > > > >> > should not need a change in another component, at least
> > > > >> > not immediately.
> > > >> >
> > > >> > Changes in Helm chart and Docker file should never need changes in
> > > >> Airflow
> > > >> > Changes in Airflow should only ever need a change in Dockerfile
> and
> > > Helm
> > > >> > Chart after a new version is released.
> > > >> >
> > > >> > I just had a talk with Daniel too and still didn't find a good
> > enough
> > > >> > reason to have them in the same repo.
> > > >> >
> > > >> > I will definitely look at the Kubernetes approach (maybe it is
> > better)
> > > >> and
> > > >> > get back to this thread. But as of now I don't see any major PROs
> > > >> > for having them in the same repo.
> > > >> >
> > > >> > Regards,
> > > >> > Kaxil
> > > >> >
> > > >> >
> > > >> >
> > > >> > On Fri, Jul 3, 2020 at 5:00 PM Jarek Potiuk <
> > Jarek.Potiuk@polidea.com
> > > >
> > > >> > wrote:
> > > >> >
> > > >> >> I think Ry's point is an important one - I thought about writing
> a
> > > >> longer
> > > >> >> post but I looked at the Kubernetes structure and I really like
> it
> > so
> > > >> just
> > > >> >> wanted to comment on this last one.
> > > >> >>
> > > >> >> Seems that it is simply one "authoritative" (or source of truth)
> > repo
> > > >> where
> > > >> >> everything is developed in monorepo fashion but then there is a
> bot
> > > >> >> that moves every commit related to subdirectories to those
> > > "split-out"
> > > >> >> repos. There are never direct commits of people or PRs in the
> > > >> "split-out"
> > > >> >> repositories. This is very similar to my original proposal to
> have
> > > >> >> dedicated repos used for releases - but with an automated way of
> > > >> publishing
> > > >> >> the commits to the "separated" repos at the moment, they are
> merged
> > > to
> > > >> >> master in the main repo. I love it.
> > > >> >>
> > > >> >> I think it's really good and "pragmatic" solution. The code is
> > > >> >> available in
> > > >> >> separate repos, including the history of commits related to each
> > > >> "entity"
> > > >> >> (so only chart-related commits in chart repo). Issues for
> > particular
> > > >> >> "entities" are in those separate repos as well (something that
> > Kaxil
> > > >> >> mentioned). Users (not developers!) who are interested only in
> > > >> Dockerfile
> > > >> >> or Helm Chart have separate repos they can look at - with only
> > > relevant
> > > >> >> changes and history of releases for that particular entity. They
> > can
> > > >> raise
> > > >> >> issues there (and in GitHub, we can easily refer to those issues
> > from
> > > >> the
> > > >> >> main "airflow" repo). All the discussion from "user issues" are
> > kept
> > > >> >> in the
> > > >> >> relevant repositories. Still - comments about development changes
> > > (and
> > > >> >> related issues) might still be kept in the main "airflow" repo -
> > next
> > > >> to
> > > >> >> other "development" changes.
> > > >> >>
> > > >> >> We can run separate releases from those linked repositories and
> > even
> > > >> >> publish sources directly from those repositories rather than from
> > the
> > > >> main
> > > >> >> one. At the same time - we avoid all the hassle of submodules.
> > > >> >>
> > > >> >> We had a discussion with Daniel yesterday and we are both
> concerned
> > > >> about
> > > >> >> all the overhead for people like us who work on all three
> > "entities"
> > > >> >> at the
> > > >> >> same time. Even just explaining how to work with Pull Requests
> and
> > in
> > > >> what
> > > >> >> sequence those PRs would have to be opened and merged in case of
> > > >> changes
> > > >> >> that are spanning across several "entities" - was a challenge. I
> > was
> > > >> unable
> > > >> >> to clearly explain the sequence and way of reviewing/merging the
> > PRs
> > > >> that
> > > >> >> will have to be made if we have submodules. This is a bad sign
> as I
> > > was
> > > >> >> using submodules in the past and know how it works but I was
> unable
> > > to
> > > >> >> explain it clearly.
> > > >> >>
> > > >> >> I really, really like Kubernetes approach - seems that it's one
> of
> > > the
> > > >> >> cases where we can "eat cake and have it too".
> > > >> >>
> > > >> >> J.
> > > >> >>
> > > >> >>
> > > >> >> On Thu, Jul 2, 2020 at 5:59 PM Ry Walker <ry...@rywalker.com>
> wrote:
> > > >> >>
> > > >> >> > One reason to have a monorepo is for project branding, and end
> > user
> > > >> >> > experience. But for component development experience, it's nice
> > to
> > > >> >> have a
> > > >> >> > small, dedicated repo.
> > > >> >> >
> > > >> >> > I think the git submodule approach is technically sound, but is
> > at
> > > >> odds
> > > >> >> > with making the project easy to consume/understand from the end
> > > user
> > > >> >> > perspective, especially if we expand the use of subprojects.
> And
> > > >> >> the main
> > > >> >> > Airflow commit graph would appear to be slowing down which is
> bad
> > > for
> > > >> >> > Airflow brand perception.
> > > >> >> >
> > > >> >> > Kubernetes has many sub-repos that are integrated into the main
> > > >> >> repo -
> > > >> >> > which I think could be the best of both worlds:
> > > >> >> > Example:
> > > >> https://github.com/kubernetes/kubernetes/tree/master/staging
> > > >> >> >
> > > >> >> > I haven't dug in very deeply, and I won't pretend to understand
> > how
> > > >> >> > challenging it may be to maintain this structure, but I'd
> support
> > > >> >> breaking
> > > >> >> > more components out of the main Airflow repo for dev purposes
> > (for
> > > >> >> example,
> > > >> >> > in the future, it'd be nice to have airflow-cli, airflow-api,
> > > >> >> > airflow-scheduler, individual provider repos that are cleanly
> > > >> separated)
> > > >> >> as
> > > >> >> > long as we bring the commits/contributions back into the
> monorepo
> > > >> with
> > > >> >> > automation.
> > > >> >> >
> > > >> >> > Maybe we could dive a little deeper into how K8s is operating,
> > > before
> > > >> >> going
> > > >> >> > with submodules?
> > > >> >> >
> > > >> >> > -Ry
> > > >> >> >
> > > >> >> >
> > > >> >> >
> > > >> >> >
> > > >> >> > On Thu, Jul 2, 2020 at 11:24 AM Kaxil Naik <
> kaxilnaik@gmail.com>
> > > >> wrote:
> > > >> >> >
> > > >> >> > > Let's come to a consensus first before we do anything :-)
> > > >> >> > >
> > > >> >> > > Is everyone happy with separate repo approach? Let's wait for
> > 72
> > > >> hours
> > > >> >> to
> > > >> >> > > hear from all and then have a plan on how we do it? WDYT?
> > > >> >> > >
> > > > >> >> > > But indeed git submodules approach sounds good. We do it for
> > > >> >> *Airflow
> > > >> >> > > Site *(
> > > >> >> > >
> > > >> >> > >
> > > >> >> >
> > > >> >>
> > > >>
> > >
> >
> https://github.com/apache/airflow-site/tree/master/landing-pages/site/themes
> > > >> >> > > )
> > > >> >> > > too.
> > > >> >> > >
> > > >> >> > > Regards,
> > > >> >> > > Kaxil
> > > >> >> > >
> > > >> >> > > On Thu, Jul 2, 2020 at 4:15 PM Jarek Potiuk <
> > > >> Jarek.Potiuk@polidea.com>
> > > >> >> > > wrote:
> > > >> >> > >
> > > >> >> > > > Absolutely - I am happy to add "best practices" and short
> > > >> >> "howto do
> > > >> >> > stuff
> > > >> >> > > > with git submodules" - and this knowledge will only be
> > needed
> > > >> for
> > > >> >> > > > interacting with prod image/helmchart/running kubernetes
> > tests.
> > > >> For
> > > >> >> all
> > > >> >> > > the
> > > >> >> > > > other purposes it should be "business as usual".
> > > >> >> > > >
> > > >> >> > > > On Thu, Jul 2, 2020 at 4:53 PM Daniel Imberman <
> > > >> >> > > daniel.imberman@gmail.com>
> > > >> >> > > > wrote:
> > > >> >> > > >
> > > >> >> > > > > I think git submodules sounds like a great idea. We would
> > > >> >> need to
> > > >> >> > write
> > > >> >> > > > > this into the CONTRIBUTING.md to let people know how to
> do
> > it
> > > >> but
> > > >> >> > It’s
> > > >> >> > > a
> > > >> >> > > > > “teach once” situation.
> > > >> >> > > > >
> > > >> >> > > > > On Thu, Jul 2, 2020 at 2:44 AM, Tomasz Urbaszek <
> > > >> >> > turbaszek@apache.org>
> > > >> >> > > > > wrote:
> > > >> >> > > > > I support the idea of separate repos. The git submodules
> > > >> mentioned
> > > >> >> by
> > > >> >> > > > > Jarek sounds like an interesting solution. It may add
> some
> > > >> >> complexity
> > > >> >> > > > > for new contributors but it's not rocket science. If we
> > agree
> > > >> on
> > > >> >> > using
> > > >> >> > > > > this we should add small how-to in contributing.rst I
> think
> > > >> (i.e.
> > > >> >> do
> > > >> >> > I
> > > >> >> > > > > have to have fork of each repo?).
> > > >> >> > > > >
> > > >> >> > > > > As stressed previously if we go this route we should make
> > > >> >> sure we
> > > >> >> > have
> > > >> >> > > > > nice testing of all those three components. Regarding the
> > > >> >> versioning,
> > > >> >> > > > > I have no strong opinion but I fully support using
> separate
> > > >> issues
> > > >> >> > for
> > > >> >> > > > > airflow, docker, and helm.
> > > >> >> > > > >
> > > >> >> > > > > Tomek
> > > >> >> > > > >
> > > >> >> > > > >
> > > >> >> > > > > On Thu, Jul 2, 2020 at 9:26 AM Jarek Potiuk <
> > > >> >> > Jarek.Potiuk@polidea.com>
> > > >> >> > > > > wrote:
> > > >> >> > > > > >
> > > >> >> > > > > > On Thu, Jul 2, 2020 at 3:16 AM Daniel Imberman <
> > > >> >> > > > > daniel.imberman@gmail.com>
> > > >> >> > > > > > wrote:
> > > >> >> > > > > >
> > > >> >> > > > > > I’m fine with keeping it as three separate repos but
> > > merging
> > > >> >> > testing
> > > >> >> > > > > > > somehow (e.g. the source code chart would pull the
> > > >> helm/docker
> > > >> >> > > chart
> > > >> >> > > > > into
> > > >> >> > > > > > > .build) but we need to do it in a way that doesn’t
> make
> > > >> testing
> > > >> >> > too
> > > >> >> > > > > > > difficult.
> > > >> >> > > > > > >
> > > >> >> > > > > > > So for example: How do I test/integration test a
> change
> > > >> that
> > > >> >> > > > involves a
> > > >> >> > > > > > > change to all three and has to be done at the same
> > time?
> > > >> >> Perhaps
> > > >> >> > a
> > > >> >> > > > > user can
> > > >> >> > > > > > > “register” a branch of helm and docker when they
> start
> > up
> > > >> >> breeze?
> > > >> >> > > Or
> > > >> >> > > > > > > perhaps we create a “parent” integration test that
> uses
> > > the
> > > >> >> three
> > > >> >> > > > > together?
> > > >> >> > > > > > >
> > > >> >> > > > > >
> > > >> >> > > > > > Yes, those are exactly my concerns when splitting the
> > > repos.
> > > >> >> > > > > >
> > > >> >> > > > > > I think testing for development should remain in the
> > > >> "airflow"
> > > >> >> > repo.
> > > >> >> > > It
> > > >> >> > > > > is
> > > >> >> > > > > > the "central one" in fact. I slept it over and I think
> > > using
> > > >> >> > > "released"
> > > >> >> > > > > > versions for development testing will suffer from this
> > "we
> > > >> >> need a
> > > >> >> > > > change
> > > >> >> > > > > in
> > > >> >> > > > > > all three of those".
> > > >> >> > > > > >
> > > >> >> > > > > > But we have an easy solution I think.
> > > >> >> > > > > >
> > > >> >> > > > > > I think that simply setting submodules properly should
> do
> > > >> >> the
> > > >> >> > job:
> > > >> >> > > > > > https://git-scm.com/book/en/v2/Git-Tools-Submodules.
> > They
> > > >> seem
> > > >> >> to
> > > >> >> > be
> > > >> >> > > > > > perfect for our case.
> > > >> >> > > > > >
> > > >> >> > > > > > For those who have not used it - in short - submodules
> > work
> > > >> in
> > > >> >> the
> > > >> >> > > way
> > > >> >> > > > > that
> > > >> >> > > > > > they register the "linked repos" and store related
> "hash"
> > > >> >> of the
> > > >> >> > > commit
> > > >> >> > > > > > from that linked repo. For example, the "chart" folder
> > will
> > > >> >> be a
> > > >> >> > link
> > > >> >> > > > to
> > > >> >> > > > > > "apache/airflow-helm-chart". We can also move the prod
> > > >> Dockerfile
> > > >> >> > to
> > > >> >> > > a
> > > >> >> > > > > > subfolder and link it to the separate repo. Git
> submodule
> > > >> >> has a
> > > >> >> > > > > > built-in mechanism to a) update to the latest version
> of
> > > the
> > > >> >> repo,
> > > >> >> > b)
> > > >> >> > > > > > commit your changes to the linked repo from there which
> > is
> > > >> >> all we
> > > >> >> > > > need. I
> > > >> >> > > > > > used those few times - I never liked submodules for
> > sharing
> > > >> >> > "library"
> > > >> >> > > > > code,
> > > >> >> > > > > > but for sharing helm/Docker It seems perfect.
> > > >> >> > > > > >
> > > >> >> > > > > > From the "regular" developer point of view - you do not
> > > >> >> need to
> > > >> >> > > > > get/update
> > > >> >> > > > > > submodules if you do not need to use them - so for all
> > the
> > > >> >> > > development
> > > >> >> > > > > > purposes if you only change the "airflow" code, you
> would
> > > not
> > > >> >> even
> > > >> >> > > need
> > > >> >> > > > > to
> > > >> >> > > > > > sync chart or Dockerfile. You do "git checkout" as
> usual
> > > >> >> and it
> > > >> >> > > should
> > > >> >> > > > > > work. So basically - no change for "regular" airflow
> > > >> development.
> > > >> >> > > > > >
> > > >> >> > > > > > However, if you do need to work on helm + Docker +
> code,
> > > >> >> then you
> > > >> >> > > > simply
> > > > >> >> > > > > do
> > > >> >> > > > > > "git submodule update", go to the linked "helm" or
> > "docker"
> > > >> >> folder,
> > > >> >> > > > > > checkout the "master" version and you start making
> > changes.
> > > >> The
> > > >> >> > only
> > > >> >> > > > > thing
> > > >> >> > > > > > to remember when you want to push your changes is to do
> > > >> >> `git push
> > > > >> >> > > > > > --recurse-submodules=check` and it will make sure
> that
> > > >> >> all the
> > > >> >> > > repos
> > > >> >> > > > > are
> > > > >> >> > > > > > updated. It is a bit involved, but the latest git versions
> > have
> > > >> >> a very
> > > >> >> > > good
> > > >> >> > > > > > support and it must only be used by people who work on
> > > >> >> airflow +
> > > >> >> > > > docker +
> > > >> >> > > > > > helm - all the others are unaffected.
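
The submodule workflow described above can be written down as an ordered list of git commands. This is a sketch under assumptions: the submodule directory names are placeholders ("chart" is taken from this thread, "docker" is hypothetical), and the commands are only built here, not executed.

```python
def submodule_workflow(submodule_dirs, branch="master"):
    """Return the git commands for working on linked repos, in order."""
    # Fetch the linked repos at the commits recorded in the main repo.
    steps = [["git", "submodule", "update", "--init"]]
    for directory in submodule_dirs:
        # Switch each linked repo to a branch before making changes.
        steps.append(["git", "-C", directory, "checkout", branch])
    # Refuse to push the main repo if submodule commits are not pushed yet.
    steps.append(["git", "push", "--recurse-submodules=check"])
    return steps
```

Each step can be run with `subprocess.run(step, check=True)`; developers who only change the "airflow" code never need any of these commands.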
> > > >> >> > > > > >
> > > >> >> > > > > > From the CI perspective also nothing changes - when we
> > > >> checkout
> > > >> >> the
> > > >> >> > > > code
> > > >> >> > > > > we
> > > >> >> > > > > > will include submodules and our test harness will be
> > > largely
> > > >> >> > > unchanged.
> > > >> >> > > > > > Submodule provides us with the right mechanism for
> cross
> > > >> >> dependency
> > > >> >> > > > even
> > > >> >> > > > > if
> > > >> >> > > > > > we use branches.
> > > >> >> > > > > >
> > > >> >> > > > > > If everyone will be ok with that - I am happy to set it
> > up,
> > > >> With
> > > >> >> > > > > submodules
> > > >> >> > > > > > - we can switch to separate repos even without
> releasing
> > > >> >> helm and
> > > >> >> > > Prod
> > > >> >> > > > > > chart "officially".
> > > >> >> > > > > >
> > > >> >> > > > > > J.
> > > >> >> > > > > >
> > > >> >> > > > > >
> > > >> >> > > > > >
> > > >> >> > > > > > >
> > > >> >> > > > > > > On Wed, Jul 1, 2020 at 3:20 PM, Jarek Potiuk <
> > > >> >> > > > Jarek.Potiuk@polidea.com
> > > >> >> > > > > >
> > > >> >> > > > > > > wrote:
> > > >> >> > > > > > > Sure. We can work with such an approach. There will
> be
> > > some
> > > >> >> > > > > dependencies
> > > >> >> > > > > > > that we might find are problematic, but If we all see
> > > >> >> that it's
> > > >> >> > > > > > > worth trying, there is a clear benefit that it makes
> > for
> > > a
> > > >> >> > "clean"
> > > >> >> > > > > > > split between those different "entities". And
> possibly
> > > >> >> once we
> > > >> >> > > > release
> > > >> >> > > > > > > first versions of both image and chart, such problems
> > > >> >> will be
> > > >> >> > rare
> > > >> >> > > > and
> > > >> >> > > > > easy
> > > >> >> > > > > > > to fix.
> > > >> >> > > > > > >
> > > >> >> > > > > > > I personally think such split is inevitable
> eventually,
> > > >> it's
> > > >> >> > just a
> > > >> >> > > > > matter
> > > >> >> > > > > > > when to do it. If we decide to make this happen soon
> -
> > I
> > > am
> > > >> >> more
> > > >> >> > > than
> > > >> >> > > > > happy
> > > >> >> > > > > > > to work on making the split reality.
> > > >> >> > > > > > >
> > > >> >> > > > > > > One prerequisite to that is that all those - Helm
> > Chart,
> > > >> Prod
> > > >> >> > Image
> > > >> >> > > > and
> > > >> >> > > > > > > Airflow are released in stable versions separately
> > > >> >> "officially" -
> > > >> >> > > > from
> > > >> >> > > > > the
> > > >> >> > > > > > > current sources (otherwise there will be no way to
> test
> > > >> >> > > cross-repo).
> > > >> >> > > > > > >
> > > >> >> > > > > > > I think for that we will need to agree on the
> > versioning
> > > >> scheme
> > > >> >> > and
> > > >> >> > > > > cadence
> > > >> >> > > > > > > for the Image and Helm Chart, then copy sources from
> > > >> airflow
> > > >> >> and
> > > >> >> > > > > release
> > > >> >> > > > > > > them as "baseline" including setup the tests for all
> of
> > > >> >> those -
> > > >> >> > > then
> > > >> >> > > > we
> > > >> >> > > > > > > can remove both Helm and Dockerfile from the airflow
> > > repo.
> > > >> >> Happy
> > > >> >> > to
> > > >> >> > > > > help
> > > >> >> > > > > > > with that if that's the direction we choose as a
> > > >> >> community. It
> > > >> >> is
> > > >> >> > > > > important
> > > >> >> > > > > > > though that we keep the cross-repo testing working.
> We
> > > >> >> have it
> > > >> >> > > > working
> > > >> >> > > > > as
> > > >> >> > > > > > > of yesterday, so now the matter is - whatever we do
> we
> > > >> >> keep it
> > > >> >> > > > running
> > > >> >> > > > > and
> > > >> >> > > > > > > have development environment support easy development
> > and
> > > >> >> testing
> > > >> >> > > of
> > > >> >> > > > > > > either of the three (including CI testing
> cross-repos)
> > ,
> > > >> That's
> > > >> >> > the
> > > >> >> > > > > only
> > > >> >> > > > > > > really important thing to me - the rest is more of
> > > >> technicality
> > > >> >> > how
> > > >> >> > > > we
> > > >> >> > > > > link
> > > >> >> > > > > > > the repos, but principle remains.
> > > >> >> > > > > > >
> > > >> >> > > > > > > Do we have an idea for the versioning scheme that we
> > > >> >> would like
> > > >> >> > to
> > > >> >> > > > use
> > > >> >> > > > > for
> > > >> >> > > > > > > the Helm Chart and prod image ?
> > > >> >> > > > > > >
> > > >> >> > > > > > > Should we make it CalVer
> > > >> >> <https://calver.org/overview.html> or
> > > >> >> > > > SemVer
> > > >> >> > > > > > > <https://semver.org/> (or some other scheme)? And
> how
> > > >> should
> > > >> >> we
> > > >> >> > > > treat
> > > >> >> > > > > the
> > > >> >> > > > > > > combinations with Airflow?
> > > >> >> > > > > > >
> > > >> >> > > > > > > My thoughts (but I have no strong opinions as long as
> > > >> someone
> > > >> >> > > > proposes
> > > >> >> > > > > more
> > > >> >> > > > > > > sensible versioning schemes):
> > > >> >> > > > > > >
> > > >> >> > > > > > > 1) Airflow code - we continue the release scheme we
> > have
> > > >> (with
> > > >> >> > > > > deciding on
> > > >> >> > > > > > > 2.* scheme for the release). I expect in the future
> we
> > > >> might
> > > >> >> > decide
> > > >> >> > > > on
> > > >> >> > > > > > > doing branches or patches so for 2.* I'd opt for
> going
> > > full
> > > >> >> > SemVer
> > > >> >> > > > > approach
> > > >> >> > > > > > > and patches released from branches.
> > > >> >> > > > > > >
> > > >> >> > > > > > > 2) I believe that Helm Chart can be versioned with
> its
> > > own
> > > >> >> > version
> > > >> >> > > > > (then
> > > >> >> > > > > > > you specify the image version as helm parameter). For
> > the
> > > >> Helm
> > > >> >> > > Chart
> > > >> >> > > > I
> > > >> >> > > > > > > think CalVer might be OK as I do not expect any
> > > >> >> branching/patches
> > > >> >> > > in
> > > >> >> > > > > the
> > > >> >> > > > > > > future - I'd expect that there will be a single
> stream
> > of
> > > >> >> > releases.
> > > >> >> > > > > > >
> > > >> >> > > > > > > 3) Dockerfile (+ related files such as .dockerignore,
> > > empty
> > > >> >> dir,
> > > > >> >> > > > > > > entrypoints etc). I do not imagine a lot of branching
> > for
> > > >> >> those -
> > > >> >> > > we
> > > >> >> > > > > > > should be able to release a new version of a
> Dockerfile
> > > (+
> > > >> >> > related
> > > >> >> > > > > files)
> > > >> >> > > > > > > working with nearly any earlier Airflow release, so
> > > CalVer
> > > >> >> seems
> > > >> >> > > > like a
> > > >> >> > > > > > > good choice.
> > > >> >> > > > > > >
> > > > >> >> > > > > > > 4) Image versioning becomes a bit more complex
> because
> > > the
> > > >> >> image
> > > >> >> > > tag
> > > >> >> > > > is
> > > >> >> > > > > > > always combination of:
> > > >> >> > > > > > > * Dockerfile (+ related files) version
> > > >> >> > > > > > > * Airflow Version
> > > >> >> > > > > > > * Python Version
> > > >> >> > > > > > >
> > > >> >> > > > > > > An example versioning I can imagine:
> > > >> >> > > > > > >
> > > >> >> > > > > > > *Airflow*: 1.10.11, 1.10.12, 2.0.0, 2.1.0, 2.1.1 -
> > patch
> > > >> level
> > > >> >> > (if
> > > >> >> > > we
> > > >> >> > > > > > > decide to have patches).
> > > >> >> > > > > > > *Dockerfile: *2020.07.12, 2020.08.20...... ->
> depending
> > > >> >> when we
> > > >> >> > > > release
> > > >> >> > > > > > > them
> > > >> >> > > > > > > *Helm Chart*: 2020.07.10, 2020.08.09 ...... Each Helm
> > > Chart
> > > >> >> has a
> > > >> >> > > > > minimum
> > > >> >> > > > > > > version of both Dockerfile and Airflow versions it
> > works
> > > >> with.
> > > >> >> > > > > > >
> > > >> >> > > > > > > *Example Docker Image tags:*
> > > >> >> > > > > > >
> > > > >> apache/airflow:dockerfile2020.07.10-airflow1.10.10-python3.6
> > > >> >> > > > > > >
> > > >> >> > > > > > > WDYT?
> > > >> >> > > > > > >
> > > >> >> > > > > > > J,
> > > >> >> > > > > > >
> > > >> >> > > > > > >
> > > >> >> > > > > > > On Wed, Jul 1, 2020 at 11:12 PM Kaxil Naik <
> > > >> >> kaxilnaik@gmail.com>
> > > >> >> > > > > wrote:
> > > >> >> > > > > > >
> > > >> >> > > > > > > > I think we should have "separate repos for
> > development"
> > > >> too.
> > > >> >> > > > > > > >
> > > >> >> > > > > > > > 3 Repos in total:
> > > >> >> > > > > > > >
> > > >> >> > > > > > > > 1) apache/airflow
> > > >> >> > > > > > > > 2) apache/airflow-docker-image
> > > >> >> > > > > > > > 3) apache/airflow-helm-chart
> > > >> >> > > > > > > >
> > > >> >> > > > > > > >
> > > >> >> > > > > > > > (1) *apache/airflow* should use a pinned stable
> > version
> > > >> of
> > > >> >> > > Airflow
> > > >> >> > > > > Helm
> > > >> >> > > > > > > > chart to run Kubernetes tests
> > > >> >> > > > > > > > (2) *apache/airflow* already has *Dockerfile.ci*
> file
> > > >> which
> > > >> >> it
> > > >> >> > > can
> > > >> >> > > > > use to
> > > >> >> > > > > > > > run airflow tests on docker images.
> > > >> >> > > > > > > > (3) *apache/airflow-docker-image *should use the
> > latest
> > > >> >> > available
> > > >> >> > > > > stable
> > > >> >> > > > > > > > version of airflow
> > > >> >> > > > > > > > (4) *apache/airflow-helm-chart *should use the
> latest
> > > >> >> available
> > > >> >> > > > > stable
> > > >> >> > > > > > > > version of airflow
> > > >> >> > > > > > > >
> > > >> >> > > > > > > > Having such split also makes some updates more
> > > >> >> difficult -
> > > >> >> for
> > > >> >> > > > > example if
> > > >> >> > > > > > > > > we add new "extra" to Airflow that will require
> to
> > > >> install
> > > >> >> > > "apt"
> > > >> >> > > > > > > > dependency
> > > >> >> > > > > > > > > in Dockerfile, we will have to split it into
> first
> > > >> adding
> > > >> >> the
> > > >> >> > > > > > > dependency
> > > >> >> > > > > > > > to
> > > >> >> > > > > > > > > Dockerfile, and once it is merged, we can add the
> > > >> >> extra to
> > > >> >> > > > airflow
> > > >> >> > > > > with
> > > >> >> > > > > > > > > setup.py.
> > > >> >> > > > > > > >
> > > >> >> > > > > > > >
> > > >> >> > > > > > > > Adding a new extra to setup.py would not (and
> should
> > > not)
> > > >> >> > impact
> > > >> >> > > > the
> > > >> >> > > > > > > > development of *apache/airflow-docker-image*
> > > >> >> > > > > > > > Once an RC is cut for apache/airflow or after a new
> > > >> version
> > > >> >> is
> > > >> >> > > > > released
> > > >> >> > > > > > > for
> > > >> >> > > > > > > > apache/airflow, we can work on supporting the new
> > > airflow
> > > >> >> > version
> > > >> >> > > > in
> > > >> >> > > > > the
> > > >> >> > > > > > > > Production Docker Image.
> > > >> >> > > > > > > > While doing that we can add all the libraries that
> > are
> > > >> needed
> > > >> >> > by
> > > >> >> > > > the
> > > >> >> > > > > new
> > > >> >> > > > > > > > Airflow Version and we will have a clean commit
> > history
> > > >> and
> > > >> >> > > > > changelog for
> > > >> >> > > > > > > > Docker image.
> > > >> >> > > > > > > >
> > > > >> >> > > > > > > > We definitely do not need to work in parallel on
> both
> > > the
> > > >> >> repos.
> > > >> >> > > By
> > > >> >> > > > > doing
> > > >> >> > > > > > > > development in a separate repo we keep consistent
> > > >> "source"
> > > >> >> > files
> > > >> >> > > > and
> > > >> >> > > > > we
> > > >> >> > > > > > > can
> > > >> >> > > > > > > > release each artifact with a
> > > >> >> > > > > > > > separate cadence. If someone discovers bug in newly
> > > >> released
> > > >> >> > > > > Dockerimage,
> > > >> >> > > > > > > > we should be easily able to cut out a new release
> > with
> > > >> the
> > > >> >> > patch
> > > >> >> > > > > without
> > > >> >> > > > > > > > worrying about how development is
> > > >> >> > > > > > > > going in the apache/airflow repo.
> > > >> >> > > > > > > >
> > > >> >> > > > > > > >
> > > > >> >> > > > > > > > *Apache Flink & Apache CouchDB* do it in a similar manner:
> > > >> >> > > > > > > >
> > > >> >> > > > > > > > https://github.com/apache/flink &
> > > >> >> > > > > https://github.com/apache/flink-docker
> > > >> >> > > > > > > > https://github.com/apache/couchdb &
> > > >> >> > > > > > > > https://github.com/apache/couchdb-docker
> > > >> >> > > > > > > >
> > > >> >> > > > > > > > Regards,
> > > >> >> > > > > > > > Kaxil
> > > >> >> > > > > > > >
> > > >> >> > > > > > > >
> > > >> >> > > > > > > >
> > > >> >> > > > > > > >
> > > >> >> > > > > > > >
> > > >> >> > > > > > > >
> > > >> >> > > > > > > > On Wed, Jul 1, 2020 at 9:50 PM Jarek Potiuk <
> > > >> >> > > > > Jarek.Potiuk@polidea.com>
> > > >> >> > > > > > > > wrote:
> > > >> >> > > > > > > >
> > > >> >> > > > > > > > > I do not think it's only the question of
> Mono/Multi
> > > >> repos.
> > > >> >> > > While
> > > >> >> > > > I
> > > >> >> > > > > > > > clearly
> > > >> >> > > > > > > > > see the benefit of separate repos I also see some
> > > >> >> drawbacks.
> > > >> >> > > > > > > > >
> > > >> >> > > > > > > > > And if it bothers others, I am happy to follow
> the
> > > >> >> majority.
> > > >> >> > If
> > > >> >> > > > we
> > > >> >> > > > > > > think
> > > >> >> > > > > > > > > that a bit more complexity in testing justifies
> > > >> separating
> > > >> >> > > those
> > > >> >> > > > > three
> > > >> >> > > > > > > > > completely and having more "clean"- it's also
> > > >> >> workable but
> > > >> >> > IMHO
> > > >> >> > > > > > > > introduces
> > > >> >> > > > > > > > > certain complexity in development.
> > > >> >> > > > > > > > >
> > > > >> >> > > > > > > > > However, I think this is not a 0/1 choice. A kind of Hybrid approach
> > > > >> >> > > > > > > > > might, in my opinion, be the best of both worlds - development and
> > > > >> >> > > > > > > > > releases.
> > > >> >> > > > > > > > >
> > > >> >> > > > > > > > > Let me explain what I mean by "Hybrid":
> > > >> >> > > > > > > > >
> > > >> >> > > > > > > > > I think we definitely should have separate
> > > >> >> repositories to
> > > >> >> > > > release
> > > >> >> > > > > > > those
> > > >> >> > > > > > > > > artifacts and I think there is no doubt about it:
> > > >> >> > > > > > > > >
> > > >> >> > > > > > > > > * airflow (apache/airflow)
> > > >> >> > > > > > > > > * prod docker image (apache/airflow-docker)
> > > >> >> > > > > > > > > * helm chart (apache/airflow-helm)
> > > >> >> > > > > > > > > * api clients (we already have separate repos for
> > > >> those)
> > > >> >> > > > > > > > > (apache/airflow-client-*)
> > > >> >> > > > > > > > >
> > > >> >> > > > > > > > > I think the only question is where we develop all
> > > those
> > > >> >> > > (develop
> > > >> >> > > > !=
> > > >> >> > > > > > > > > release). There are certain benefits of having a
> > > single
> > > >> >> > > "master"
> > > >> >> > > > > (let's
> > > >> >> > > > > > > > > call it "development" further) for all those
> > > artifacts.
> > > >> >> > > Currently
> > > >> >> > > > > the
> > > >> >> > > > > > > > > "development" version for all of those is in one
> > repo
> > > >> >> - and
> > > >> >> > > while
> > > >> >> > > > > > > > > developing one depends on the other, we also test
> > all
> > > >> of
> > > >> >> > those
> > > >> >> > > > > together
> > > >> >> > > > > > > > and
> > > >> >> > > > > > > > > this means that "current best" set of airflow
> > sources
> > > >> >> > > (including
> > > >> >> > > > > > > > > dependencies in setup.py), Dockerfile and Helm
> > chart
> > > >> work.
> > > >> >> > This
> > > >> >> > > > > means
> > > >> >> > > > > > > for
> > > >> >> > > > > > > > > example that you will not be able to break the
> Helm
> > > >> Chart
> > > >> >> by
> > > >> >> > > > > changing
> > > >> >> > > > > > > > > anything that the helm chart depends on in
> airflow.
> > > For
> > > >> >> > example
> > > >> >> > > > if
> > > >> >> > > > > you
> > > >> >> > > > > > > > > change "airflow webserver" into "airflow server"
> > the
> > > >> >> current
> > > >> >> > > helm
> > > >> >> > > > > chart
> > > > >> >> > > > > > > > > will break. Similarly if you change entrypoint.sh
> > in
> > > >> Docker
> > > >> >> > > image
> > > >> >> > > > > in a
> > > >> >> > > > > > > > way
> > > >> >> > > > > > > > > that is not compatible with Helm chart, we will
> not
> > > let
> > > >> >> that
> > > >> >> > > > > happen -
> > > >> >> > > > > > > the
> > > >> >> > > > > > > > > CI tests will break if either of those changes in
> > an
> > > >> >> > > incompatible
> > > >> >> > > > > way.
> > > >> >> > > > > > > > And
> > > >> >> > > > > > > > > we can have dependencies in any direction between
> > > those
> > > >> >> > three.
> > > >> >> > > > > When we
> > > >> >> > > > > > > > see
> > > >> >> > > > > > > > > a commit break either of the three - we can make
> a
> > > >> decision
> > > >> >> > > about
> > > >> >> > > > > what
> > > >> >> > > > > > > to
> > > >> >> > > > > > > > > do - either accept and document the
> incompatibility
> > > >> >> or fix
> > > >> >> > it.
> > > >> >> > > > > > > > >
> > > >> >> > > > > > > > > Of course keeping that property (testing it all
> > > >> together)
> > > >> >> is
> > > >> >> > > also
> > > >> >> > > > > > > > possible
> > > >> >> > > > > > > > > if they are in completely separate repos. There
> are
> > > >> several
> > > >> >> > > > > > > > > cross-dependencies - Docker image building
> depends
> > on
> > > >> >> > > > dependencies
> > > >> >> > > > > in
> > > >> >> > > > > > > > > setup.py for example, you cannot build Docker
> image
> > > >> from
> > > >> >> only
> > > >> >> > > > > > > Dockerfile
> > > >> >> > > > > > > > > without the sources of airflow nor build and test
> > > helm
> > > >> >> charts
> > > >> >> > > > > without
> > > >> >> > > > > > > the
> > > >> >> > > > > > > > > image (and sources - because that's where the
> > current
> > > >> >> > > kubernetes
> > > >> >> > > > > tests
> > > >> >> > > > > > > > > are). If we want to continue doing it for both
> Helm
> > > and
> > > >> >> > > > > Dockerfile, we
> > > >> >> > > > > > > > > would have to basically check out the latest
> > sources
> > > of
> > > >> >> > Airflow
> > > >> >> > > > > and run
> > > >> >> > > > > > > > the
> > > >> >> > > > > > > > > CI tests before merging any Docker or Helm Chart
> > > >> changes
> > > >> >> and
> > > >> >> > > the
> > > >> >> > > > > > > > opposite -
> > > >> >> > > > > > > > > we will have to download Dockerfile/Helm chart
> and
> > > >> build
> > > >> >> > > > > image/install
> > > >> >> > > > > > > > Helm
> > > >> >> > > > > > > > > chart when we are running CI tests for Airflow.
> > This
> > > is
> > > >> >> > > possible
> > > >> >> > > > > and we
> > > >> >> > > > > > > > > could do it, but it adds complexity to the
> build/CI
> > > >> >> process.
> > > >> >> > > > > > > > >
> > > >> >> > > > > > > > > Having such split also makes some updates more
> > > >> >> difficult -
> > > >> >> > for
> > > >> >> > > > > example
> > > >> >> > > > > > > if
> > > >> >> > > > > > > > > we add new "extra" to Airflow that will require
> to
> > > >> install
> > > >> >> > > "apt"
> > > >> >> > > > > > > > dependency
> > > >> >> > > > > > > > > in Dockerfile, we will have to split it into
> first
> > > >> adding
> > > >> >> the
> > > >> >> > > > > > > dependency
> > > >> >> > > > > > > > to
> > > >> >> > > > > > > > > Dockerfile, and once it is merged, we can add the
> > > >> >> extra to
> > > >> >> > > > airflow
> > > >> >> > > > > with
> > > >> >> > > > > > > > > setup.py. This makes it quite difficult to test
> it
> > > >> together
> > > >> >> > > > though
> > > >> >> > > > > (the
> > > >> >> > > > > > > > > Dockerfile change can only be tested fully after
> > > >> >> merging it
> > > >> >> > to
> > > >> >> > > > > master).
> > > >> >> > > > > > > > Not
> > > >> >> > > > > > > > > mentioning complexity of managing different
> > versions
> > > >> >> - your
> > > >> >> > > local
> > > >> >> > > > > > > > > development Dockerfile version vs sources of
> > Airflow
> > > >> for
> > > >> >> > > example.
> > > >> >> > > > > > > Imagine
> > > >> >> > > > > > > > > switching between branches where you add two
> > > >> >> different apt
> > > >> >> > > > > dependencies
> > > >> >> > > > > > > > to
> > > >> >> > > > > > > > > the Dockerfile. There are more similar scenarios
> I
> > > can
> > > >> >> > imagine
> > > >> >> > > -
> > > >> >> > > > > > > > especially
> > > >> >> > > > > > > > > for parallel changes in those repos.
> > > >> >> > > > > > > > >
> > > >> >> > > > > > > > > This is of course doable to keep them separate,
> but
> > > >> >> it is
> > > >> >> > > quite a
> > > >> >> > > > > bit
> > > >> >> > > > > > > > more
> > > >> >> > > > > > > > > complex to set up (especially for a consistent
> > > >> development
> > > >> >> > > > > environment)
> > > >> >> > > > > > > > > when you have separate repos and prevent
> > > cross-breaking
> > > >> >> > changes
> > > >> >> > > > > might
> > > >> >> > > > > > > be
> > > >> >> > > > > > > > > more difficult.
> > > >> >> > > > > > > > >
> > > >> >> > > > > > > > > I believe that the best way is to continue
> > developing
> > > >> >> > airflow +
> > > >> >> > > > > image +
> > > >> >> > > > > > > > > chart in one repo - airflow, but release them
> from
> > > >> those
> > > >> >> > > separate
> > > >> >> > > > > > > repos.
> > > >> >> > > > > > > > >
> > > > >> >> > > > > > > > > Airflow source release does not have to contain either the chart or the image.
> > > >> >> > > > > > > > > And even if it contains sources for those, they
> are
> > > >> >> not the
> > > >> >> > > final
> > > >> >> > > > > > > > > "artifacts" (installable image and installable
> helm
> > > >> chart).
> > > >> >> > > > > > > > > Whenever we decide to release either of them - we
> > > >> >> test it
> > > >> >> in
> > > >> >> > > > > > > > "development".
> > > >> >> > > > > > > > > Then only when it is tested, we copy the sources
> to
> > > >> those
> > > >> >> > > > separate
> > > >> >> > > > > > > repos
> > > >> >> > > > > > > > > and release them.
> > > >> >> > > > > > > > >
> > > >> >> > > > > > > > > With git - we can even do it very easily while
> > > >> preserving
> > > >> >> > > history
> > > >> >> > > > > of
> > > >> >> > > > > > > > > commits easily (been there, done that). And then
> we
> > > >> could
> > > >> >> > > release
> > > >> >> > > > > Helm
> > > >> >> > > > > > > > and
> > > >> >> > > > > > > > > Docker image separately based on the commits and
> > tags
> > > >> in
> > > >> >> > those
> > > >> >> > > > > separate
> > > >> >> > > > > > > > > repositories.
> > > >> >> > > > > > > > >
> > > >> >> > > > > > > > > I agree that separate repos is a more "clean"
> > > approach.
> > > >> >> But I
> > > >> >> > > > > think it
> > > >> >> > > > > > > is
> > > >> >> > > > > > > > > less convenient for development consistency.
> > > >> >> > > > > > > > >
> > > >> >> > > > > > > > > J,
> > > >> >> > > > > > > > >
> > > >> >> > > > > > > > >
> > > >> >> > > > > > > > >
> > > >> >> > > > > > > > > On Wed, Jul 1, 2020 at 9:35 PM Kaxil Naik <
> > > >> >> > kaxilnaik@gmail.com
> > > >> >> > > >
> > > >> >> > > > > wrote:
> > > >> >> > > > > > > > >
> > > >> >> > > > > > > > > > Forgot to mention, having them in separate repo
> > > also
> > > >> >> helps
> > > >> >> > in
> > > >> >> > > > > better
> > > >> >> > > > > > > > > > managing each individual artifacts.
> > > >> >> > > > > > > > > >
> > > >> >> > > > > > > > > > Each repo would have a separate Github Issue
> > where
> > > >> >> we can
> > > >> >> > > track
> > > >> >> > > > > the
> > > >> >> > > > > > > > issue
> > > >> >> > > > > > > > > > specific to Helm chart or Dockerfile.
> > > >> >> > > > > > > > > >
> > > >> >> > > > > > > > > > Regards,
> > > >> >> > > > > > > > > > Kaxil
> > > >> >> > > > > > > > > >
> > > >> >> > > > > > > > > > On Wed, Jul 1, 2020 at 8:30 PM Kaxil Naik <
> > > >> >> > > kaxilnaik@gmail.com
> > > >> >> > > > >
> > > >> >> > > > > > > wrote:
> > > >> >> > > > > > > > > >
> > > >> >> > > > > > > > > > > The PMC also needs to agree if we want
> separate
> > > >> VOTING
> > > >> >> > for
> > > >> >> > > > > Docker
> > > >> >> > > > > > > > Image
> > > >> >> > > > > > > > > > > and Helm chart, I think we do.
> > > >> >> > > > > > > > > > >
> > > >> >> > > > > > > > > > > Regards,
> > > >> >> > > > > > > > > > > Kaxil
> > > >> >> > > > > > > > > > >
> > > >> >> > > > > > > > > > > On Wed, Jul 1, 2020 at 8:06 PM Kaxil Naik <
> > > >> >> > > > kaxilnaik@gmail.com
> > > >> >> > > > > >
> > > >> >> > > > > > > > wrote:
> > > >> >> > > > > > > > > > >
> > > >> >> > > > > > > > > > >> Hi all,
> > > >> >> > > > > > > > > > >>
> > > >> >> > > > > > > > > > >> What do you all think about having
> Dockerfile
> > > >> >> and Helm
> > > >> >> > > chart
> > > >> >> > > > > in
> > > >> >> > > > > > > the
> > > >> >> > > > > > > > > same
> > > >> >> > > > > > > > > > >> "Airflow" Repo vs separate?
> > > >> >> > > > > > > > > > >>
> > > >> >> > > > > > > > > > >> I feel having a separate repo for Airflow
> > > >> Dockerfile
> > > >> >> and
> > > >> >> > > > Helm
> > > >> >> > > > > > > chart
> > > >> >> > > > > > > > > have
> > > >> >> > > > > > > > > > >> more benefits like easy to track changes
> (via
> > > >> >> > Changelog),
> > > >> >> > > > > easy for
> > > >> >> > > > > > > > new
> > > >> >> > > > > > > > > > >> contributors, separate release cadence.
> > > >> >> > > > > > > > > > >>
> > > >> >> > > > > > > > > > >> Currently, docker file and Helm Chart are
> > inside
> > > >> the
> > > >> >> > same
> > > >> >> > > > > repo and
> > > >> >> > > > > > > > > when
> > > >> >> > > > > > > > > > >> we release changelog for a new Airflow
> > version,
> > > it
> > > >> >> would
> > > >> >> > > > > include
> > > >> >> > > > > > > all
> > > >> >> > > > > > > > > > >> changes (Airflow + Dockerfile + Helm chart)
> > > >> >> which I
> > > >> >> > think
> > > >> >> > > is
> > > >> >> > > > > not
> > > >> >> > > > > > > > that
> > > >> >> > > > > > > > > > great.
> > > >> >> > > > > > > > > > >>
> > > >> >> > > > > > > > > > >> Also having them all inside a single repo
> > means
> > > >> >> changes
> > > >> >> > in
> > > >> >> > > > > Helm
> > > >> >> > > > > > > > Chart
> > > >> >> > > > > > > > > > and
> > > >> >> > > > > > > > > > >> Dockerfile can block Airflow release. We
> could
> > > use
> > > >> >> > stable
> > > >> >> > > > Helm
> > > >> >> > > > > > > Chart
> > > >> >> > > > > > > > > > >> version and Dockerfile version to test
> Airflow
> > > >> >> so that
> > > >> >> > > they
> > > >> >> > > > > are
> > > >> >> > > > > > > > > > blockers to
> > > >> >> > > > > > > > > > >> release too.
> > > >> >> > > > > > > > > > >>
> > > >> >> > > > > > > > > > >> Happy to hear the thoughts from the
> community.
> > > >> >> > > > > > > > > > >>
> > > >> >> > > > > > > > > > >> Regards,
> > > >> >> > > > > > > > > > >> Kaxil
> > > >> >> > > > > > > > > > >>
> > > >> >> > > > > > > > > > >
> > > >> >> > > > > > > > > >
> > > >> >> > > > > > > > >
> > > >> >> > > > > > > > >
> > > >> >> > > > > > > > > --
> > > >> >> > > > > > > > >
> > > >> >> > > > > > > > > Jarek Potiuk
> > > >> >> > > > > > > > > Polidea <https://www.polidea.com/> | Principal
> > > >> Software
> > > >> >> > > Engineer
> > > >> >> > > > > > > > >
> > > >> >> > > > > > > > > M: +48 660 796 129 <+48660796129>
> > > >> >> > > > > > > > > [image: Polidea] <https://www.polidea.com/>
> > > >> >> > > > > > > > >
> > > >> >> > > > > > > >
> > > >> >> > > > > > >
> > > >> >> > > > > > >
> > > >> >> > > > > > > --
> > > >> >> > > > > > >
> > > >> >> > > > > > > Jarek Potiuk
> > > >> >> > > > > > > Polidea <https://www.polidea.com/> | Principal
> > Software
> > > >> >> Engineer
> > > >> >> > > > > > >
> > > >> >> > > > > > > M: +48 660 796 129 <+48660796129>
> > > >> >> > > > > > > [image: Polidea] <https://www.polidea.com/>
> > > >> >> > > > > >
> > > >> >> > > > > >
> > > >> >> > > > > >
> > > >> >> > > > > > --
> > > >> >> > > > > >
> > > >> >> > > > > > Jarek Potiuk
> > > >> >> > > > > > Polidea <https://www.polidea.com/> | Principal
> Software
> > > >> Engineer
> > > >> >> > > > > >
> > > >> >> > > > > > M: +48 660 796 129 <+48660796129>
> > > >> >> > > > > > [image: Polidea] <https://www.polidea.com/>
> > > >> >> > > >
> > > >> >> > > >
> > > >> >> > > >
> > > >> >> > > > --
> > > >> >> > > >
> > > >> >> > > > Jarek Potiuk
> > > >> >> > > > Polidea <https://www.polidea.com/> | Principal Software
> > > Engineer
> > > >> >> > > >
> > > >> >> > > > M: +48 660 796 129 <+48660796129>
> > > >> >> > > > [image: Polidea] <https://www.polidea.com/>
> > > >> >> > > >
> > > >> >> > >
> > > >> >> >
> > > >> >>
> > > >> >>
> > > >> >> --
> > > >> >>
> > > >> >> Jarek Potiuk
> > > >> >> Polidea <https://www.polidea.com/> | Principal Software Engineer
> > > >> >>
> > > >> >> M: +48 660 796 129 <+48660796129>
> > > >> >> [image: Polidea] <https://www.polidea.com/>
> > > >> >>
> > > >> >
> > > >>
> > > >
> > > >
> > > > --
> > > >
> > > > Jarek Potiuk
> > > > Polidea <https://www.polidea.com/> | Principal Software Engineer
> > > >
> > > > M: +48 660 796 129 <+48660796129>
> > > > [image: Polidea] <https://www.polidea.com/>
> > > >
> > > >
> > >
> > > --
> > >
> > > Jarek Potiuk
> > > Polidea <https://www.polidea.com/> | Principal Software Engineer
> > >
> > > M: +48 660 796 129 <+48660796129>
> > > [image: Polidea] <https://www.polidea.com/>
> > >
> >
>
>
> --
>
> Jarek Potiuk
> Polidea <https://www.polidea.com/> | Principal Software Engineer
>
> M: +48 660 796 129 <+48660796129>
> [image: Polidea] <https://www.polidea.com/>
>
>
>
> --
>
> Jarek Potiuk
> Polidea <https://www.polidea.com/> | Principal Software Engineer
>
> M: +48 660 796 129 <+48660796129>
> [image: Polidea] <https://www.polidea.com/>
>
>
>
> --
>
> Jarek Potiuk
> Polidea <https://www.polidea.com/> | Principal Software Engineer
>
> M: +48 660 796 129 <+48660796129>
> [image: Polidea] <https://www.polidea.com/>
>
>
>
> --
>
> Jarek Potiuk
> Polidea <https://www.polidea.com/> | Principal Software Engineer
>
> M: +48 660 796 129 <+48660796129>
> [image: Polidea] <https://www.polidea.com/>
>
>

-- 

Jarek Potiuk
Polidea <https://www.polidea.com/> | Principal Software Engineer

M: +48 660 796 129 <+48660796129>
[image: Polidea] <https://www.polidea.com/>
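
A minimal sketch of the image tag scheme proposed in the quoted July message above, where the tag combines the Dockerfile version (CalVer), the Airflow version and the Python version. The class and function names here are hypothetical and only illustrate the proposal; the thread did not settle on this scheme.

```python
import re
from dataclasses import dataclass


@dataclass(frozen=True)
class ImageVersion:
    """The three components that make up a proposed image tag."""

    dockerfile: str  # CalVer of the Dockerfile (+ related files), e.g. "2020.07.10"
    airflow: str     # Airflow release, e.g. "1.10.10"
    python: str      # Python major.minor, e.g. "3.6"

    def tag(self, repository: str = "apache/airflow") -> str:
        # Compose the tag in the order used by the example in the thread.
        return (
            f"{repository}:dockerfile{self.dockerfile}"
            f"-airflow{self.airflow}-python{self.python}"
        )


# Assumed tag grammar, derived only from the single example in the thread.
_TAG_RE = re.compile(
    r"^dockerfile(?P<dockerfile>\d{4}\.\d{2}\.\d{2})"
    r"-airflow(?P<airflow>\d+\.\d+\.\d+)"
    r"-python(?P<python>\d+\.\d+)$"
)


def parse_tag(image: str) -> ImageVersion:
    """Split "repo:tag" and recover the three version components."""
    _, _, tag = image.rpartition(":")
    match = _TAG_RE.match(tag)
    if match is None:
        raise ValueError(f"not a recognised image tag: {image!r}")
    return ImageVersion(**match.groupdict())
```

Keeping the three components separate in the tag means a new Dockerfile release can be published for an older Airflow version without changing how the tag is read.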

Re: Separate Repo vs MonoRepo for Dockerfile & Helm Chart

Posted by Ash Berlin-Taylor <as...@apache.org>.
Will this just have the prod dockerfile? That's what it looks like from your example repo. (I like that, just want to make sure)

+1
One point: your "main" branch has the merge scripts etc. -- but we should leave main free for if/when we rename master to main on Airflow.
-ash
On Nov 11 2020, at 6:23 pm, Jarek Potiuk <Ja...@polidea.com> wrote:
> Calling for Lazy consensus here. Unless someone objects in 72 hours (roughly end of this weekend) I will create an "airflow-docker" repo.
>
> For now, I want to focus only on building the docker image. Any other stuff (docker-compose, helm chart) might be a separate discussion after that.
>
> J.
> On Mon, Oct 26, 2020 at 6:26 AM Daniel Imberman <daniel.imberman@gmail.com (mailto:daniel.imberman@gmail.com)> wrote:
> > I am all for this. This is how kubernetes does it and it has worked out really well for them.
> >
> > On Sun, Oct 25, 2020 at 10:23 PM, Jarek Potiuk <Jarek.Potiuk@polidea.com (mailto:Jarek.Potiuk@polidea.com)> wrote:
> > > Yep, that would be nice. Agree that this is not obvious where some files come from.
> > >
> > > Agree this could be done if everyone thinks it's a good idea. This would be perfectly doable; we could even make it work with the whole history maintained (we'd just need to include historical paths in the script).
> > >
> > > And if we make it in time before 1.10.13, we could even release it within 1.10.13.
> > >
> > > J
> > >
> > >
> > > On Sun, Oct 25, 2020 at 10:03 PM Kamil Breguła <kamil.bregula@polidea.com (mailto:kamil.bregula@polidea.com)> wrote:
> > > > I took a quick look and I like the overall concept, but I'm just wondering if it will be clear enough for users. Currently, these scripts copy different files from different directories and the mapping of the source to the destination is written in the scripts. This will make it difficult to contribute to this "sub-project". In my opinion, if we want to create new repositories from some files, we should only do it for one directory. If this directory has dependencies, we should try to break them down. The end-user should not get the impression that they are in contact with the copied repository at the first glance. Otherwise, we will not achieve our primary goal - to facilitate end-user use.
> > > >
> > > >
> > > > In this case, it means that we should create a new directory in apache/airflow named "prod-docker-image" or similar and move to it the necessary Dockerfiles, documentation, scripts, and all other assets. In particular, this directory should contain README.md which actually describes the contents of that directory.
> > > > A good example is the /chart directory. It has only one dependency that is not in the "/chart" directory - the "Contributing" section in README.md refers to a file in the root directory of the repository. This link will stop working if we create a new repository from the entire directory, but it will be trivial to fix.
> > > > On Sun, Oct 25, 2020 at 9:18 PM Jarek Potiuk <Jarek.Potiuk@polidea.com (mailto:Jarek.Potiuk@polidea.com)> wrote:
> > > > > Hello Everyone,
> > > > >
> > > > > I would like to come back to the discussion as I have *JUST* implemented the solution (very simple but 100% working) to this monorepo vs. separate repos.
> > > > >
> > > > > You can take a look at this repo of mine: https://github.com/potiuk/airflow-docker. It is very simple and works like a charm. I implemented it to solve the issue https://github.com/apache/airflow/issues/11740
> > > > >
> > > > > This is a separate repo that people can use to have a separate "read-only" repository that **only** keeps our Dockerfile-related stuff - including the full history of changes related (and only those), full traceability, and incremental, automated synchronization from our "airflow" repo.
> > > > >
> > > > > I can - any time - set it up as "apache/airflow-docker" and get it to synchronize every day or every hour.
> > > > >
> > > > > > Here is how it works:
> > > > >
> > > > > * The "master" and "v1-10-stable" branches are filtered to only contain files that are needed to build Prod Docker image
> > > > > * We keep history of all relevant commits in those branches
> > > > > * In the "main" branch we only keep the "scheduled" Github Actions workflow that does the synchronization and README.md which explains what needs to be done to build the docker image
> > > > > * I am using the excellent "git-filter-repo" tool which does the job really well and fast. Git-filter-repo is recommended by Git maintainers over the old, slow and much worse built-in git-filter-branch: https://git-scm.com/docs/git-filter-branch#_warning
> > > > > * the job to synchronize the repo takes 1m30s to run - it is rather fast despite analyzing 13500 commits :)
> > > > > * it runs incrementally - just adding new commits when they appear
> > > > > * it is very simple: a few-line script + a few steps in a GitHub Action to checkout/push the right branches
> > > > > * we keep all the commit mapping in the repo as well, so we have 1-1 relationship between the commits in the "docker repo" and the original ones in Airflow repo
> > > > > * synchronization is 1-way - airflow -> airflow-docker
> > > > > * we can use a very similar approach for synchronizing:
> > > > >   * Helm chart
> > > > >   * Open API clients
> > > > >   * other stuff
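
The filtering step described in the list above can be sketched roughly as follows. This is only an illustration under stated assumptions, not the actual synchronization script: the function name `build_filter_repo_command` and the example paths are hypothetical, and the sketch relies only on the `--path`, `--refs`, and `--force` options of git-filter-repo. It builds the command line and does not run it.

```python
from typing import List, Sequence


def build_filter_repo_command(paths: Sequence[str], refs: Sequence[str]) -> List[str]:
    """Build a git-filter-repo command that keeps only `paths` on `refs`.

    Hypothetical helper: the real script may use different options.
    """
    if not paths:
        raise ValueError("at least one path is required")
    # --force lets git-filter-repo run on a repository that is not a fresh clone.
    command = ["git", "filter-repo", "--force"]
    for path in paths:
        # Each --path adds one file or directory to the set that is kept.
        command += ["--path", path]
    if refs:
        # Restrict the history rewrite to the listed branches.
        command += ["--refs", *refs]
    return command


if __name__ == "__main__":
    # Assumed file list, for illustration only.
    docker_paths = ["Dockerfile", "scripts/docker"]
    print(" ".join(build_filter_repo_command(docker_paths, ["master", "v1-10-stable"])))
```

The resulting list could be passed to `subprocess.run` inside a clone of the "airflow" repo; the same helper would apply to the Helm chart by swapping the path list.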
> > > > >
> > > > > It also follows our source release strategy - it has the same "properties" as our main repo - so it is merely a "convenience" way of accessing the Docker customization options, but the same functionality is available in our officially released sources.
> > > > >
> > > > > Do you think we should turn it into the "apache/airflow-docker" repo?
> > > > >
> > > > > J.
> > > > >
> > > > >
> > > > >
> > > > > On Sun, Jul 5, 2020 at 8:12 PM Daniel Imberman <daniel.imberman@gmail.com (mailto:daniel.imberman@gmail.com)> wrote:
> > > > > > Worth noting that git has the ability to cherry-pick only specific directories. If we keep all of helm + tests in one directory, docker + tests in another, and core + tests in a third directory it would be pretty simple to automate splitting them.
> > > > > >
> > > > > > https://stackoverflow.com/questions/19821749/git-cherry-pick-or-merge-specific-directory-from-another-branch
> > > > > > On Sun, Jul 5, 2020 at 9:57 AM, Daniel Imberman <daniel.imberman@gmail.com (mailto:daniel.imberman@gmail.com)> wrote:
> > > > > > I can’t agree with this enough :). I think writing a few bots to separate out sections will be MUCH easier in the long run than maintaining multiple repos. Will also prevent the difficulty of setting up a proper dev environment for new contributors.
> > > > > > On Sun, Jul 5, 2020 at 9:53 AM, Jarek Potiuk <Jarek.Potiuk@polidea.com (mailto:Jarek.Potiuk@polidea.com)> wrote:
> > > > > > Yeah. I think that the "monorepo" is the only way for now - until (or if)
> > > > > > we reach the size (and maturity) that different teams take care of the
> > > > > > different projects. Which might even not happen.
> > > > > >
> > > > > > But I would love to try the separate repos to publish/release still (maybe
> > > > > > not immediately, but it is a nice concept). I think it should be rather
> > > > > > easy (I will try it on my own repo first). Also, I think it has another
> > > > > > advantage - those separate repos might actually run other kinds of tests -
> > > > > > for example, to test if there is "everything" in that repo to release it
> > > > > > (for example build helm chart) and whether there is no accidental use of
> > > > > > stuff from outside of those dirs.
> > > > > >
> > > > > > I already thought about how to do it - it should be rather easy. Of course
> > > > > > - like most of the time - there is a ready-to-use git command doing it for
> > > > > > us. We simply need a bot running for that repo executing a variant of this
> > > > > > command:
> > > > > > https://docs.github.com/en/github/using-git/splitting-a-subfolder-out-into-a-new-repository
> > > > > > (it should only take commits from the commit merged last time). So level of
> > > > > > automation here is rather minimal.
> > > > > >
> > > > > > And if we have those repos and at some point of time we decide to split
> > > > > > eventually - we will already have repos with all history as a starting
> > > > > > point.
> > > > > >
> > > > > > J.
> > > > > >
> > > > > > On Sun, Jul 5, 2020 at 4:42 PM Kaxil Naik <kaxilnaik@gmail.com (mailto:kaxilnaik@gmail.com)> wrote:
> > > > > > > Hmm.. I agree the git-sync would have been a difficult one to solve if we
> > > > > > > had separate repositories.
> > > > > > >
> > > > > > > Well, in that case, the mono repo approach (like we have now) indeed makes
> > > > > > > more sense.
> > > > > > >
> > > > > > > Regarding the Kubernetes approach, I feel the ones in staging (
> > > > > > > https://github.com/kubernetes/kubernetes/tree/master/staging) are part of
> > > > > > > the actual product itself but in our case we were discussing between Helm
> > > > > > > chart and Dockerfile which are not actually part of the product. And we
> > > > > > > will need a good deal of automation if we go down that route.
> > > > > > > I think the plain mono-repo approach is better than that one.
> > > > > > >
> > > > > > > Regards,
> > > > > > > Kaxil
> > > > > > >
> > > > > > >
> > > > > > > On Sun, Jul 5, 2020 at 9:19 AM Jarek Potiuk <Jarek.Potiuk@polidea.com (mailto:Jarek.Potiuk@polidea.com)>
> > > > > > > wrote:
> > > > > > >
> > > > > > > > And one more perfect illustration of what I am talking about.
> > > > > > > >
> > > > > > > > A very good thing just happened. I was running the PR while writing the
> > > > > > > > email (long time as you might imagine) and the new K8S tests with 1.10.11
> > > > > > > > just failed. https://github.com/apache/airflow/pull/9663
> > > > > > > >
> > > > > > > > If we had released the helm chart before, we would've had a clear (small)
> > > > > > > > incompatibility here. And by seeing the test failing we could make a
> > > > > > > > decision on what to do:
> > > > > > > >
> > > > > > > > 1) fix it differently
> > > > > > > > 2) document it as a breaking Helm change, "1.10.12+ image" and make test
> > > > > > > > work in both cases
> > > > > > > > 3) revert ...
> > > > > > > >
> > > > > > > > But at least we have an early warning that something is wrong. This is the
> > > > > > > > clear value of running the tests at every commit.
> > > > > > > >
> > > > > > > > J.
> > > > > > > >
> > > > > > > > On Sun, Jul 5, 2020 at 10:08 AM Jarek Potiuk <Jarek.Potiuk@polidea.com (mailto:Jarek.Potiuk@polidea.com)>
> > > > > > > > wrote:
> > > > > > > >
> > > > > > > > > I just have another example of a case where splitting the repos and using
> > > > > > > > > only "released versions" across repositories might be a complete overkill
> > > > > > > > > when it comes to development complexity.
> > > > > > > > >
> > > > > > > > > We have this change from Aneesh:
> > > > > > > > > https://github.com/apache/airflow/pull/9371 about adding a git-sync
> > > > > > > > > option to the helm chart.
> > > > > > > > >
> > > > > > > > > That's a new feature, but we would like to test both 1.10 and the master
> > > > > > > > > version of KubernetesExecutor with that. It should work for both of them -
> > > > > > > > > there is no coupling/dependency in the "airflow" code for it.
> > > > > > > > >
> > > > > > > > > However, there is a strong coupling in the tests. We have the
> > > > > > > > > "kubernetes_tests" running tests using all three: chart, production docker,
> > > > > > > > > and Airflow. Those tests will likely have to be adapted to work with the
> > > > > > > > > new git-sync option. They were disabled previously as we had problems with
> > > > > > > > > them before the helm chart was used for tests but we can turn them back on
> > > > > > > > > now that git-sync is added to the helm chart. Those tests are part of the
> > > > > > > > > airflow test suite and we discussed with Daniel that they should stay there
> > > > > > > > > - those tests are importing airflow code, they are using latest example
> > > > > > > > > dags which are also in the airflow code.
> > > > > > > > >
> > > > > > > > > So we have two ways how we can develop this -
> > > > > > > > > A) monorepo (current)
> > > > > > > > > B) separate repos.
> > > > > > > > >
> > > > > > > > > Just to remind - the goal is that our change is tested against:
> > > > > > > > >
> > > > > > > > > 1) Released Airflow version (say 1.10.11).
> > > > > > > > > 2) Development airflow version (master - soon possibly development)
> > > > > > > > > 3) Development docker image built with either "development" or "1.10.11"
> > > > > > > > > (we can release the Docker image for 1.10.11 independently from the current
> > > > > > > > > development HEAD). The docker image is supposed to work with any version of
> > > > > > > > > airflow
> > > > > > > > >
> > > > > > > > > In the case of A) Monorepo we have all that as a given.
> > > > > > > > >
> > > > > > > > > I just sent this really small PR that should do the job:
> > > > > > > > > https://github.com/apache/airflow/pull/9663. It takes the latest
> > > > > > > > > "development" docker image and "development" chart, and bakes in the latest
> > > > > > > > > "example dags" from "development branch". The image uses either
> > > > > > > > > "development" or released (from PyPI) "1.10.11" Airflow version - and runs
> > > > > > > > > the "development" tests against it. This is exactly what we want. If we add
> > > > > > > > > new features to the helm chart, the Kubernetes tests will have to be
> > > > > > > > > updated to include that - and this will happen in the airflow "development"
> > > > > > > > > branch. The REALLY good thing in it - since we are running those tests in
> > > > > > > > > CI build of airflow development branch - we prevent anyone from making
> > > > > > > > > breaking changes. It is a given that both - the "development" of airflow
> > > > > > > > > and the "1.10.11" version of airflow will continue to work with the image
> > > > > > > > > and chart.
> > > > > > > > >
> > > > > > > > >
> > > > > > > > > In the case of B) where we split the repos:
> > > > > > > > >
> > > > > > > > > We have to decide where to keep the "kubernetes_tests" - should they be in
> > > > > > > > > "Airflow" or in "Helm". They are testing BOTH so we can choose either way.
> > > > > > > > > Together with Daniel we plan to expand those tests to cover all the
> > > > > > > > > different options we have in the Chart - testing all of it - Kubernetes
> > > > > > > > > Executor, Celery Executor running on Kubernetes, MySQL (once we add it),
> > > > > > > > > etc. etc. So we want to make sure we have a matrix of tests covering a
> > > > > > > > > number of deployment options. Those tests do not exist yet, and they will
> > > > > > > > > have to be written. In principle - they can be moved to the "Helm"
> > > > > > > > > repository. That's where they conceptually belong. However - there is a
> > > > > > > > > huge value in running the tests in airflow "development" - the value is
> > > > > > > > > that no-one will be able to break the "development" airflow, because those
> > > > > > > > > tests are run with every PR. I think we have no choice but to run those
> > > > > > > > > tests always in development. Otherwise, people maintaining the helm chart
> > > > > > > > > will have to fix the problems introduced by people changing Airflow code. I
> > > > > > > > > think it is a pretty bad idea to allow that. So if we move those tests to
> > > > > > > > > Helm Chart repo we have to figure out how to run those "kubernetes" tests
> > > > > > > > > in CI for every build. This is quite possible - by getting the latest
> > > > > > > > > master from helm chart and running the build, but it has several problems:
> > > > > > > > >
> > > > > > > > > 1) The test code for CI will have to continue to stay in Airflow (to run
> > > > > > > > > CI builds) - this means that we already have coupling and some code related
> > > > > > > > > to the execution of the helm tests has to be in Airflow anyway.
> > > > > > > > >
> > > > > > > > > 2) Bigger problem. What happens if as "Airflow developer" you DO introduce
> > > > > > > > > a change that breaks the helm chart? You will see a CI error and..... You
> > > > > > > > > will not know what to do. Do you involve people who maintain the helm chart
> > > > > > > > > and wait for them? I think not. You should be able to reproduce the problem
> > > > > > > > > locally and fix it yourself (maybe with the help of others - but you should
> > > > > > > > > be able to fix your own commit). We would have to teach people how to bring
> > > > > > > > > the docker image and helm chart code from the latest version and run the
> > > > > > > > > tests. We could do it automatically with Breeze (similarly as we do with
> > > > > > > > > other integrations - where we bring in Kerberos, Mongo, and a multitude of
> > > > > > > > > others) without them even knowing it, but this might be fairly complex and
> > > > > > > > > prone to errors. In Monorepo - we already have a simple way of reproducing
> > > > > > > > > and running the tests locally and everything is in one place.
> > > > > > > > >
> > > > > > > > > 3) There is a chance that someone makes a change in Helm in parallel to a
> > > > > > > > > change in Airflow that breaks it. This could easily happen in the "git-sync
> > > > > > > > > case" or when we add "MySQL" for example in the future. And there is no way
> > > > > > > > > to prevent it.
> > > > > > > > >
> > > > > > > > > 4) If we only test against "released" Helm and Airflow (that was one of
> > > > > > > > > the suggestions), the problem is even bigger. How do you know that you do
> > > > > > > > > not break the currently "developed" helm chart? Or how do you know that the
> > > > > > > > > currently "developed" helm chart works with latest Airflow release? If you
> > > > > > > > > do not do those checks at the "commit" time, then you defer this to
> > > > > > > > > "release time" and only then you might find out that decisions you made
> > > > > > > > > during development have to be reverted. This is a very, very bad idea IMHO
> > > > > > > > > again leading to the case that the release manager will have to fix
> > > > > > > > > problems introduced by others.
> > > > > > > > >
> > > > > > > > > J,
> > > > > > > > >
> > > > > > > > >
> > > > > > > > >
> > > > > > > > > On Fri, Jul 3, 2020 at 10:28 PM Ash Berlin-Taylor <ash@apache.org (mailto:ash@apache.org)> wrote:
> > > > > > > > >
> > > > > > > > >> Monorepo FTW.
> > > > > > > > >>
> > > > > > > > >> Yes, it gets a little bit messier around release, but the approach of
> > > > > > > > >> automatically extracting out the commits (or parts of commits) to a
> > > > > > > > >> separate repo for releasing may be the solution to that problem
> > > > > > > > >>
> > > > > > > > >>
> > > > > > > > >> -ash
> > > > > > > > >>
> > > > > > > > >> On Jul 3 2020, at 7:51 pm, Kaxil Naik <kaxilnaik@gmail.com (mailto:kaxilnaik@gmail.com)> wrote:
> > > > > > > > >>
> > > > > > > > >> > I will take a look at the Kubernetes approach and get back to this
> > > > > > > > >> thread.
> > > > > > > > >> >
> > > > > > > > > >> >> We had a discussion with Daniel yesterday and we are both concerned about
> > > > > > > > > >> >> all the overhead for people like us who work on all three "entities" at the
> > > > > > > > > >> >> same time. Even just explaining how to work with Pull Requests and in what
> > > > > > > > > >> >> sequence those PRs would have to be opened and merged in case of changes
> > > > > > > > > >> >> that are spanning across several "entities" - was a challenge. I was unable
> > > > > > > > > >> >> to clearly explain the sequence and way of reviewing/merging the PRs that
> > > > > > > > > >> >> will have to be made if we have submodules. This is a bad sign as I was
> > > > > > > > > >> >> using submodules in the past and know how it works but I was unable to
> > > > > > > > > >> >> explain it clearly.
> > > > > > > > >> >
> > > > > > > > >> >
> > > > > > > > > >> > We don't even need submodules tbh. We can just use a Bash script that
> > > > > > > > > >> > pulls a pinned Helm Chart version.
> > > > > > > > > >> > We only need Helm chart to run integration test for k8s (at least for now).
> > > > > > > > > >> > We already use tons of Bash scripts.
> > > > > > > > >> >
> > > > > > > > > >> > One of the important benefits of separation is that changes in one
> > > > > > > > > >> > component should not need a change in another component, at least
> > > > > > > > > >> > not immediately.
> > > > > > > > >> >
> > > > > > > > > >> > Changes in Helm chart and Dockerfile should never need changes in Airflow.
> > > > > > > > > >> > Changes in Airflow should only ever need a change in Dockerfile and Helm
> > > > > > > > > >> > Chart after a new version is released.
> > > > > > > > >> >
> > > > > > > > > >> > I just had a talk with Daniel too and still didn't find a good enough
> > > > > > > > > >> > reason to have them in the same repo.
> > > > > > > > > >> >
> > > > > > > > > >> > I will definitely look at the Kubernetes approach (maybe it is better) and
> > > > > > > > > >> > get back to this thread. But as of now I don't see any major PROs
> > > > > > > > > >> > for having them in the same repo.
> > > > > > > > >> >
> > > > > > > > >> > Regards,
> > > > > > > > >> > Kaxil
> > > > > > > > >> >
> > > > > > > > >> >
> > > > > > > > >> >
> > > > > > > > > >> > On Fri, Jul 3, 2020 at 5:00 PM Jarek Potiuk <Jarek.Potiuk@polidea.com (mailto:Jarek.Potiuk@polidea.com)> wrote:
> > > > > > > > >> >
> > > > > > > > > >> >> I think Ry's point is an important one - I thought about writing a longer
> > > > > > > > > >> >> post but I looked at the Kubernetes structure and I really like it so just
> > > > > > > > > >> >> wanted to comment on this last one.
> > > > > > > > >> >>
> > > > > > > > > >> >> Seems that it is simply one "authoritative" (or source of truth) repo where
> > > > > > > > > >> >> everything is developed in monorepo fashion but then there is a bot
> > > > > > > > > >> >> that moves every commit related to subdirectories to those "split-out"
> > > > > > > > > >> >> repos. There are never direct commits of people or PRs in the "split-out"
> > > > > > > > > >> >> repositories. This is very similar to my original proposal to have
> > > > > > > > > >> >> dedicated repos used for releases - but with an automated way of publishing
> > > > > > > > > >> >> the commits to the "separated" repos at the moment they are merged to
> > > > > > > > > >> >> master in the main repo. I love it.
> > > > > > > > >> >>
> > > > > > > > > >> >> I think it's a really good and "pragmatic" solution. The code is available in
> > > > > > > > > >> >> separate repos, including the history of commits related to each "entity"
> > > > > > > > > >> >> (so only chart-related commits in chart repo). Issues for particular
> > > > > > > > > >> >> "entities" are in those separate repos as well (something that Kaxil
> > > > > > > > > >> >> mentioned). Users (not developers!) who are interested only in Dockerfile
> > > > > > > > > >> >> or Helm Chart have separate repos they can look at - with only relevant
> > > > > > > > > >> >> changes and history of releases for that particular entity. They can raise
> > > > > > > > > >> >> issues there (and in GitHub, we can easily refer to those issues from the
> > > > > > > > > >> >> main "airflow" repo). All the discussions from "user issues" are kept in the
> > > > > > > > > >> >> relevant repositories. Still - comments about development changes (and
> > > > > > > > > >> >> related issues) might still be kept in the main "airflow" repo - next to
> > > > > > > > > >> >> other "development" changes.
> > > > > > > > >> >>
> > > > > > > > > >> >> We can run separate releases from those linked repositories and even
> > > > > > > > > >> >> publish sources directly from those repositories rather than from the main
> > > > > > > > > >> >> one. At the same time - we avoid all the hassle of submodules.
> > > > > > > > >> >>
> > > > > > > > > >> >> We had a discussion with Daniel yesterday and we are both concerned about
> > > > > > > > > >> >> all the overhead for people like us who work on all three "entities" at the
> > > > > > > > > >> >> same time. Even just explaining how to work with Pull Requests and in what
> > > > > > > > > >> >> sequence those PRs would have to be opened and merged in case of changes
> > > > > > > > > >> >> that are spanning across several "entities" - was a challenge. I was unable
> > > > > > > > > >> >> to clearly explain the sequence and way of reviewing/merging the PRs that
> > > > > > > > > >> >> will have to be made if we have submodules. This is a bad sign as I was
> > > > > > > > > >> >> using submodules in the past and know how it works but I was unable to
> > > > > > > > > >> >> explain it clearly.
> > > > > > > > >> >>
> > > > > > > > > >> >> I really, really like Kubernetes approach - seems that it's one of the
> > > > > > > > > >> >> cases where we can "eat cake and have it too".
> > > > > > > > >> >>
> > > > > > > > >> >> J.
> > > > > > > > >> >>
> > > > > > > > >> >>
> > > > > > > > >> >> On Thu, Jul 2, 2020 at 5:59 PM Ry Walker <ry@rywalker.com (mailto:ry@rywalker.com)> wrote:
> > > > > > > > >> >>
> > > > > > > > > >> >> > One reason to have a monorepo is for project branding, and end user
> > > > > > > > > >> >> > experience. But for component development experience, it's nice to have a
> > > > > > > > > >> >> > small, dedicated repo.
> > > > > > > > > >> >> >
> > > > > > > > > >> >> > I think the git submodule approach is technically sound, but is at odds
> > > > > > > > > >> >> > with making the project easy to consume/understand from the end user
> > > > > > > > > >> >> > perspective, especially if we expand the use of subprojects. And the main
> > > > > > > > > >> >> > Airflow commit graph would appear to be slowing down which is bad for
> > > > > > > > > >> >> > Airflow brand perception.
> > > > > > > > > >> >> >
> > > > > > > > > >> >> > Kubernetes has many sub-repos that are integrated into the main repo -
> > > > > > > > > >> >> > which I think could be the best of both worlds:
> > > > > > > > > >> >> > Example: https://github.com/kubernetes/kubernetes/tree/master/staging
> > > > > > > > > >> >> >
> > > > > > > > > >> >> > I haven't dug in very deeply, and I won't pretend to understand how
> > > > > > > > > >> >> > challenging it may be to maintain this structure, but I'd support breaking
> > > > > > > > > >> >> > more components out of the main Airflow repo for dev purposes (for example,
> > > > > > > > > >> >> > in the future, it'd be nice to have airflow-cli, airflow-api,
> > > > > > > > > >> >> > airflow-scheduler, individual provider repos that are cleanly separated) as
> > > > > > > > > >> >> > long as we bring the commits/contributions back into the monorepo with
> > > > > > > > > >> >> > automation.
> > > > > > > > > >> >> >
> > > > > > > > > >> >> > Maybe we could dive a little deeper into how K8s is operating, before going
> > > > > > > > > >> >> > with submodules?
> > > > > > > > >> >> >
> > > > > > > > >> >> > -Ry
> > > > > > > > >> >> >
> > > > > > > > >> >> >
> > > > > > > > >> >> >
> > > > > > > > >> >> >
> > > > > > > > > >> >> > On Thu, Jul 2, 2020 at 11:24 AM Kaxil Naik <kaxilnaik@gmail.com (mailto:kaxilnaik@gmail.com)> wrote:
> > > > > > > > >> >> >
> > > > > > > > >> >> > > Let's come to a consensus first before we do anything :-)
> > > > > > > > >> >> > >
> > > > > > > > > >> >> > > Is everyone happy with separate repo approach? Let's wait for 72 hours to
> > > > > > > > > >> >> > > hear from all and then have a plan on how we do it? WDYT?
> > > > > > > > > >> >> > >
> > > > > > > > > >> >> > > But indeed git submodules approach sounds good. We do it for *Airflow Site*
> > > > > > > > > >> >> > > (https://github.com/apache/airflow-site/tree/master/landing-pages/site/themes)
> > > > > > > > > >> >> > > too.
> > > > > > > > >> >> > >
> > > > > > > > >> >> > > Regards,
> > > > > > > > >> >> > > Kaxil
> > > > > > > > >> >> > >
> > > > > > > > > >> >> > > On Thu, Jul 2, 2020 at 4:15 PM Jarek Potiuk <Jarek.Potiuk@polidea.com (mailto:Jarek.Potiuk@polidea.com)> wrote:
> > > > > > > > >> >> > >
> > > > > > > > > >> >> > > > Absolutely - I am happy to add "best practices" and short "howto do stuff
> > > > > > > > > >> >> > > > with git submodules" - and this knowledge will only be needed for
> > > > > > > > > >> >> > > > interacting with prod image/helmchart/running kubernetes tests. For all the
> > > > > > > > > >> >> > > > other purposes it should be "business as usual".
> > > > > > > > >> >> > > >
> > > > > > > > > >> >> > > > On Thu, Jul 2, 2020 at 4:53 PM Daniel Imberman <daniel.imberman@gmail.com (mailto:daniel.imberman@gmail.com)> wrote:
> > > > > > > > >> >> > > >
> > > > > > > > > >> >> > > > > I think git submodules sounds like a great idea. We would need to write
> > > > > > > > > >> >> > > > > this into the CONTRIBUTING.md to let people know how to do it but it’s a
> > > > > > > > > >> >> > > > > “teach once” situation.
> > > > > > > > >> >> > > > >
> > > > > > > > > >> >> > > > > On Thu, Jul 2, 2020 at 2:44 AM, Tomasz Urbaszek <turbaszek@apache.org (mailto:turbaszek@apache.org)> wrote:
> > > > > > > > > >> >> > > > > I support the idea of separate repos. The git submodules mentioned by
> > > > > > > > > >> >> > > > > Jarek sound like an interesting solution. It may add some complexity
> > > > > > > > > >> >> > > > > for new contributors but it's not rocket science. If we agree on using
> > > > > > > > > >> >> > > > > this we should add a small how-to in contributing.rst I think (e.g. do I
> > > > > > > > > >> >> > > > > have to have a fork of each repo?).
> > > > > > > > >> >> > > > >
> > > > > > > > > >> >> > > > > As stressed previously if we go this route we should make sure we have
> > > > > > > > > >> >> > > > > nice testing of all those three components. Regarding the versioning,
> > > > > > > > > >> >> > > > > I have no strong opinion but I fully support using separate issues for
> > > > > > > > > >> >> > > > > airflow, docker, and helm.
> > > > > > > > >> >> > > > >
> > > > > > > > >> >> > > > > Tomek
> > > > > > > > >> >> > > > >
> > > > > > > > >> >> > > > >
> > > > > > > > > >> >> > > > > On Thu, Jul 2, 2020 at 9:26 AM Jarek Potiuk <Jarek.Potiuk@polidea.com (mailto:Jarek.Potiuk@polidea.com)> wrote:
> > > > > > > > >> >> > > > > >
> > > > > > > > > >> >> > > > > > On Thu, Jul 2, 2020 at 3:16 AM Daniel Imberman <daniel.imberman@gmail.com (mailto:daniel.imberman@gmail.com)> wrote:
> > > > > > > > >> >> > > > > >
> > > > > > > > > >> >> > > > > > > I’m fine with keeping it as three separate repos but merging testing
> > > > > > > > > >> >> > > > > > > somehow (e.g. the source code chart would pull the helm/docker chart into
> > > > > > > > > >> >> > > > > > > .build) but we need to do it in a way that doesn’t make testing too
> > > > > > > > > >> >> > > > > > > difficult.
> > > > > > > > > >> >> > > > > > >
> > > > > > > > > >> >> > > > > > > So for example: How do I test/integration test a change that involves a
> > > > > > > > > >> >> > > > > > > change to all three and has to be done at the same time? Perhaps a user can
> > > > > > > > > >> >> > > > > > > “register” a branch of helm and docker when they start up breeze? Or
> > > > > > > > > >> >> > > > > > > perhaps we create a “parent” integration test that uses the three together?
> > > > > > > > >> >> > > > > > >
> > > > > > > > >> >> > > > > >
> > > > > > > > > >> >> > > > > > Yes, those are exactly my concerns when splitting the repos.
> > > > > > > > >> >> > > > > >
> > > > > > > > >> >> > > > > > I think testing for development should remain in the
> > > > > > > > >> "airflow"
> > > > > > > > >> >> > repo.
> > > > > > > > >> >> > > It
> > > > > > > > >> >> > > > > is
> > > > > > > > >> >> > > > > > the "central one" in fact. I slept on it and I think
> > > > > > > > using
> > > > > > > > >> >> > > "released"
> > > > > > > > >> >> > > > > > versions for development testing will suffer from this
> > > > > > > "we
> > > > > > > > >> >> need a
> > > > > > > > >> >> > > > change
> > > > > > > > >> >> > > > > in
> > > > > > > > >> >> > > > > > all three of those".
> > > > > > > > >> >> > > > > >
> > > > > > > > >> >> > > > > > But we have an easy solution I think.
> > > > > > > > >> >> > > > > >
> > > > > > > > >> >> > > > > > I think that simply setting submodules properly should do
> > > > > > > > >> >> to the
> > > > > > > > >> >> > job:
> > > > > > > > >> >> > > > > > https://git-scm.com/book/en/v2/Git-Tools-Submodules.
> > > > > > > They
> > > > > > > > >> seem
> > > > > > > > >> >> to
> > > > > > > > >> >> > be
> > > > > > > > >> >> > > > > > perfect for our case.
> > > > > > > > >> >> > > > > >
> > > > > > > > >> >> > > > > > For those who have not used it - in short - submodules
> > > > > > > work
> > > > > > > > >> in
> > > > > > > > >> >> the
> > > > > > > > >> >> > > way
> > > > > > > > >> >> > > > > that
> > > > > > > > >> >> > > > > > they register the "linked repos" and store related "hash"
> > > > > > > > >> >> of the
> > > > > > > > >> >> > > commit
> > > > > > > > >> >> > > > > > from that linked repo. For example, the "chart" folder
> > > > > > > will
> > > > > > > > >> >> be a
> > > > > > > > >> >> > link
> > > > > > > > >> >> > > > to
> > > > > > > > >> >> > > > > > "apache/airflow-helm-chart". We can also move the prod
> > > > > > > > >> Dockerfile
> > > > > > > > >> >> > to
> > > > > > > > >> >> > > a
> > > > > > > > >> >> > > > > > subfolder and link it to the separate repo. Git submodule
> > > > > > > > >> >> has a
> > > > > > > > >> >> > > > > > built-in mechanism to a) update to the latest version of
> > > > > > > > the
> > > > > > > > >> >> repo,
> > > > > > > > >> >> > b)
> > > > > > > > >> >> > > > > > commit your changes to the linked repo from there which
> > > > > > > is
> > > > > > > > >> >> all we
> > > > > > > > >> >> > > > need. I
> > > > > > > > >> >> > > > > > used those a few times - I never liked submodules for
> > > > > > > sharing
> > > > > > > > >> >> > "library"
> > > > > > > > >> >> > > > > code,
> > > > > > > > >> >> > > > > > but for sharing helm/Docker it seems perfect.
> > > > > > > > >> >> > > > > >
> > > > > > > > >> >> > > > > > From the "regular" developer point of view - you do not
> > > > > > > > >> >> need to
> > > > > > > > >> >> > > > > get/update
> > > > > > > > >> >> > > > > > submodules if you do not need to use them - so for all
> > > > > > > the
> > > > > > > > >> >> > > development
> > > > > > > > >> >> > > > > > purposes if you only change the "airflow" code, you would
> > > > > > > > not
> > > > > > > > >> >> even
> > > > > > > > >> >> > > need
> > > > > > > > >> >> > > > > to
> > > > > > > > >> >> > > > > > sync chart or Dockerfile. You do "git checkout" as usual
> > > > > > > > >> >> and it
> > > > > > > > >> >> > > should
> > > > > > > > >> >> > > > > > work. So basically - no change for "regular" airflow
> > > > > > > > >> development.
> > > > > > > > >> >> > > > > >
> > > > > > > > >> >> > > > > > However, if you do need to work on helm + Docker + code,
> > > > > > > > >> >> then you
> > > > > > > > >> >> > > > simply
> > > > > > > > >> >> > > > > do
> > > > > > > > >> >> > > > > > "git submodule update", go to the linked "helm" or
> > > > > > > "docker"
> > > > > > > > >> >> folder,
> > > > > > > > >> >> > > > > > checkout the "master" version and you start making
> > > > > > > changes.
> > > > > > > > >> The
> > > > > > > > >> >> > only
> > > > > > > > >> >> > > > > thing
> > > > > > > > >> >> > > > > > to remember when you want to push your changes is to do
> > > > > > > > >> >> `git push
> > > > > > > > >> >> > > > > > --recurse-submodules="check"` and it will make sure that
> > > > > > > > >> >> all the
> > > > > > > > >> >> > > repos
> > > > > > > > >> >> > > > > are
> > > > > > > > >> >> > > > > > updated. It is a bit involved, but the latest git versions
> > > > > > > have
> > > > > > > > >> >> a very
> > > > > > > > >> >> > > good
> > > > > > > > >> >> > > > > > support and it must only be used by people who work on
> > > > > > > > >> >> airflow +
> > > > > > > > >> >> > > > docker +
> > > > > > > > >> >> > > > > > helm - all the others are unaffected.
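
The submodule workflow described above could be scripted roughly as follows. This is a minimal sketch, not part of the original proposal: the helper name `submodule_workflow` and the submodule paths `chart` and `docker` are assumptions for illustration. It only builds the git command lines and prints them; nothing is executed.

```python
from typing import List


def submodule_workflow(paths: List[str], branch: str = "master") -> List[List[str]]:
    """Build the git commands for working on linked repos (nothing is executed)."""
    commands: List[List[str]] = []
    # Fetch the linked repos at the commits recorded in the parent repo.
    commands.append(["git", "submodule", "update", "--init", "--", *paths])
    for path in paths:
        # Switch each linked repo from the recorded commit to a branch.
        commands.append(["git", "-C", path, "checkout", branch])
    # Make the parent push fail if submodule commits were not pushed first.
    commands.append(["git", "push", "--recurse-submodules=check"])
    return commands


if __name__ == "__main__":
    # "chart" and "docker" are hypothetical submodule folders.
    for argv in submodule_workflow(["chart", "docker"]):
        print(" ".join(argv))
```

Developers who only touch the "airflow" code would never need to run any of these commands.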
> > > > > > > > >> >> > > > > >
> > > > > > > > >> >> > > > > > From the CI perspective also nothing changes - when we
> > > > > > > > >> checkout
> > > > > > > > >> >> the
> > > > > > > > >> >> > > > code
> > > > > > > > >> >> > > > > we
> > > > > > > > >> >> > > > > > will include submodules and our test harness will be
> > > > > > > > largely
> > > > > > > > >> >> > > unchanged.
> > > > > > > > >> >> > > > > > Submodule provides us with the right mechanism for cross
> > > > > > > > >> >> dependency
> > > > > > > > >> >> > > > even
> > > > > > > > >> >> > > > > if
> > > > > > > > >> >> > > > > > we use branches.
> > > > > > > > >> >> > > > > >
> > > > > > > > >> >> > > > > > If everyone will be ok with that - I am happy to set it
> > > > > > > > up.
> > > > > > > > >> With
> > > > > > > > >> >> > > > > submodules
> > > > > > > > >> >> > > > > > - we can switch to separate repos even without releasing
> > > > > > > > >> >> helm and
> > > > > > > > >> >> > > Prod
> > > > > > > > >> >> > > > > > chart "officially".
> > > > > > > > >> >> > > > > >
> > > > > > > > >> >> > > > > > J.
> > > > > > > > >> >> > > > > >
> > > > > > > > >> >> > > > > >
> > > > > > > > >> >> > > > > >
> > > > > > > > >> >> > > > > > >
> > > > > > > > >> >> > > > > > > On Wed, Jul 1, 2020 at 3:20 PM, Jarek Potiuk <
> > > > > > > > >> >> > > > Jarek.Potiuk@polidea.com (mailto:Jarek.Potiuk@polidea.com)
> > > > > > > > >> >> > > > > >
> > > > > > > > >> >> > > > > > > wrote:
> > > > > > > > >> >> > > > > > > Sure. We can work with such an approach. There will be
> > > > > > > > some
> > > > > > > > >> >> > > > > dependencies
> > > > > > > > >> >> > > > > > > that we might find are problematic, but If we all see
> > > > > > > > >> >> that it's
> > > > > > > > >> >> > > > > > > worth trying, there is a clear benefit that it makes
> > > > > > > for
> > > > > > > > a
> > > > > > > > >> >> > "clean"
> > > > > > > > >> >> > > > > > > split between those different "entities". And possibly
> > > > > > > > >> >> once we
> > > > > > > > >> >> > > > release
> > > > > > > > >> >> > > > > > > first versions of both image and chart, such problems
> > > > > > > > >> >> will be
> > > > > > > > >> >> > rare
> > > > > > > > >> >> > > > and
> > > > > > > > >> >> > > > > easy
> > > > > > > > >> >> > > > > > > to fix.
> > > > > > > > >> >> > > > > > >
> > > > > > > > >> >> > > > > > > I personally think such a split is inevitable eventually,
> > > > > > > > >> it's
> > > > > > > > >> >> > just a
> > > > > > > > >> >> > > > > matter
> > > > > > > > >> >> > > > > > > when to do it. If we decide to make this happen soon -
> > > > > > > I
> > > > > > > > am
> > > > > > > > >> >> more
> > > > > > > > >> >> > > than
> > > > > > > > >> >> > > > > happy
> > > > > > > > >> >> > > > > > > to work on making the split reality.
> > > > > > > > >> >> > > > > > >
> > > > > > > > >> >> > > > > > > One prerequisite to that is that all those - Helm
> > > > > > > Chart,
> > > > > > > > >> Prod
> > > > > > > > >> >> > Image
> > > > > > > > >> >> > > > and
> > > > > > > > >> >> > > > > > > Airflow are released in stable versions separately
> > > > > > > > >> >> "officially" -
> > > > > > > > >> >> > > > from
> > > > > > > > >> >> > > > > the
> > > > > > > > >> >> > > > > > > current sources (otherwise there will be no way to test
> > > > > > > > >> >> > > cross-repo).
> > > > > > > > >> >> > > > > > >
> > > > > > > > >> >> > > > > > > I think for that we will need to agree on the
> > > > > > > versioning
> > > > > > > > >> scheme
> > > > > > > > >> >> > and
> > > > > > > > >> >> > > > > cadence
> > > > > > > > >> >> > > > > > > for the Image and Helm Chart, then copy sources from
> > > > > > > > >> airflow
> > > > > > > > >> >> and
> > > > > > > > >> >> > > > > release
> > > > > > > > >> >> > > > > > > them as "baseline" including setting up the tests for all of
> > > > > > > > >> >> those -
> > > > > > > > >> >> > > then
> > > > > > > > >> >> > > > we
> > > > > > > > >> >> > > > > > > can remove both Helm and Dockerfile from the airflow
> > > > > > > > repo.
> > > > > > > > >> >> Happy
> > > > > > > > >> >> > to
> > > > > > > > >> >> > > > > help
> > > > > > > > >> >> > > > > > > with that if that's the direction we choose as a
> > > > > > > > >> >> community. It
> > > > > > > > >> >> is
> > > > > > > > >> >> > > > > important
> > > > > > > > >> >> > > > > > > though that we keep the cross-repo testing working. We
> > > > > > > > >> >> have it
> > > > > > > > >> >> > > > working
> > > > > > > > >> >> > > > > as
> > > > > > > > >> >> > > > > > > of yesterday, so now the matter is - whatever we do we
> > > > > > > > >> >> keep it
> > > > > > > > >> >> > > > running
> > > > > > > > >> >> > > > > and
> > > > > > > > >> >> > > > > > > have development environment support easy development
> > > > > > > and
> > > > > > > > >> >> testing
> > > > > > > > >> >> > > of
> > > > > > > > >> >> > > > > > > either of the three (including CI testing cross-repos)
> > > > > > > ,
> > > > > > > > >> That's
> > > > > > > > >> >> > the
> > > > > > > > >> >> > > > > only
> > > > > > > > >> >> > > > > > > really important thing to me - the rest is more of
> > > > > > > > >> technicality
> > > > > > > > >> >> > how
> > > > > > > > >> >> > > > we
> > > > > > > > >> >> > > > > link
> > > > > > > > >> >> > > > > > > the repos, but principle remains.
> > > > > > > > >> >> > > > > > >
> > > > > > > > >> >> > > > > > > Do we have an idea for the versioning scheme that we
> > > > > > > > >> >> would like
> > > > > > > > >> >> > to
> > > > > > > > >> >> > > > use
> > > > > > > > >> >> > > > > for
> > > > > > > > >> >> > > > > > > the Helm Chart and prod image ?
> > > > > > > > >> >> > > > > > >
> > > > > > > > >> >> > > > > > > Should we make it CalVer
> > > > > > > > >> >> <https://calver.org/overview.html> or
> > > > > > > > >> >> > > > SemVer
> > > > > > > > >> >> > > > > > > <https://semver.org/> (or some other scheme)? And how
> > > > > > > > >> should
> > > > > > > > >> >> we
> > > > > > > > >> >> > > > treat
> > > > > > > > >> >> > > > > the
> > > > > > > > >> >> > > > > > > combinations with Airflow?
> > > > > > > > >> >> > > > > > >
> > > > > > > > >> >> > > > > > > My thoughts (but I have no strong opinions as long as
> > > > > > > > >> someone
> > > > > > > > >> >> > > > proposes
> > > > > > > > >> >> > > > > more
> > > > > > > > >> >> > > > > > > sensible versioning schemes):
> > > > > > > > >> >> > > > > > >
> > > > > > > > >> >> > > > > > > 1) Airflow code - we continue the release scheme we
> > > > > > > have
> > > > > > > > >> (with
> > > > > > > > >> >> > > > > deciding on
> > > > > > > > >> >> > > > > > > 2.* scheme for the release). I expect in the future we
> > > > > > > > >> might
> > > > > > > > >> >> > decide
> > > > > > > > >> >> > > > on
> > > > > > > > >> >> > > > > > > doing branches or patches so for 2.* I'd opt for going
> > > > > > > > full
> > > > > > > > >> >> > SemVer
> > > > > > > > >> >> > > > > approach
> > > > > > > > >> >> > > > > > > and patches released from branches.
> > > > > > > > >> >> > > > > > >
> > > > > > > > >> >> > > > > > > 2) I believe that Helm Chart can be versioned with its
> > > > > > > > own
> > > > > > > > >> >> > version
> > > > > > > > >> >> > > > > (then
> > > > > > > > >> >> > > > > > > you specify the image version as helm parameter). For
> > > > > > > the
> > > > > > > > >> Helm
> > > > > > > > >> >> > > Chart
> > > > > > > > >> >> > > > I
> > > > > > > > >> >> > > > > > > think CalVer might be OK as I do not expect any
> > > > > > > > >> >> branching/patches
> > > > > > > > >> >> > > in
> > > > > > > > >> >> > > > > the
> > > > > > > > >> >> > > > > > > future - I'd expect that there will be a single stream
> > > > > > > of
> > > > > > > > >> >> > releases.
> > > > > > > > >> >> > > > > > >
> > > > > > > > >> >> > > > > > > 3) Dockerfile (+ related files such as .dockerignore,
> > > > > > > > empty
> > > > > > > > >> >> dir,
> > > > > > > > >> >> > > > > > > entrypoints etc). I do not imagine a lot of branching
> > > > > > > for
> > > > > > > > >> >> those -
> > > > > > > > >> >> > > we
> > > > > > > > >> >> > > > > > > should be able to release a new version of a Dockerfile
> > > > > > > > (+
> > > > > > > > >> >> > related
> > > > > > > > >> >> > > > > files)
> > > > > > > > >> >> > > > > > > working with nearly any earlier Airflow release, so
> > > > > > > > CalVer
> > > > > > > > >> >> seems
> > > > > > > > >> >> > > > like a
> > > > > > > > >> >> > > > > > > good choice.
> > > > > > > > >> >> > > > > > >
> > > > > > > > >> >> > > > > > > 4) Image versioning becomes a bit more complex because
> > > > > > > > the
> > > > > > > > >> >> image
> > > > > > > > >> >> > > tag
> > > > > > > > >> >> > > > is
> > > > > > > > >> >> > > > > > > always combination of:
> > > > > > > > >> >> > > > > > > * Dockerfile (+ related files) version
> > > > > > > > >> >> > > > > > > * Airflow Version
> > > > > > > > >> >> > > > > > > * Python Version
> > > > > > > > >> >> > > > > > >
> > > > > > > > >> >> > > > > > > An example versioning I can imagine:
> > > > > > > > >> >> > > > > > >
> > > > > > > > >> >> > > > > > > *Airflow*: 1.10.11, 1.10.12, 2.0.0, 2.1.0, 2.1.1 -
> > > > > > > patch
> > > > > > > > >> level
> > > > > > > > >> >> > (if
> > > > > > > > >> >> > > we
> > > > > > > > >> >> > > > > > > decide to have patches).
> > > > > > > > >> >> > > > > > > *Dockerfile: *2020.07.12, 2020.08.20...... -> depending
> > > > > > > > >> >> when we
> > > > > > > > >> >> > > > release
> > > > > > > > >> >> > > > > > > them
> > > > > > > > >> >> > > > > > > *Helm Chart*: 2020.07.10, 2020.08.09 ...... Each Helm
> > > > > > > > Chart
> > > > > > > > >> >> has a
> > > > > > > > >> >> > > > > minimum
> > > > > > > > >> >> > > > > > > version of both Dockerfile and Airflow versions it
> > > > > > > works
> > > > > > > > >> with.
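
The "minimum version" rule for the Helm Chart could be checked along these lines. This is a sketch under assumptions: the function names are made up for illustration, and versions are assumed to be purely numeric, dot-separated strings (CalVer such as 2020.07.12 or SemVer such as 1.10.11), so pre-release suffixes such as "rc1" are not handled.

```python
from typing import Tuple


def parse_version(version: str) -> Tuple[int, ...]:
    """Turn '2020.07.12' or '1.10.11' into a tuple of ints for comparison."""
    return tuple(int(part) for part in version.split("."))


def chart_supports(min_dockerfile: str, min_airflow: str,
                   dockerfile: str, airflow: str) -> bool:
    """Return True if both versions meet the chart's declared minimums."""
    return (parse_version(dockerfile) >= parse_version(min_dockerfile)
            and parse_version(airflow) >= parse_version(min_airflow))


if __name__ == "__main__":
    # Hypothetical chart requiring Dockerfile >= 2020.07.12 and Airflow >= 1.10.11.
    print(chart_supports("2020.07.12", "1.10.11", "2020.08.20", "2.0.0"))
    print(chart_supports("2020.07.12", "1.10.11", "2020.07.10", "1.10.12"))
```

Comparing tuples of integers rather than raw strings keeps 1.10.11 ordered after 1.10.9.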
> > > > > > > > >> >> > > > > > >
> > > > > > > > >> >> > > > > > > *Example Docker Image tags:*
> > > > > > > > >> >> > > > > > >
> > > > > > > > >> apache/airflow:dockerfile2020.07.10-airflow1.10.10-python3.6
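
A tag of this shape can be composed mechanically from its three parts. The sketch below is only an illustration of the proposed scheme: the function name `image_tag` and the default repository name are assumptions, and no such helper exists in Airflow.

```python
def image_tag(dockerfile_version: str, airflow_version: str,
              python_version: str, repository: str = "apache/airflow") -> str:
    """Compose an image tag from Dockerfile, Airflow and Python versions."""
    return (f"{repository}:dockerfile{dockerfile_version}"
            f"-airflow{airflow_version}-python{python_version}")


if __name__ == "__main__":
    print(image_tag("2020.07.10", "1.10.10", "3.6"))
```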
> > > > > > > > >> >> > > > > > >
> > > > > > > > >> >> > > > > > > WDYT?
> > > > > > > > >> >> > > > > > >
> > > > > > > > >> >> > > > > > > J,
> > > > > > > > >> >> > > > > > >
> > > > > > > > >> >> > > > > > >
> > > > > > > > >> >> > > > > > > On Wed, Jul 1, 2020 at 11:12 PM Kaxil Naik <
> > > > > > > > >> >> kaxilnaik@gmail.com (mailto:kaxilnaik@gmail.com)>
> > > > > > > > >> >> > > > > wrote:
> > > > > > > > >> >> > > > > > >
> > > > > > > > >> >> > > > > > > > I think we should have "separate repos for
> > > > > > > development"
> > > > > > > > >> too.
> > > > > > > > >> >> > > > > > > >
> > > > > > > > >> >> > > > > > > > 3 Repos in total:
> > > > > > > > >> >> > > > > > > >
> > > > > > > > >> >> > > > > > > > 1) apache/airflow
> > > > > > > > >> >> > > > > > > > 2) apache/airflow-docker-image
> > > > > > > > >> >> > > > > > > > 3) apache/airflow-helm-chart
> > > > > > > > >> >> > > > > > > >
> > > > > > > > >> >> > > > > > > >
> > > > > > > > >> >> > > > > > > > (1) *apache/airflow* should use a pinned stable
> > > > > > > version
> > > > > > > > >> of
> > > > > > > > >> >> > > Airflow
> > > > > > > > >> >> > > > > Helm
> > > > > > > > >> >> > > > > > > > chart to run Kubernetes tests
> > > > > > > > >> >> > > > > > > > (2) *apache/airflow* already has *Dockerfile.ci* file
> > > > > > > > >> which
> > > > > > > > >> >> it
> > > > > > > > >> >> > > can
> > > > > > > > >> >> > > > > use to
> > > > > > > > >> >> > > > > > > > run airflow tests on docker images.
> > > > > > > > >> >> > > > > > > > (3) *apache/airflow-docker-image *should use the
> > > > > > > latest
> > > > > > > > >> >> > available
> > > > > > > > >> >> > > > > stable
> > > > > > > > >> >> > > > > > > > version of airflow
> > > > > > > > >> >> > > > > > > > (4) *apache/airflow-helm-chart *should use the latest
> > > > > > > > >> >> available
> > > > > > > > >> >> > > > > stable
> > > > > > > > >> >> > > > > > > > version of airflow
> > > > > > > > >> >> > > > > > > >
> > > > > > > > >> >> > > > > > > > Having such split also makes some updates more
> > > > > > > > >> >> difficult -
> > > > > > > > >> >> for
> > > > > > > > >> >> > > > > example if
> > > > > > > > >> >> > > > > > > > > we add new "extra" to Airflow that will require to
> > > > > > > > >> install
> > > > > > > > >> >> > > "apt"
> > > > > > > > >> >> > > > > > > > dependency
> > > > > > > > >> >> > > > > > > > > in Dockerfile, we will have to split it into first
> > > > > > > > >> adding
> > > > > > > > >> >> the
> > > > > > > > >> >> > > > > > > dependency
> > > > > > > > >> >> > > > > > > > to
> > > > > > > > >> >> > > > > > > > > Dockerfile, and once it is merged, we can add the
> > > > > > > > >> >> extra to
> > > > > > > > >> >> > > > airflow
> > > > > > > > >> >> > > > > with
> > > > > > > > >> >> > > > > > > > > setup.py.
> > > > > > > > >> >> > > > > > > >
> > > > > > > > >> >> > > > > > > >
> > > > > > > > >> >> > > > > > > > Adding a new extra to setup.py would not (and should
> > > > > > > > not)
> > > > > > > > >> >> > impact
> > > > > > > > >> >> > > > the
> > > > > > > > >> >> > > > > > > > development of *apache/airflow-docker-image*
> > > > > > > > >> >> > > > > > > > Once an RC is cut for apache/airflow or after a new
> > > > > > > > >> version
> > > > > > > > >> >> is
> > > > > > > > >> >> > > > > released
> > > > > > > > >> >> > > > > > > for
> > > > > > > > >> >> > > > > > > > apache/airflow, we can work on supporting the new
> > > > > > > > airflow
> > > > > > > > >> >> > version
> > > > > > > > >> >> > > > in
> > > > > > > > >> >> > > > > the
> > > > > > > > >> >> > > > > > > > Production Docker Image.
> > > > > > > > >> >> > > > > > > > While doing that we can add all the libraries that
> > > > > > > are
> > > > > > > > >> needed
> > > > > > > > >> >> > by
> > > > > > > > >> >> > > > the
> > > > > > > > >> >> > > > > new
> > > > > > > > >> >> > > > > > > > Airflow Version and we will have a clean commit
> > > > > > > history
> > > > > > > > >> and
> > > > > > > > >> >> > > > > changelog for
> > > > > > > > >> >> > > > > > > > Docker image.
> > > > > > > > >> >> > > > > > > >
> > > > > > > > >> >> > > > > > > > We definitely do not need to work in parallel on both
> > > > > > > > the
> > > > > > > > >> >> repos.
> > > > > > > > >> >> > > By
> > > > > > > > >> >> > > > > doing
> > > > > > > > >> >> > > > > > > > development in a separate repo we keep consistent
> > > > > > > > >> "source"
> > > > > > > > >> >> > files
> > > > > > > > >> >> > > > and
> > > > > > > > >> >> > > > > we
> > > > > > > > >> >> > > > > > > can
> > > > > > > > >> >> > > > > > > > release each artifact with a
> > > > > > > > >> >> > > > > > > > separate cadence. If someone discovers bug in newly
> > > > > > > > >> released
> > > > > > > > >> >> > > > > Dockerimage,
> > > > > > > > >> >> > > > > > > > we should be easily able to cut out a new release
> > > > > > > with
> > > > > > > > >> the
> > > > > > > > >> >> > patch
> > > > > > > > >> >> > > > > without
> > > > > > > > >> >> > > > > > > > worrying about how development is
> > > > > > > > >> >> > > > > > > > going in the apache/airflow repo.
> > > > > > > > >> >> > > > > > > >
> > > > > > > > >> >> > > > > > > >
> > > > > > > > >> >> > > > > > > > *Apache Flink & Apache CouchDB* do it in a
> > > > > > > similar
> > > > > > > > >> >> manner:
> > > > > > > > >> >> > > > > > > >
> > > > > > > > >> >> > > > > > > > https://github.com/apache/flink &
> > > > > > > > >> >> > > > > https://github.com/apache/flink-docker
> > > > > > > > >> >> > > > > > > > https://github.com/apache/couchdb &
> > > > > > > > >> >> > > > > > > > https://github.com/apache/couchdb-docker
> > > > > > > > >> >> > > > > > > >
> > > > > > > > >> >> > > > > > > > Regards,
> > > > > > > > >> >> > > > > > > > Kaxil
> > > > > > > > >> >> > > > > > > >
> > > > > > > > >> >> > > > > > > >
> > > > > > > > >> >> > > > > > > >
> > > > > > > > >> >> > > > > > > >
> > > > > > > > >> >> > > > > > > >
> > > > > > > > >> >> > > > > > > >
> > > > > > > > >> >> > > > > > > > On Wed, Jul 1, 2020 at 9:50 PM Jarek Potiuk <
> > > > > > > > >> >> > > > > Jarek.Potiuk@polidea.com (mailto:Jarek.Potiuk@polidea.com)>
> > > > > > > > >> >> > > > > > > > wrote:
> > > > > > > > >> >> > > > > > > >
> > > > > > > > >> >> > > > > > > > > I do not think it's only the question of Mono/Multi
> > > > > > > > >> repos.
> > > > > > > > >> >> > > While
> > > > > > > > >> >> > > > I
> > > > > > > > >> >> > > > > > > > clearly
> > > > > > > > >> >> > > > > > > > > see the benefit of separate repos I also see some
> > > > > > > > >> >> drawbacks.
> > > > > > > > >> >> > > > > > > > >
> > > > > > > > >> >> > > > > > > > > And if it bothers others, I am happy to follow the
> > > > > > > > >> >> majority.
> > > > > > > > >> >> > If
> > > > > > > > >> >> > > > we
> > > > > > > > >> >> > > > > > > think
> > > > > > > > >> >> > > > > > > > > that a bit more complexity in testing justifies
> > > > > > > > >> separating
> > > > > > > > >> >> > > those
> > > > > > > > >> >> > > > > three
> > > > > > > > >> >> > > > > > > > > completely and having more "clean"- it's also
> > > > > > > > >> >> workable but
> > > > > > > > >> >> > IMHO
> > > > > > > > >> >> > > > > > > > introduces
> > > > > > > > >> >> > > > > > > > > certain complexity in development.
> > > > > > > > >> >> > > > > > > > >
> > > > > > > > >> >> > > > > > > > > However, I think this is not 0/1: a kind of hybrid
> > > > > > > > >> approach
> > > > > > > > >> >> in
> > > > > > > > >> >> > my
> > > > > > > > >> >> > > > > opinion
> > > > > > > > >> >> > > > > > > > > might be best of both worlds - development and
> > > > > > > > >> >> releases .
> > > > > > > > >> >> > > > > > > > >
> > > > > > > > >> >> > > > > > > > > Let me explain what I mean by "Hybrid":
> > > > > > > > >> >> > > > > > > > >
> > > > > > > > >> >> > > > > > > > > I think we definitely should have separate
> > > > > > > > >> >> repositories to
> > > > > > > > >> >> > > > release
> > > > > > > > >> >> > > > > > > those
> > > > > > > > >> >> > > > > > > > > artifacts and I think there is no doubt about it:
> > > > > > > > >> >> > > > > > > > >
> > > > > > > > >> >> > > > > > > > > * airflow (apache/airflow)
> > > > > > > > >> >> > > > > > > > > * prod docker image (apache/airflow-docker)
> > > > > > > > >> >> > > > > > > > > * helm chart (apache/airflow-helm)
> > > > > > > > >> >> > > > > > > > > * api clients (we already have separate repos for
> > > > > > > > >> those)
> > > > > > > > >> >> > > > > > > > > (apache/airflow-client-*)
> > > > > > > > >> >> > > > > > > > >
> > > > > > > > >> >> > > > > > > > > I think the only question is where we develop all
> > > > > > > > those
> > > > > > > > >> >> > > (develop
> > > > > > > > >> >> > > > !=
> > > > > > > > >> >> > > > > > > > > release). There are certain benefits of having a
> > > > > > > > single
> > > > > > > > >> >> > > "master"
> > > > > > > > >> >> > > > > (let's
> > > > > > > > >> >> > > > > > > > > call it "development" further) for all those
> > > > > > > > artifacts.
> > > > > > > > >> >> > > Currently
> > > > > > > > >> >> > > > > the
> > > > > > > > >> >> > > > > > > > > "development" version for all of those is in one
> > > > > > > repo
> > > > > > > > >> >> - and
> > > > > > > > >> >> > > while
> > > > > > > > >> >> > > > > > > > > developing one depends on the other, we also test
> > > > > > > all
> > > > > > > > >> of
> > > > > > > > >> >> > those
> > > > > > > > >> >> > > > > together
> > > > > > > > >> >> > > > > > > > and
> > > > > > > > >> >> > > > > > > > > this means that "current best" set of airflow
> > > > > > > sources
> > > > > > > > >> >> > > (including
> > > > > > > > >> >> > > > > > > > > dependencies in setup.py), Dockerfile and Helm
> > > > > > > chart
> > > > > > > > >> work.
> > > > > > > > >> >> > This
> > > > > > > > >> >> > > > > means
> > > > > > > > >> >> > > > > > > for
> > > > > > > > >> >> > > > > > > > > example that you will not be able to break the Helm
> > > > > > > > >> Chart
> > > > > > > > >> >> by
> > > > > > > > >> >> > > > > changing
> > > > > > > > >> >> > > > > > > > > anything that the helm chart depends on in airflow.
> > > > > > > > For
> > > > > > > > >> >> > example
> > > > > > > > >> >> > > > if
> > > > > > > > >> >> > > > > you
> > > > > > > > >> >> > > > > > > > > change "airflow webserver" into "airflow server"
> > > > > > > the
> > > > > > > > >> >> current
> > > > > > > > >> >> > > helm
> > > > > > > > >> >> > > > > chart
> > > > > > > > >> >> > > > > > > > > will break. Similarly if you change entrypoint.sh
> > > > > > > in
> > > > > > > > >> Docker
> > > > > > > > >> >> > > image
> > > > > > > > >> >> > > > > in a
> > > > > > > > >> >> > > > > > > > way
> > > > > > > > >> >> > > > > > > > > that is not compatible with Helm chart, we will not
> > > > > > > > let
> > > > > > > > >> >> that
> > > > > > > > >> >> > > > > happen -
> > > > > > > > >> >> > > > > > > the
> > > > > > > > >> >> > > > > > > > > CI tests will break if either of those changes in
> > > > > > > an
> > > > > > > > >> >> > > incompatible
> > > > > > > > >> >> > > > > way.
> > > > > > > > >> >> > > > > > > > And
> > > > > > > > >> >> > > > > > > > > we can have dependencies in any direction between
> > > > > > > > those
> > > > > > > > >> >> > three.
> > > > > > > > >> >> > > > > When we
> > > > > > > > >> >> > > > > > > > see
> > > > > > > > >> >> > > > > > > > > a commit break either of the three - we can make a
> > > > > > > > >> decision
> > > > > > > > >> >> > > about
> > > > > > > > >> >> > > > > what
> > > > > > > > >> >> > > > > > > to
> > > > > > > > >> >> > > > > > > > > do - either accept and document the incompatibility
> > > > > > > > >> >> or fix
> > > > > > > > >> >> > it.
> > > > > > > > >> >> > > > > > > > >
> > > > > > > > >> >> > > > > > > > > Of course keeping that property (testing it all
> > > > > > > > >> together)
> > > > > > > > >> >> is
> > > > > > > > >> >> > > also
> > > > > > > > >> >> > > > > > > > possible
> > > > > > > > >> >> > > > > > > > > if they are in completely separate repos. There are
> > > > > > > > >> several
> > > > > > > > >> >> > > > > > > > > cross-dependencies - Docker image building depends
> > > > > > > on
> > > > > > > > >> >> > > > dependencies
> > > > > > > > >> >> > > > > in
> > > > > > > > >> >> > > > > > > > > setup.py for example, you cannot build Docker image
> > > > > > > > >> from
> > > > > > > > >> >> only
> > > > > > > > >> >> > > > > > > Dockerfile
> > > > > > > > >> >> > > > > > > > > without the sources of airflow nor build and test
> > > > > > > > helm
> > > > > > > > >> >> charts
> > > > > > > > >> >> > > > > without
> > > > > > > > >> >> > > > > > > the
> > > > > > > > >> >> > > > > > > > > image (and sources - because that's where the
> > > > > > > current
> > > > > > > > >> >> > > kubernetes
> > > > > > > > >> >> > > > > tests
> > > > > > > > >> >> > > > > > > > > are). If we want to continue doing it for both Helm
> > > > > > > > and
> > > > > > > > >> >> > > > > Dockerfile, we
> > > > > > > > >> >> > > > > > > > > would have to basically check out the latest
> > > > > > > sources
> > > > > > > > of
> > > > > > > > >> >> > Airflow
> > > > > > > > >> >> > > > > and run
> > > > > > > > >> >> > > > > > > > the
> > > > > > > > >> >> > > > > > > > > CI tests before merging any Docker or Helm Chart
> > > > > > > > >> changes
> > > > > > > > >> >> and
> > > > > > > > >> >> > > the
> > > > > > > > >> >> > > > > > > > opposite -
> > > > > > > > >> >> > > > > > > > > we will have to download Dockerfile/Helm chart and
> > > > > > > > >> build
> > > > > > > > >> >> > > > > image/install
> > > > > > > > >> >> > > > > > > > Helm
> > > > > > > > >> >> > > > > > > > > chart when we are running CI tests for Airflow.
> > > > > > > This
> > > > > > > > is
> > > > > > > > >> >> > > possible
> > > > > > > > >> >> > > > > and we
> > > > > > > > >> >> > > > > > > > > could do it, but it adds complexity to the build/CI
> > > > > > > > >> >> process.
> > > > > > > > >> >> > > > > > > > >
> > > > > > > > >> >> > > > > > > > > Having such split also makes some updates more
> > > > > > > > >> >> difficult -
> > > > > > > > >> >> > for
> > > > > > > > >> >> > > > > example
> > > > > > > > >> >> > > > > > > if
> > > > > > > > >> >> > > > > > > > > we add new "extra" to Airflow that will require to
> > > > > > > > >> install
> > > > > > > > >> >> > > "apt"
> > > > > > > > >> >> > > > > > > > dependency
> > > > > > > > >> >> > > > > > > > > in Dockerfile, we will have to split it into first
> > > > > > > > >> adding
> > > > > > > > >> >> the
> > > > > > > > >> >> > > > > > > dependency
> > > > > > > > >> >> > > > > > > > to
> > > > > > > > >> >> > > > > > > > > Dockerfile, and once it is merged, we can add the
> > > > > > > > >> >> extra to
> > > > > > > > >> >> > > > airflow
> > > > > > > > >> >> > > > > with
> > > > > > > > >> >> > > > > > > > > setup.py. This makes it quite difficult to test it
> > > > > > > > >> together
> > > > > > > > >> >> > > > though
> > > > > > > > >> >> > > > > (the
> > > > > > > > >> >> > > > > > > > > Dockerfile change can only be tested fully after
> > > > > > > > >> >> merging it
> > > > > > > > >> >> > to
> > > > > > > > >> >> > > > > master).
> > > > > > > > >> >> > > > > > > > Not
> > > > > > > > >> >> > > > > > > > > mentioning complexity of managing different
> > > > > > > versions
> > > > > > > > >> >> - your
> > > > > > > > >> >> > > local
> > > > > > > > >> >> > > > > > > > > development Dockerfile version vs sources of
> > > > > > > Airflow
> > > > > > > > >> for
> > > > > > > > >> >> > > example.
> > > > > > > > >> >> > > > > > > Imagine
> > > > > > > > >> >> > > > > > > > > switching between branches where you add two
> > > > > > > > >> >> different apt
> > > > > > > > >> >> > > > > dependencies
> > > > > > > > >> >> > > > > > > > to
> > > > > > > > >> >> > > > > > > > > the Dockerfile. There are more similar scenarios I
> > > > > > > > can
> > > > > > > > >> >> > imagine
> > > > > > > > >> >> > > -
> > > > > > > > >> >> > > > > > > > especially
> > > > > > > > >> >> > > > > > > > > for parallel changes in those repos.
> > > > > > > > >> >> > > > > > > > >
> > > > > > > > >> >> > > > > > > > > This is of course doable to keep them separate, but
> > > > > > > > >> >> it is
> > > > > > > > >> >> > > quite a
> > > > > > > > >> >> > > > > bit
> > > > > > > > >> >> > > > > > > > more
> > > > > > > > >> >> > > > > > > > > complex to set up (especially for a consistent
> > > > > > > > >> development
> > > > > > > > >> >> > > > > environment)
> > > > > > > > >> >> > > > > > > > > when you have separate repos and prevent
> > > > > > > > cross-breaking
> > > > > > > > >> >> > changes
> > > > > > > > >> >> > > > > might
> > > > > > > > >> >> > > > > > > be
> > > > > > > > >> >> > > > > > > > > more difficult.
> > > > > > > > >> >> > > > > > > > >
> > > > > > > > >> >> > > > > > > > > I believe that the best way is to continue
> > > > > > > developing
> > > > > > > > >> >> > airflow +
> > > > > > > > >> >> > > > > image +
> > > > > > > > >> >> > > > > > > > > chart in one repo - airflow, but release them from
> > > > > > > > >> those
> > > > > > > > >> >> > > separate
> > > > > > > > >> >> > > > > > > repos.
> > > > > > > > >> >> > > > > > > > >
> > > > > > > > >> >> > > > > > > > > Airflow source release does not have to contain
> > > > > > > > neither
> > > > > > > > >> >> > chart,
> > > > > > > > >> >> > > > nor
> > > > > > > > >> >> > > > > > > image.
> > > > > > > > >> >> > > > > > > > > And even if it contains sources for those, they are
> > > > > > > > >> >> not the
> > > > > > > > >> >> > > final
> > > > > > > > >> >> > > > > > > > > "artifacts" (installable image and installable helm
> > > > > > > > >> chart).
> > > > > > > > >> >> > > > > > > > > Whenever we decide to release either of them - we
> > > > > > > > >> >> test it
> > > > > > > > >> >> in
> > > > > > > > >> >> > > > > > > > "development".
> > > > > > > > >> >> > > > > > > > > Then only when it is tested, we copy the sources to
> > > > > > > > >> those
> > > > > > > > >> >> > > > separate
> > > > > > > > >> >> > > > > > > repos
> > > > > > > > >> >> > > > > > > > > and release them.
> > > > > > > > >> >> > > > > > > > >
> > > > > > > > >> >> > > > > > > > > With git - we can even do it very easily while
> > > > > > > > >> preserving
> > > > > > > > >> >> > > history
> > > > > > > > >> >> > > > > of
> > > > > > > > >> >> > > > > > > > > commits easily (been there, done that). And then we
> > > > > > > > >> could
> > > > > > > > >> >> > > release
> > > > > > > > >> >> > > > > Helm
> > > > > > > > >> >> > > > > > > > and
> > > > > > > > >> >> > > > > > > > > Docker image separately based on the commits and
> > > > > > > tags
> > > > > > > > >> in
> > > > > > > > >> >> > those
> > > > > > > > >> >> > > > > separate
> > > > > > > > >> >> > > > > > > > > repositories.
> > > > > > > > >> >> > > > > > > > >
> > > > > > > > >> >> > > > > > > > > I agree that separate repos is a more "clean"
> > > > > > > > approach.
> > > > > > > > >> >> But I
> > > > > > > > >> >> > > > > think it
> > > > > > > > >> >> > > > > > > is
> > > > > > > > >> >> > > > > > > > > less convenient for development consistency.
> > > > > > > > >> >> > > > > > > > >
> > > > > > > > >> >> > > > > > > > > J,
> > > > > > > > >> >> > > > > > > > >
> > > > > > > > >> >> > > > > > > > >
> > > > > > > > >> >> > > > > > > > >
> > > > > > > > >> >> > > > > > > > > On Wed, Jul 1, 2020 at 9:35 PM Kaxil Naik <
> > > > > > > > >> >> > kaxilnaik@gmail.com (mailto:kaxilnaik@gmail.com)
> > > > > > > > >> >> > > >
> > > > > > > > >> >> > > > > wrote:
> > > > > > > > >> >> > > > > > > > >
> > > > > > > > >> >> > > > > > > > > > Forgot to mention, having them in separate repo
> > > > > > > > also
> > > > > > > > >> >> helps
> > > > > > > > >> >> > in
> > > > > > > > >> >> > > > > better
> > > > > > > > >> >> > > > > > > > > > managing each individual artifacts.
> > > > > > > > >> >> > > > > > > > > >
> > > > > > > > >> >> > > > > > > > > > Each repo would have a separate Github Issue
> > > > > > > where
> > > > > > > > >> >> we can
> > > > > > > > >> >> > > track
> > > > > > > > >> >> > > > > the
> > > > > > > > >> >> > > > > > > > issue
> > > > > > > > >> >> > > > > > > > > > specific to Helm chart or Dockerfile.
> > > > > > > > >> >> > > > > > > > > >
> > > > > > > > >> >> > > > > > > > > > Regards,
> > > > > > > > >> >> > > > > > > > > > Kaxil
> > > > > > > > >> >> > > > > > > > > >
> > > > > > > > >> >> > > > > > > > > > On Wed, Jul 1, 2020 at 8:30 PM Kaxil Naik <
> > > > > > > > >> >> > > kaxilnaik@gmail.com (mailto:kaxilnaik@gmail.com)
> > > > > > > > >> >> > > > >
> > > > > > > > >> >> > > > > > > wrote:
> > > > > > > > >> >> > > > > > > > > >
> > > > > > > > >> >> > > > > > > > > > > The PMC also needs to agree if we want separate
> > > > > > > > >> VOTING
> > > > > > > > >> >> > for
> > > > > > > > >> >> > > > > Docker
> > > > > > > > >> >> > > > > > > > Image
> > > > > > > > >> >> > > > > > > > > > > and Helm chart, I think we do.
> > > > > > > > >> >> > > > > > > > > > >
> > > > > > > > >> >> > > > > > > > > > > Regards,
> > > > > > > > >> >> > > > > > > > > > > Kaxil
> > > > > > > > >> >> > > > > > > > > > >
> > > > > > > > >> >> > > > > > > > > > > On Wed, Jul 1, 2020 at 8:06 PM Kaxil Naik <
> > > > > > > > >> >> > > > kaxilnaik@gmail.com (mailto:kaxilnaik@gmail.com)
> > > > > > > > >> >> > > > > >
> > > > > > > > >> >> > > > > > > > wrote:
> > > > > > > > >> >> > > > > > > > > > >
> > > > > > > > >> >> > > > > > > > > > >> Hi all,
> > > > > > > > >> >> > > > > > > > > > >>
> > > > > > > > >> >> > > > > > > > > > >> What do you all think about having Dockerfile
> > > > > > > > >> >> and Helm
> > > > > > > > >> >> > > chart
> > > > > > > > >> >> > > > > in
> > > > > > > > >> >> > > > > > > the
> > > > > > > > >> >> > > > > > > > > same
> > > > > > > > >> >> > > > > > > > > > >> "Airflow" Repo vs separate?
> > > > > > > > >> >> > > > > > > > > > >>
> > > > > > > > >> >> > > > > > > > > > >> I feel having a separate repo for Airflow
> > > > > > > > >> Dockerfile
> > > > > > > > >> >> and
> > > > > > > > >> >> > > > Helm
> > > > > > > > >> >> > > > > > > chart
> > > > > > > > >> >> > > > > > > > > have
> > > > > > > > >> >> > > > > > > > > > >> more benefits like easy to track changes (via
> > > > > > > > >> >> > Changelog),
> > > > > > > > >> >> > > > > easy for
> > > > > > > > >> >> > > > > > > > new
> > > > > > > > >> >> > > > > > > > > > >> contributors, separate release cadence.
> > > > > > > > >> >> > > > > > > > > > >>
> > > > > > > > >> >> > > > > > > > > > >> Currently, docker file and Helm Chart are
> > > > > > > inside
> > > > > > > > >> the
> > > > > > > > >> >> > same
> > > > > > > > >> >> > > > > repo and
> > > > > > > > >> >> > > > > > > > > when
> > > > > > > > >> >> > > > > > > > > > >> we release changelog for a new Airflow
> > > > > > > version,
> > > > > > > > it
> > > > > > > > >> >> would
> > > > > > > > >> >> > > > > include
> > > > > > > > >> >> > > > > > > all
> > > > > > > > >> >> > > > > > > > > > >> changes (Airflow + Dockerfile + Helm chart)
> > > > > > > > >> >> which I
> > > > > > > > >> >> > think
> > > > > > > > >> >> > > is
> > > > > > > > >> >> > > > > not
> > > > > > > > >> >> > > > > > > > that
> > > > > > > > >> >> > > > > > > > > > great.
> > > > > > > > >> >> > > > > > > > > > >>
> > > > > > > > >> >> > > > > > > > > > >> Also having them all inside a single repo
> > > > > > > means
> > > > > > > > >> >> changes
> > > > > > > > >> >> > in
> > > > > > > > >> >> > > > > Helm
> > > > > > > > >> >> > > > > > > > Chart
> > > > > > > > >> >> > > > > > > > > > and
> > > > > > > > >> >> > > > > > > > > > >> Dockerfile can block Airflow release. We could
> > > > > > > > use
> > > > > > > > >> >> > stable
> > > > > > > > >> >> > > > Helm
> > > > > > > > >> >> > > > > > > Chart
> > > > > > > > >> >> > > > > > > > > > >> version and Dockerfile version to test Airflow
> > > > > > > > >> >> so that
> > > > > > > > >> >> > > they
> > > > > > > > >> >> > > > > are
> > > > > > > > >> >> > > > > > > > > > blockers to
> > > > > > > > >> >> > > > > > > > > > >> release too.
> > > > > > > > >> >> > > > > > > > > > >>
> > > > > > > > >> >> > > > > > > > > > >> Happy to hear the thoughts from the community.
> > > > > > > > >> >> > > > > > > > > > >>
> > > > > > > > >> >> > > > > > > > > > >> Regards,
> > > > > > > > >> >> > > > > > > > > > >> Kaxil
> > > > > > > > >> >> > > > > > > > > > >>
> > > > > > > > >> >> > > > > > > > > > >
> > > > > > > > >> >> > > > > > > > > >
> > > > > > > > >> >> > > > > > > > >
> > > > > > > > >> >> > > > > > > > >
> > > > > > > > >> >> > > > > > > > > --
> > > > > > > > >> >> > > > > > > > >
> > > > > > > > >> >> > > > > > > > > Jarek Potiuk
> > > > > > > > >> >> > > > > > > > > Polidea <https://www.polidea.com/> | Principal
> > > > > > > > >> Software
> > > > > > > > >> >> > > Engineer
> > > > > > > > >> >> > > > > > > > >
> > > > > > > > >> >> > > > > > > > > M: +48 660 796 129 <+48660796129>
> > > > > > > > >> >> > > > > > > > > [image: Polidea] <https://www.polidea.com/>
> > > > > > > > >> >> > > > > > > > >
> > > > > > > > >> >> > > > > > > >
> > > > > > > > >> >> > > > > > >
> > > > > > > > >> >> > > > > > >
> > > > > > > > >> >> > > > > > > --
> > > > > > > > >> >> > > > > > >
> > > > > > > > >> >> > > > > > > Jarek Potiuk
> > > > > > > > >> >> > > > > > > Polidea <https://www.polidea.com/> | Principal
> > > > > > > Software
> > > > > > > > >> >> Engineer
> > > > > > > > >> >> > > > > > >
> > > > > > > > >> >> > > > > > > M: +48 660 796 129 <+48660796129>
> > > > > > > > >> >> > > > > > > [image: Polidea] <https://www.polidea.com/>
> > > > > > > > >> >> > > > > >
> > > > > > > > >> >> > > > > >
> > > > > > > > >> >> > > > > >
> > > > > > > > >> >> > > > > > --
> > > > > > > > >> >> > > > > >
> > > > > > > > >> >> > > > > > Jarek Potiuk
> > > > > > > > >> >> > > > > > Polidea <https://www.polidea.com/> | Principal Software
> > > > > > > > >> Engineer
> > > > > > > > >> >> > > > > >
> > > > > > > > >> >> > > > > > M: +48 660 796 129 <+48660796129>
> > > > > > > > >> >> > > > > > [image: Polidea] <https://www.polidea.com/>
> > > > > > > > >> >> > > >
> > > > > > > > >> >> > > >
> > > > > > > > >> >> > > >
> > > > > > > > >> >> > > > --
> > > > > > > > >> >> > > >
> > > > > > > > >> >> > > > Jarek Potiuk
> > > > > > > > >> >> > > > Polidea <https://www.polidea.com/> | Principal Software
> > > > > > > > Engineer
> > > > > > > > >> >> > > >
> > > > > > > > >> >> > > > M: +48 660 796 129 <+48660796129>
> > > > > > > > >> >> > > > [image: Polidea] <https://www.polidea.com/>
> > > > > > > > >> >> > > >
> > > > > > > > >> >> > >
> > > > > > > > >> >> >
> > > > > > > > >> >>
> > > > > > > > >> >>
> > > > > > > > >> >> --
> > > > > > > > >> >>
> > > > > > > > >> >> Jarek Potiuk
> > > > > > > > >> >> Polidea <https://www.polidea.com/> | Principal Software Engineer
> > > > > > > > >> >>
> > > > > > > > >> >> M: +48 660 796 129 <+48660796129>
> > > > > > > > >> >> [image: Polidea] <https://www.polidea.com/>
> > > > > > > > >> >>
> > > > > > > > >> >
> > > > > > > > >>
> > > > > > > > >
> > > > > > > > >
> > > > > > > > > --
> > > > > > > > >
> > > > > > > > > Jarek Potiuk
> > > > > > > > > Polidea <https://www.polidea.com/> | Principal Software Engineer
> > > > > > > > >
> > > > > > > > > M: +48 660 796 129 <+48660796129>
> > > > > > > > > [image: Polidea] <https://www.polidea.com/>
> > > > > > > > >
> > > > > > > > >
> > > > > > > >
> > > > > > > > --
> > > > > > > >
> > > > > > > > Jarek Potiuk
> > > > > > > > Polidea <https://www.polidea.com/> | Principal Software Engineer
> > > > > > > >
> > > > > > > > M: +48 660 796 129 <+48660796129>
> > > > > > > > [image: Polidea] <https://www.polidea.com/>
> > > > > > > >
> > > > > > >
> > > > > >
> > > > > >
> > > > > > --
> > > > > > Jarek Potiuk
> > > > > > Polidea <https://www.polidea.com/> | Principal Software Engineer
> > > > > >
> > > > > > M: +48 660 796 129 <+48660796129>
> > > > > > [image: Polidea] <https://www.polidea.com/>
> > > > >
> > > > >
> > > > >
> > > > >
> > > > > --
> > > > >
> > > > >
> > > > > Jarek Potiuk
> > > > > Polidea (https://www.polidea.com/) | Principal Software Engineer
> > > > >
> > > > >
> > > > >
> > > > >
> > > > >
> > > > >
> > > > >
> > > > > M: +48 660 796 129 (tel:+48660796129)
> > > > >
> > > > >
> > > > >
> > > > >
> > > > >
> > > > >
> > > > >
> > > > >
> > >
> > >
> > > --
> > >
> > >
> > > Jarek Potiuk
> > > Polidea (https://www.polidea.com/) | Principal Software Engineer
> > >
> > >
> > >
> > >
> > >
> > >
> > >
> > > M: +48 660 796 129 (tel:+48660796129)
> > >
> > >
> > >
> > >
> > >
> > >
> > >
> > >
>
>
> --
>
>
> Jarek Potiuk
> Polidea (https://www.polidea.com/) | Principal Software Engineer
>
>
>
>
>
>
>
> M: +48 660 796 129 (tel:+48660796129)
>
>
>
>
>
>
>
>
>
>
>
>
>
>
>
>
>


Re: Separate Repo vs MonoRepo for Dockerfile & Helm Chart

Posted by Jarek Potiuk <Ja...@polidea.com>.
Calling for Lazy consensus here. Unless someone objects in 72 hours
(roughly end of this weekend) I will create an "airflow-docker" repo.

For now, I want to focus only on building the docker image. Any other stuff
(docker-compose, helm chart) might be a separate discussion after that.

J.

On Mon, Oct 26, 2020 at 6:26 AM Daniel Imberman <da...@gmail.com>
wrote:

> I am all for this. This is how kubernetes does it and it has worked out
> really well for them.
>
> On Sun, Oct 25, 2020 at 10:23 PM, Jarek Potiuk <Ja...@polidea.com>
> wrote:
>
> Yep, that would be nice. Agree that it is not obvious where some files
> come from.
>
> Agree this could be done if everyone thinks it's a good idea. This would
> be perfectly doable, we could even make it work with the whole history
> maintained (we'd just need to include historical paths in the script).
>
> And if we make it in time before 1.10.13, we could even release it within
> 1.10.13.
>
> J
>
>
> On Sun, Oct 25, 2020 at 10:03 PM Kamil Breguła <ka...@polidea.com>
> wrote:
>
>> I took a quick look and I like the overall concept, but I'm just
>> wondering if it will be clear enough for users. Currently, these scripts
>> copy different files from different directories and the mapping of the
>> source to the destination is written in the scripts. This will make it
>> difficult to contribute to this "sub-project". In my opinion, if we want to
>> create new repositories from some files, we should only do it for one
>> directory. If this directory has dependencies, we should try to break them
>> down. The end-user should not get the impression that they are in contact
>> with the copied repository at first glance. Otherwise, we will not
>> achieve our primary goal - to facilitate end-user use.
>>
>> In this case, it means that we should create a new directory in
>> apache/airflow named "prod-docker-image" or similar and move to it the
>> necessary Dockerfiles, documentation, scripts, and all other assets. In
>> particular, this directory should contain README.md which actually
>> describes the contents of that directory.
>>
>> A good example is the /chart directory. It only has one dependency which is
>> not in the "/chart" directory - the "Contributing" section in README.md refers
>> to the file in the root directory of the repository. This link will stop
>> working if we create a new repository from the entire directory. It will be
>> trivial to fix.
>>
>> On Sun, Oct 25, 2020 at 9:18 PM Jarek Potiuk <Ja...@polidea.com>
>> wrote:
>>
>>> Hello Everyone,
>>>
>>> I would like to come back to the discussion as I have *JUST* implemented
>>> the solution (very simple but 100% working) to this monorepo vs. separate
>>> repos.
>>>
>>> You can take a look at this repo of mine:
>>> https://github.com/potiuk/airflow-docker. It is very simple and works
>>> like a charm. I implemented it to solve the issue
>>> https://github.com/apache/airflow/issues/11740
>>>
>>> This is a separate repo that people can use to have a separate
>>> "read-only" repository that **only** keeps our Dockerfile-related stuff -
>>> including the full history of changes related (and only those), full
>>> traceability, and incremental, automated synchronization from our "airflow"
>>> repo.
>>>
>>> I can - any time - set it up as "apache/airflow-docker" and get it to
>>> synchronize every day or every hour.
>>>
>>> Here is how it works:
>>>
>>> * The "master" and "v1-10-stable" branches are filtered to only contain
>>> files that are needed to build Prod Docker image
>>> * We keep history of all relevant commits in those branches
>>> * In the "main" branch we only keep the "scheduled" Github Actions
>>> workflow that does the synchronization and README.md which explains what
>>> needs to be done to build the docker image
>>> * I am using the excellent "git-filter-repo" tool which does the job
>>> really well and fast. Git-filter-repo is recommended by Git maintainers
>>> over the old, slow and much worse built-in git-filter-branch:
>>> https://git-scm.com/docs/git-filter-branch#_warning
>>> * the job to synchronize the repo takes 1m30s to run - it is rather
>>> fast despite analyzing 13500 commits :)
>>> * it runs incrementally - just adding new commits when they appear
>>> * it is very simple: a few-line script + a few steps in a Github Action to
>>> checkout/push the right branches
>>> * we keep all the commit mapping in the repo as well, so we have a 1-1
>>> relationship between the commits in the "docker repo" and the original ones
>>> in the Airflow repo
>>> * synchronization is 1-way - airflow -> airflow-docker
>>> * we can use a very similar approach for synchronizing:
>>>   * Helm chart
>>>   * Open API clients
>>>   * other stuff
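
The filtering, incremental sync, and commit mapping described in the list above can be illustrated with a toy in-memory model. This is only a sketch under stated assumptions: the real synchronization runs git-filter-repo on actual git history, while the `Commit`, `FilteredRepo`, and `sync` names and the path list below are hypothetical and exist only for illustration.

```python
import hashlib
from dataclasses import dataclass, field


@dataclass(frozen=True)
class Commit:
    sha: str        # commit id (hypothetical short ids in this sketch)
    message: str
    files: tuple    # paths touched by the commit


@dataclass
class FilteredRepo:
    # Toy stand-in for the read-only "airflow-docker" repository.
    commits: list = field(default_factory=list)      # rewritten commits, oldest first
    commit_map: dict = field(default_factory=dict)   # source sha -> rewritten sha
    last_synced: int = 0                             # source commits already examined


def is_relevant(path, keep_prefixes):
    """True if the path is one of the kept files or lives under a kept directory."""
    return any(path == p or path.startswith(p.rstrip("/") + "/") for p in keep_prefixes)


def sync(source_history, target, keep_prefixes):
    """Append only new commits that touch kept paths; return how many were added."""
    added = 0
    for commit in source_history[target.last_synced:]:
        kept = tuple(f for f in commit.files if is_relevant(f, keep_prefixes))
        if kept:
            parent = target.commits[-1].sha if target.commits else ""
            # The rewritten id depends on the parent, so it differs from the source id.
            new_sha = hashlib.sha1(f"{parent}:{commit.sha}".encode()).hexdigest()[:12]
            target.commits.append(Commit(new_sha, commit.message, kept))
            target.commit_map[commit.sha] = new_sha
            added += 1
    target.last_synced = len(source_history)
    return added


# Hypothetical path list and history, for illustration only.
KEEP = ("Dockerfile", "scripts/docker")
history = [
    Commit("a1", "Fix DAG parsing", ("airflow/models/dag.py",)),
    Commit("b2", "Add apt dependency to image", ("Dockerfile", "setup.py")),
]
repo = FilteredRepo()
sync(history, repo, KEEP)   # first run examines the whole history
history.append(Commit("c3", "Tweak entrypoint", ("scripts/docker/entrypoint.sh",)))
sync(history, repo, KEEP)   # later runs only look at commits added since
print(repo.commit_map)
```

The mapping is what gives the 1-1 traceability: every commit in the filtered repository can be traced back to exactly one commit in the source repository, and commits that touch no kept path simply do not appear.
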
>>>
>>> It also follows our source release strategy - it has the same
>>> "properties" as our main repo - so it is merely a "convenience" way of
>>> accessing the Docker customization options, but the same functionality is
>>> available in our officially released sources.
>>>
>>> Do you think we should turn it into the "apache/airflow-docker" repo?
>>>
>>> J.
>>>
>>>
>>>
>>> On Sun, Jul 5, 2020 at 8:12 PM Daniel Imberman <
>>> daniel.imberman@gmail.com> wrote:
>>>
>>>> Worth noting that git has the ability to cherry-pick only specific
>>>> directories. If we keep all of helm + tests in one directory, docker +
>>>> tests in another, and core + tests in a third directory it would be pretty
>>>> simple to automate splitting them.
>>>>
>>>>
>>>> https://stackoverflow.com/questions/19821749/git-cherry-pick-or-merge-specific-directory-from-another-branch
>>>>
>>>> On Sun, Jul 5, 2020 at 9:57 AM, Daniel Imberman <
>>>> daniel.imberman@gmail.com> wrote:
>>>> I can’t agree with this enough :). I think writing a few bots to
>>>> separate out sections will be MUCH easier in the long run than maintaining
>>>> multiple repos. Will also prevent the difficulty of setting up a proper dev
>>>> environment for new contributors.
>>>> On Sun, Jul 5, 2020 at 9:53 AM, Jarek Potiuk <Ja...@polidea.com>
>>>> wrote:
>>>> Yeah. I think that the "monorepo" is the only way for now - until (or if)
>>>> we reach the size (and maturity) that different teams take care of the
>>>> different projects. Which might even not happen.
>>>>
>>>> But I would love to try the separate repos to publish/release still (maybe
>>>> not immediately, but it is a nice concept). I think it should be rather
>>>> easy (I will try it on my own repo first). Also, I think it has another
>>>> advantage - those separate repos might actually run other kinds of tests -
>>>> for example, to test if there is "everything" in that repo to release it
>>>> (for example build helm chart) and whether there is no accidental use of
>>>> stuff from outside of those dirs.
>>>>
>>>> I already thought about how to do it - it should be rather easy. Of course
>>>> - like most of the time - there is a ready-to-use git command doing it for
>>>> us. We simply need a bot running for that repo executing a variant of this
>>>> command:
>>>>
>>>> https://docs.github.com/en/github/using-git/splitting-a-subfolder-out-into-a-new-repository
>>>> (it should only take commits from the commit merged last time). So the
>>>> level of automation here is rather minimal.
>>>>
>>>> And if we have those repos and at some point of time we decide to split
>>>> eventually - we will have already repos with all history as a starting
>>>> point.
>>>>
>>>> J.
>>>>
>>>>
>>>>
>>>> On Sun, Jul 5, 2020 at 4:42 PM Kaxil Naik <ka...@gmail.com> wrote:
>>>>
>>>> > Hmm.. I agree the git-sync would have been a difficult one to solve if we
>>>> > had separate repositories.
>>>> >
>>>> > Well, in that case, the mono repo approach (like we have now) indeed makes
>>>> > more sense.
>>>> >
>>>> > Regarding the Kubernetes approach, I feel the ones in staging (
>>>> > https://github.com/kubernetes/kubernetes/tree/master/staging) are part of
>>>> > the actual product itself but in our case we were discussing between Helm
>>>> > chart and Dockerfile which are not actually part of the product. And we
>>>> > will need a good deal of automation if we go down that route.
>>>> > I think the plain mono-repo approach is better than that one.
>>>> >
>>>> > Regards,
>>>> > Kaxil
>>>> >
>>>> >
>>>> > On Sun, Jul 5, 2020 at 9:19 AM Jarek Potiuk <Jarek.Potiuk@polidea.com>
>>>> > wrote:
>>>> >
>>>> > > And one more perfect illustration of what I am talking about.
>>>> > >
>>>> > > A very good thing just happened. I was running the PR while writing the
>>>> > > email (long time as you might imagine) and the new K8S tests with 1.10.11
>>>> > > just failed. https://github.com/apache/airflow/pull/9663
>>>> > >
>>>> > > If we had released the helm chart before, we would have a clear (small)
>>>> > > incompatibility here. And by seeing the test failing we could make a
>>>> > > decision what to do:
>>>> > >
>>>> > > 1) fix it differently
>>>> > > 2) document it as a breaking Helm change, "1.10.12+ image" and make the
>>>> > > test work in both cases
>>>> > > 3) revert ...
>>>> > >
>>>> > > But at least we have an early warning that something is wrong. This is the
>>>> > > clear value of running the tests at every commit.
>>>> > >
>>>> > > J.
>>>> > >
>>>> > > On Sun, Jul 5, 2020 at 10:08 AM Jarek Potiuk <Jarek.Potiuk@polidea.com>
>>>> > > wrote:
>>>> > >
>>>> > > > I just have another example of a case where splitting the repos and using
>>>> > > > only "released versions" across repositories might be a complete overkill
>>>> > > > when it comes to development complexity.
>>>> > > >
>>>> > > > We have this change from Aneesh:
>>>> > > > https://github.com/apache/airflow/pull/9371 about adding a git-sync
>>>> > > > option to the helm chart.
>>>> > > >
>>>> > > > That's a new feature, but we would like to test both 1.10 and the master
>>>> > > > version of KubernetesExecutor with that. It should work for both of them -
>>>> > > > there is no coupling/dependency in the "airflow" code for it.
>>>> > > >
>>>> > > > However, there is a strong coupling in the tests. We have the
>>>> > > > "kubernetes_tests" running tests using all three: chart, production docker,
>>>> > > > and Airflow. Those tests will likely have to be adapted to work with the
>>>> > > > new git-sync option. They were disabled previously as we had problems with
>>>> > > > them before the helm chart was used for tests but we can turn them back on
>>>> > > > now when git-sync is added to the helm chart. Those tests are part of the
>>>> > > > airflow test suite and we discussed with Daniel that they should stay there
>>>> > > > - those tests are importing airflow code, they are using the latest example
>>>> > > > dags which are also in the airflow code.
>>>> > > >
>>>> > > > So we have two ways how we can develop this -
>>>> > > > A) monorepo (current)
>>>> > > > B) separate repos.
>>>> > > >
>>>> > > > Just to remind - the goal is that our change is tested against:
>>>> > > >
>>>> > > > 1) Released Airflow version (say 1.10.11).
>>>> > > > 2) Development airflow version (master - soon possibly development)
>>>> > > > 3) Development docker image built with either "development" or "1.10.11"
>>>> > > > (we can release the Docker image for 1.10.11 independently from the current
>>>> > > > development HEAD). The docker image is supposed to work with any version of
>>>> > > > airflow
>>>> > > >
>>>> > > > In the case of A) Monorepo we have all that as a given.
>>>> > > >
>>>> > > > I just sent this really small PR that should do the job:
>>>> > > > https://github.com/apache/airflow/pull/9663. What it does, it takes the
>>>> > > > latest "development" docker image, "development" chart, bakes in the latest
>>>> > > > "example dags" from "development branch". The image uses either
>>>> > > > "development" or released (from PyPI) "1.10.11" Airflow version - and runs
>>>> > > > the "development" tests against it. This is exactly what we want. If we add
>>>> > > > new features to the helm chart, the Kubernetes tests will have to be
>>>> > > > updated to include that - and this will happen in the airflow "development"
>>>> > > > branch. The REALLY good thing in it - since we are running those tests in
>>>> > > > the CI build of the airflow development branch - we prevent anyone from
>>>> > > > making breaking changes. It is a given that both - the "development" of
>>>> > > > airflow and the "1.10.11" version of airflow will continue to work with the
>>>> > > > image and chart.
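
The combinations that the monorepo CI run covers can be written down as a small test matrix. The sketch below is illustrative only: the labels are taken from the message above, while the `build_matrix` helper and the dictionary keys are hypothetical and are not the actual CI configuration of the project.

```python
from itertools import product

# Labels from the message above; the exact set is an assumption for illustration.
AIRFLOW_VERSIONS = ("1.10.11", "development")  # released (PyPI) vs. current branch
IMAGE_SOURCES = ("development",)               # Dockerfile from the current branch
CHART_SOURCES = ("development",)               # chart from the current branch


def build_matrix(airflow_versions, image_sources, chart_sources):
    """Return one dict per test job, covering every combination of the inputs."""
    return [
        {"airflow": airflow, "image": image, "chart": chart}
        for airflow, image, chart in product(airflow_versions, image_sources, chart_sources)
    ]


for job in build_matrix(AIRFLOW_VERSIONS, IMAGE_SOURCES, CHART_SOURCES):
    print(f"kubernetes_tests: airflow={job['airflow']} image={job['image']} chart={job['chart']}")
```

In a monorepo all of these inputs come from the same checkout, so a single PR can exercise every combination; with separate repositories each entry would additionally need a pinned or fetched revision of the other repositories.
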
>>>> > > >
>>>> > > >
>>>> > > > In the case of B) where we split the repos:
>>>> > > >
>>>> > > > We have to decide where to keep the "kubernetes_tests" - should they be in
>>>> > > > "Airflow" or in "Helm". They are testing BOTH so we can choose either way.
>>>> > > > Together with Daniel we plan to expand those tests to cover all the
>>>> > > > different options we have in the Chart - testing all of it - Kubernetes
>>>> > > > Executor, Celery Executor running on Kubernetes, MySQL (once we add it),
>>>> > > > etc. etc. So we want to make sure we have a matrix of tests covering a
>>>> > > > number of deployment options. Those tests do not exist yet, and they will
>>>> > > > have to be written. In principle - they can be moved to the "Helm"
>>>> > > > repository. That's where they conceptually belong. However - there is a
>>>> > > > Huge value in running the tests in airflow "development" - the value is
>>>> > > > that no-one will be able to break the "development" airflow, because those
>>>> > > > tests are run with every PR. I think we have no choice but to run those
>>>> > > > tests always in development. Otherwise, people maintaining the helm chart
>>>> > > > will have to fix the problems introduced by people changing Airflow code. I
>>>> > > > think this is a pretty bad idea to allow that. So if we move those tests to
>>>> > > > Helm Chart repo we have to figure out how to run those "kubernetes" tests
>>>> > > > in CI for every build. This is quite possible - by getting the latest
>>>> > > > master from helm chart and running the build, but it has several problems:
>>>> > > >
>>>> > > > 1) The test code for CI will have to continue to stay in Airflow (to run
>>>> > > > CI builds) - this means that we already have coupling and some code related
>>>> > > > to the execution of the helm tests has to be in Airflow anyway.
>>>> > > >
>>>> > > > 2) Bigger problem. What happens if as "Airflow developer" you DO introduce
>>>> > > > a change that breaks the helm chart? You will see a CI error and..... You
>>>> > > > will not know what to do. Do you involve people who maintain the helm chart
>>>> > > > and wait for them? I think not. You should be able to reproduce the problem
>>>> > > > locally and fix it yourself (maybe with the help of others - but you should
>>>> > > > be able to fix your own commit). We would have to teach people how to bring
>>>> > > > the docker image and helm chart code from the latest version and run the
>>>> > > > tests. We could do it automatically with Breeze (similarly as we do with
>>>> > > > other integrations - where we bring in Kerberos, Mongo, and a multitude of
>>>> > > > others) without them even knowing it, but this might be fairly complex and
>>>> > > > prone to errors. In Monorepo - we already have a simple way of reproducing
>>>> > > > and running the tests locally and everything is in one place.
>>>> > > >
>>>> > > > 3) There is a chance that someone makes a change in Helm in parallel to a
>>>> > > > change in Airflow that breaks it. This could easily happen in the "git-sync
>>>> > > > case" or when we add "MySQL" for example in the future. And there is no way
>>>> > > > to prevent it.
>>>> > > >
>>>> > > > 4) If we only test against "released" Helm and Airflow (that was one of
>>>> > > > the suggestions), the problem is even bigger. How do you know that you do
>>>> > > > not break the currently "developed" helm chart? Or how do you know that the
>>>> > > > currently "developed" helm chart works with the latest Airflow release? If
>>>> > > > you do not do those checks at the "commit" time, then you defer this to
>>>> > > > "release time" and only then you might find out that decisions you made
>>>> > > > during development have to be reverted. This is a very, very bad idea IMHO,
>>>> > > > again leading to the case that the release manager will have to fix
>>>> > > > problems introduced by others.
>>>> > > >
>>>> > > > J,
>>>> > > >
>>>> > > >
>>>> > > >
>>>> > > > On Fri, Jul 3, 2020 at 10:28 PM Ash Berlin-Taylor <ash@apache.org>
>>>> > > > wrote:
>>>> > > >
>>>> > > >> Monorepo FTW.
>>>> > > >>
>>>> > > >> Yes, it gets a little bit messier around release, but the approach of
>>>> > > >> automatically extracting out the commits (or parts of commits) to a
>>>> > > >> separate repo for releasing may be the solution to that problem
>>>> > > >>
>>>> > > >>
>>>> > > >> -ash
>>>> > > >>
>>>> > > >> On Jul 3 2020, at 7:51 pm, Kaxil Naik <ka...@gmail.com> wrote:
>>>> > > >>
>>>> > > >> > I will take a look at the Kubernetes approach and get back to this
>>>> > > >> > thread.
>>>> > > >> >
>>>> > > >> >> We had a discussion with Daniel yesterday and we are both concerned
>>>> > > >> >> about all the overhead for people like us who work on all three
>>>> > > >> >> "entities" at the same time. Even just explaining how to work with Pull
>>>> > > >> >> Requests and in what sequence those PRs would have to be opened and
>>>> > > >> >> merged in case of changes that are spanning across several "entities" -
>>>> > > >> >> was a challenge. I was unable to clearly explain the sequence and way of
>>>> > > >> >> reviewing/merging the PRs that will have to be made if we have
>>>> > > >> >> submodules. This is a bad sign as I was using submodules in the past and
>>>> > > >> >> know how it works but I was unable to explain it clearly.
>>>> > > >> >
>>>> > > >> >
>>>> > > >> > We don't even need submodules tbh. We can just use a Bash script that
>>>> > > >> > pulls a pinned Helm Chart version.
>>>> > > >> > We only need the Helm chart to run integration tests for k8s (at least
>>>> > > >> > for now).
>>>> > > >> > We already use tons of Bash scripts.
>>>> > > >> >
>>>> > > >> > One of the important benefits of separation is that changes in one
>>>> > > >> > component should not need changes in another component, at least
>>>> > > >> > not immediately.
>>>> > > >> >
>>>> > > >> > Changes in Helm chart and Dockerfile should never need changes in
>>>> > > >> > Airflow.
>>>> > > >> > Changes in Airflow should only ever need a change in Dockerfile and Helm
>>>> > > >> > Chart after a new version is released.
>>>> > > >> >
>>>> > > >> > I just had a talk with Daniel too and still didn't find a good enough
>>>> > > >> > reason to have them in the same repo.
>>>> > > >> >
>>>> > > >> > I will definitely look at the Kubernetes approach (maybe it is better)
>>>> > > >> > and get back to this thread. But as of now I don't see any major PROs
>>>> > > >> > for having them in the same repo.
>>>> > > >> >
>>>> > > >> > Regards,
>>>> > > >> > Kaxil
>>>> > > >> >
>>>> > > >> >
>>>> > > >> >
>>>> > > >> > On Fri, Jul 3, 2020 at 5:00 PM Jarek Potiuk <
>>>> > Jarek.Potiuk@polidea.com
>>>> > > >
>>>> > > >> > wrote:
>>>> > > >> >
>>>> > > >> >> I think Ry's point is an important one - I thought about
>>>> writing a
>>>> > > >> longer
>>>> > > >> >> post but I looked at the Kubernetes structure and I really
>>>> like it
>>>> > so
>>>> > > >> just
>>>> > > >> >> wanted to comment on this last one.
>>>> > > >> >>
>>>> > > >> >> Seems that it is simply one "authoritative" (or source of
>>>> truth)
>>>> > repo
>>>> > > >> where
>>>> > > >> >> everything is developed in monorepo fashion but then there is
>>>> a bot
>>>> > > >> >> that moves every commit related to subdirectories to those
>>>> > > "split-out"
>>>> > > >> >> repos. There are never direct commits of people or PRs in the
>>>> > > >> "split-out"
>>>> > > >> >> repositories. This is very similar to my original proposal to
>>>> have
>>>> > > >> >> dedicated repos used for releases - but with an automated way
>>>> of
>>>> > > >> publishing
>>>> > > >> >> the commits to the "separated" repos at the moment they are
>>>> merged
>>>> > > to
>>>> > > >> >> master in the main repo. I love it.
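The routing step of such a publishing bot can be sketched as below: given the paths touched by a commit merged to the main repo, decide which read-only "split-out" repos should receive it. The directory prefixes and repository names are assumptions made for illustration, not a description of the Kubernetes tooling.

```python
# Hypothetical mapping of monorepo subdirectories to read-only split-out
# repos; both the prefixes and the repo names are illustrative assumptions.
SPLIT_REPOS = {
    "chart/": "airflow-helm-chart",
    "docker/": "airflow-docker-image",
}


def target_repos(changed_paths):
    """Return the split-out repos a commit touching these paths belongs to."""
    targets = set()
    for path in changed_paths:
        for prefix, repo in SPLIT_REPOS.items():
            if path.startswith(prefix):
                targets.add(repo)
    return sorted(targets)


# A commit that touches both the chart and core Airflow code is published
# only to the chart repo; core-only commits are published nowhere.
print(target_repos(["chart/values.yaml", "airflow/models/dag.py"]))
print(target_repos(["airflow/models/dag.py"]))
```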
>>>> > > >> >>
>>>> > > >> >> I think it's a really good and "pragmatic" solution. The code is
>>>> > > >> >> available in
>>>> > > >> >> separate repos, including the history of commits related to
>>>> each
>>>> > > >> "entity"
>>>> > > >> >> (so only chart-related commits in chart repo). Issues for
>>>> > particular
>>>> > > >> >> "entities" are in those separate repos as well (something that
>>>> > Kaxil
>>>> > > >> >> mentioned). Users (not developers!) who are interested only in
>>>> > > >> Dockerfile
>>>> > > >> >> or Helm Chart have separate repos they can look at - with only
>>>> > > relevant
>>>> > > >> >> changes and history of releases for that particular entity.
>>>> They
>>>> > can
>>>> > > >> raise
>>>> > > >> >> issues there (and in GitHub, we can easily refer to those
>>>> issues
>>>> > from
>>>> > > >> the
>>>> > > >> >> main "airflow" repo). All the discussion from "user issues"
>>>> are
>>>> > kept
>>>> > > >> >> in the
>>>> > > >> >> relevant repositories. Still - comments about development
>>>> changes
>>>> > > (and
>>>> > > >> >> related issues) might still be kept in the main "airflow"
>>>> repo -
>>>> > next
>>>> > > >> to
>>>> > > >> >> other "development" changes.
>>>> > > >> >>
>>>> > > >> >> We can run separate releases from those linked repositories
>>>> and
>>>> > even
>>>> > > >> >> publish sources directly from those repositories rather than
>>>> from
>>>> > the
>>>> > > >> main
>>>> > > >> >> one. At the same time - we avoid all the hassle of submodules.
>>>> > > >> >>
>>>> > > >> >> We had a discussion with Daniel yesterday and we are both
>>>> concerned
>>>> > > >> about
>>>> > > >> >> all the overhead for people like us who work on all three
>>>> > "entities"
>>>> > > >> >> at the
>>>> > > >> >> same time. Even just explaining how to work with Pull
>>>> Requests and
>>>> > in
>>>> > > >> what
>>>> > > >> >> sequence those PRs would have to be opened and merged in case
>>>> of
>>>> > > >> changes
>>>> > > >> >> that are spanning across several "entities" - was a
>>>> challenge. I
>>>> > was
>>>> > > >> unable
>>>> > > >> >> to clearly explain the sequence and way of reviewing/merging
>>>> the
>>>> > PRs
>>>> > > >> that
>>>> > > >> >> will have to be made if we have submodules. This is a bad
>>>> sign as I
>>>> > > was
>>>> > > >> >> using submodules in the past and know how it works but I was
>>>> unable
>>>> > > to
>>>> > > >> >> explain it clearly.
>>>> > > >> >>
>>>> > > >> >> I really, really like Kubernetes approach - seems that it's
>>>> one of
>>>> > > the
>>>> > > >> >> cases where we can "eat cake and have it too".
>>>> > > >> >>
>>>> > > >> >> J.
>>>> > > >> >>
>>>> > > >> >>
>>>> > > >> >> On Thu, Jul 2, 2020 at 5:59 PM Ry Walker <ry...@rywalker.com>
>>>> wrote:
>>>> > > >> >>
>>>> > > >> >> > One reason to have a monorepo is for project branding, and
>>>> end
>>>> > user
>>>> > > >> >> > experience. But for component development experience, it's
>>>> nice
>>>> > to
>>>> > > >> >> have a
>>>> > > >> >> > small, dedicated repo.
>>>> > > >> >> >
>>>> > > >> >> > I think the git submodule approach is technically sound,
>>>> but is
>>>> > at
>>>> > > >> odds
>>>> > > >> >> > with making the project easy to consume/understand from the
>>>> end
>>>> > > user
>>>> > > >> >> > perspective, especially if we expand the use of
>>>> subprojects. And
>>>> > > >> >> the main
>>>> > > >> >> > Airflow commit graph would appear to be slowing down which
>>>> is bad
>>>> > > for
>>>> > > >> >> > Airflow brand perception.
>>>> > > >> >> >
>>>> > > >> >> > Kubernetes has many sub-repos that are integrated into the
>>>> main
>>>> > > >> >> repo -
>>>> > > >> >> > which I think could be the best of both worlds:
>>>> > > >> >> > Example:
>>>> > > >> https://github.com/kubernetes/kubernetes/tree/master/staging
>>>> > > >> >> >
>>>> > > >> >> > I haven't dug in very deeply, and I won't pretend to
>>>> understand
>>>> > how
>>>> > > >> >> > challenging it may be to maintain this structure, but I'd
>>>> support
>>>> > > >> >> breaking
>>>> > > >> >> > more components out of the main Airflow repo for dev
>>>> purposes
>>>> > (for
>>>> > > >> >> example,
>>>> > > >> >> > in the future, it'd be nice to have airflow-cli,
>>>> airflow-api,
>>>> > > >> >> > airflow-scheduler, individual provider repos that are
>>>> cleanly
>>>> > > >> separated)
>>>> > > >> >> as
>>>> > > >> >> > long as we bring the commits/contributions back into the
>>>> monorepo
>>>> > > >> with
>>>> > > >> >> > automation.
>>>> > > >> >> >
>>>> > > >> >> > Maybe we could dive a little deeper into how K8s is
>>>> operating,
>>>> > > before
>>>> > > >> >> going
>>>> > > >> >> > with submodules?
>>>> > > >> >> >
>>>> > > >> >> > -Ry
>>>> > > >> >> >
>>>> > > >> >> >
>>>> > > >> >> >
>>>> > > >> >> >
>>>> > > >> >> > On Thu, Jul 2, 2020 at 11:24 AM Kaxil Naik <
>>>> kaxilnaik@gmail.com>
>>>> > > >> wrote:
>>>> > > >> >> >
>>>> > > >> >> > > Let's come to a consensus first before we do anything :-)
>>>> > > >> >> > >
>>>> > > >> >> > > Is everyone happy with separate repo approach? Let's wait
>>>> for
>>>> > 72
>>>> > > >> hours
>>>> > > >> >> to
>>>> > > >> >> > > hear from all and then have a plan on how we do it? WDYT?
>>>> > > >> >> > >
>>>> > > >> >> > > But indeed git submodules approach sounds good. We do it
>>>> for
>>>> > > >> >> *Airflow
>>>> > > >> >> > > Site *(
>>>> > > >> >> > >
>>>> > > >> >> > >
>>>> > > >> >> >
>>>> > > >> >>
>>>> > > >>
>>>> > >
>>>> >
>>>> https://github.com/apache/airflow-site/tree/master/landing-pages/site/themes
>>>> > > >> >> > > )
>>>> > > >> >> > > too.
>>>> > > >> >> > >
>>>> > > >> >> > > Regards,
>>>> > > >> >> > > Kaxil
>>>> > > >> >> > >
>>>> > > >> >> > > On Thu, Jul 2, 2020 at 4:15 PM Jarek Potiuk <
>>>> > > >> Jarek.Potiuk@polidea.com>
>>>> > > >> >> > > wrote:
>>>> > > >> >> > >
>>>> > > >> >> > > > Absolutely - I am happy to add "best practices" and
>>>> short
>>>> > > >> >> "howto do
>>>> > > >> >> > stuff
>>>> > > >> >> > > > with git submodules" - and this knowledge will only be
>>>> > needed
>>>> > > >> for
>>>> > > >> >> > > > interacting with prod image/helmchart/running kubernetes
>>>> > tests.
>>>> > > >> For
>>>> > > >> >> all
>>>> > > >> >> > > the
>>>> > > >> >> > > > other purposes it should be "business as usual".
>>>> > > >> >> > > >
>>>> > > >> >> > > > On Thu, Jul 2, 2020 at 4:53 PM Daniel Imberman <
>>>> > > >> >> > > daniel.imberman@gmail.com>
>>>> > > >> >> > > > wrote:
>>>> > > >> >> > > >
>>>> > > >> >> > > > > I think git submodules sounds like a great idea. We
>>>> would
>>>> > > >> >> need to
>>>> > > >> >> > write
>>>> > > >> >> > > > > this into the CONTRIBUTING.md to let people know how
>>>> to do
>>>> > it
>>>> > > >> but
>>>> > > >> >> > It’s
>>>> > > >> >> > > a
>>>> > > >> >> > > > > “teach once” situation.
>>>> > > >> >> > > > >
>>>> > > >> >> > > > > On Thu, Jul 2, 2020 at 2:44 AM, Tomasz Urbaszek <
>>>> > > >> >> > turbaszek@apache.org>
>>>> > > >> >> > > > > wrote:
>>>> > > >> >> > > > > I support the idea of separate repos. The git
>>>> submodules
>>>> > > >> mentioned
>>>> > > >> >> by
>>>> > > >> >> > > > > Jarek sounds like an interesting solution. It may add
>>>> some
>>>> > > >> >> complexity
>>>> > > >> >> > > > > for new contributors but it's not rocket science. If
>>>> we
>>>> > agree
>>>> > > >> on
>>>> > > >> >> > using
>>>> > > >> >> > > > > this we should add small how-to in contributing.rst I
>>>> think
>>>> > > >> (i.e.
>>>> > > >> >> do
>>>> > > >> >> > I
>>>> > > >> >> > > > > have to have fork of each repo?).
>>>> > > >> >> > > > >
>>>> > > >> >> > > > > As stressed previously if we go this route we should
>>>> make
>>>> > > >> >> sure we
>>>> > > >> >> > have
>>>> > > >> >> > > > > nice testing of all those three components. Regarding
>>>> the
>>>> > > >> >> versioning,
>>>> > > >> >> > > > > I have no strong opinion but I fully support using
>>>> separate
>>>> > > >> issues
>>>> > > >> >> > for
>>>> > > >> >> > > > > airflow, docker, and helm.
>>>> > > >> >> > > > >
>>>> > > >> >> > > > > Tomek
>>>> > > >> >> > > > >
>>>> > > >> >> > > > >
>>>> > > >> >> > > > > On Thu, Jul 2, 2020 at 9:26 AM Jarek Potiuk <
>>>> > > >> >> > Jarek.Potiuk@polidea.com>
>>>> > > >> >> > > > > wrote:
>>>> > > >> >> > > > > >
>>>> > > >> >> > > > > > On Thu, Jul 2, 2020 at 3:16 AM Daniel Imberman <
>>>> > > >> >> > > > > daniel.imberman@gmail.com>
>>>> > > >> >> > > > > > wrote:
>>>> > > >> >> > > > > >
>>>> > > >> >> > > > > > I’m fine with keeping it as three separate repos but
>>>> > > merging
>>>> > > >> >> > testing
>>>> > > >> >> > > > > > > somehow (e.g. the source code chart would pull the
>>>> > > >> helm/docker
>>>> > > >> >> > > chart
>>>> > > >> >> > > > > into
>>>> > > >> >> > > > > > > .build) but we need to do it in a way that
>>>> doesn’t make
>>>> > > >> testing
>>>> > > >> >> > too
>>>> > > >> >> > > > > > > difficult.
>>>> > > >> >> > > > > > >
>>>> > > >> >> > > > > > > So for example: How do I test/integration test a
>>>> change
>>>> > > >> that
>>>> > > >> >> > > > involves a
>>>> > > >> >> > > > > > > change to all three and has to be done at the same
>>>> > time?
>>>> > > >> >> Perhaps
>>>> > > >> >> > a
>>>> > > >> >> > > > > user can
>>>> > > >> >> > > > > > > “register” a branch of helm and docker when they
>>>> start
>>>> > up
>>>> > > >> >> breeze?
>>>> > > >> >> > > Or
>>>> > > >> >> > > > > > > perhaps we create a “parent” integration test
>>>> that uses
>>>> > > the
>>>> > > >> >> three
>>>> > > >> >> > > > > together?
>>>> > > >> >> > > > > > >
>>>> > > >> >> > > > > >
>>>> > > >> >> > > > > > Yes, those are exactly my concerns when splitting
>>>> the
>>>> > > repos.
>>>> > > >> >> > > > > >
>>>> > > >> >> > > > > > I think testing for development should remain in the
>>>> > > >> "airflow"
>>>> > > >> >> > repo.
>>>> > > >> >> > > It
>>>> > > >> >> > > > > is
>>>> > > >> >> > > > > > the "central one" in fact. I slept it over and I
>>>> think
>>>> > > using
>>>> > > >> >> > > "released"
>>>> > > >> >> > > > > > versions for development testing will suffer from
>>>> this
>>>> > "we
>>>> > > >> >> need a
>>>> > > >> >> > > > change
>>>> > > >> >> > > > > in
>>>> > > >> >> > > > > > all three of those".
>>>> > > >> >> > > > > >
>>>> > > >> >> > > > > > But we have an easy solution I think.
>>>> > > >> >> > > > > >
>>>> > > >> >> > > > > > I think that simply setting submodules properly
>>>> should do
>>>> > > >> >> the
>>>> > > >> >> > job:
>>>> > > >> >> > > > > > https://git-scm.com/book/en/v2/Git-Tools-Submodules
>>>> .
>>>> > They
>>>> > > >> seem
>>>> > > >> >> to
>>>> > > >> >> > be
>>>> > > >> >> > > > > > perfect for our case.
>>>> > > >> >> > > > > >
>>>> > > >> >> > > > > > For those who have not used it - in short -
>>>> submodules
>>>> > work
>>>> > > >> in
>>>> > > >> >> the
>>>> > > >> >> > > way
>>>> > > >> >> > > > > that
>>>> > > >> >> > > > > > they register the "linked repos" and store related
>>>> "hash"
>>>> > > >> >> of the
>>>> > > >> >> > > commit
>>>> > > >> >> > > > > > from that linked repo. For example, the "chart"
>>>> folder
>>>> > will
>>>> > > >> >> be a
>>>> > > >> >> > link
>>>> > > >> >> > > > to
>>>> > > >> >> > > > > > "apache/airflow-helm-chart". We can also move the
>>>> prod
>>>> > > >> Dockerfile
>>>> > > >> >> > to
>>>> > > >> >> > > a
>>>> > > >> >> > > > > > subfolder and link it to the separate repo. Git
>>>> submodule
>>>> > > >> >> has a
>>>> > > >> >> > > > > > built-in mechanism to a) update to the latest
>>>> version of
>>>> > > the
>>>> > > >> >> repo,
>>>> > > >> >> > b)
>>>> > > >> >> > > > > > commit your changes to the linked repo from there
>>>> which
>>>> > is
>>>> > > >> >> all we
>>>> > > >> >> > > > need. I
>>>> > > >> >> > > > > > used those few times - I never liked submodules for
>>>> > sharing
>>>> > > >> >> > "library"
>>>> > > >> >> > > > > code,
>>>> > > >> >> > > > > > but for sharing helm/Docker it seems perfect.
>>>> > > >> >> > > > > >
>>>> > > >> >> > > > > > From the "regular" developer point of view - you do
>>>> not
>>>> > > >> >> need to
>>>> > > >> >> > > > > get/update
>>>> > > >> >> > > > > > submodules if you do not need to use them - so for
>>>> all
>>>> > the
>>>> > > >> >> > > development
>>>> > > >> >> > > > > > purposes if you only change the "airflow" code, you
>>>> would
>>>> > > not
>>>> > > >> >> even
>>>> > > >> >> > > need
>>>> > > >> >> > > > > to
>>>> > > >> >> > > > > > sync chart or Dockerfile. You do "git checkout" as
>>>> usual
>>>> > > >> >> and it
>>>> > > >> >> > > should
>>>> > > >> >> > > > > > work. So basically - no change for "regular" airflow
>>>> > > >> development.
>>>> > > >> >> > > > > >
>>>> > > >> >> > > > > > However, if you do need to work on helm + Docker +
>>>> code,
>>>> > > >> >> then you
>>>> > > >> >> > > > simply
>>>> > > >> >> > > > > run
>>>> > > >> >> > > > > > "git submodule update", go to the linked "helm" or
>>>> > "docker"
>>>> > > >> >> folder,
>>>> > > >> >> > > > > > checkout the "master" version and you start making
>>>> > changes.
>>>> > > >> The
>>>> > > >> >> > only
>>>> > > >> >> > > > > thing
>>>> > > >> >> > > > > > to remember when you want to push your changes is
>>>> to do
>>>> > > >> >> `git push
>>>> > > >> >> > > > > > --recurse-submodules="check" ` and it will make
>>>> sure that
>>>> > > >> >> all the
>>>> > > >> >> > > repos
>>>> > > >> >> > > > > are
>>>> > > >> >> > > > > > updated. It is a bit involved, but the latest git
>>>> versions
>>>> > have
>>>> > > >> >> very
>>>> > > >> >> > > good
>>>> > > >> >> > > > > > support and it must only be used by people who work
>>>> on
>>>> > > >> >> airflow +
>>>> > > >> >> > > > docker +
>>>> > > >> >> > > > > > helm - all the others are unaffected.
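The submodule workflow described above can be written down as a short command sequence. This is a sketch that only builds the git commands (it does not run them); the "chart" folder and "master" branch are taken from the message, while the function name is made up for illustration.

```python
def submodule_workflow(submodule_dir, branch="master"):
    """Return the git commands for working on one linked repo, in order."""
    return [
        # Fetch the linked repo at the commit recorded in the main repo.
        ["git", "submodule", "update", "--init", submodule_dir],
        # Switch the linked repo from the detached commit to a branch.
        ["git", "-C", submodule_dir, "checkout", branch],
        # Refuse to push the main repo if the submodule commit it points
        # to has not been pushed to the linked repo yet.
        ["git", "push", "--recurse-submodules=check"],
    ]


for command in submodule_workflow("chart"):
    print(" ".join(command))
```

Developers who only change the "airflow" code never need the first two commands, which matches the "business as usual" point made above.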
>>>> > > >> >> > > > > >
>>>> > > >> >> > > > > > From the CI perspective also nothing changes - when
>>>> we
>>>> > > >> checkout
>>>> > > >> >> the
>>>> > > >> >> > > > code
>>>> > > >> >> > > > > we
>>>> > > >> >> > > > > > will include submodules and our test harness will be
>>>> > > largely
>>>> > > >> >> > > unchanged.
>>>> > > >> >> > > > > > Submodule provides us with the right mechanism for
>>>> cross
>>>> > > >> >> dependency
>>>> > > >> >> > > > even
>>>> > > >> >> > > > > if
>>>> > > >> >> > > > > > we use branches.
>>>> > > >> >> > > > > >
>>>> > > >> >> > > > > > If everyone will be ok with that - I am happy to
>>>> set it
>>>> > up.
>>>> > > >> With
>>>> > > >> >> > > > > submodules
>>>> > > >> >> > > > > > - we can switch to separate repos even without
>>>> releasing
>>>> > > >> >> helm and
>>>> > > >> >> > > Prod
>>>> > > >> >> > > > > > chart "officially".
>>>> > > >> >> > > > > >
>>>> > > >> >> > > > > > J.
>>>> > > >> >> > > > > >
>>>> > > >> >> > > > > >
>>>> > > >> >> > > > > >
>>>> > > >> >> > > > > > >
>>>> > > >> >> > > > > > > On Wed, Jul 1, 2020 at 3:20 PM, Jarek Potiuk <
>>>> > > >> >> > > > Jarek.Potiuk@polidea.com
>>>> > > >> >> > > > > >
>>>> > > >> >> > > > > > > wrote:
>>>> > > >> >> > > > > > > Sure. We can work with such an approach. There
>>>> will be
>>>> > > some
>>>> > > >> >> > > > > dependencies
>>>> > > >> >> > > > > > > that we might find are problematic, but If we all
>>>> see
>>>> > > >> >> that it's
>>>> > > >> >> > > > > > > worth trying, there is a clear benefit that it
>>>> makes
>>>> > for
>>>> > > a
>>>> > > >> >> > "clean"
>>>> > > >> >> > > > > > > split between those different "entities". And
>>>> possibly
>>>> > > >> >> once we
>>>> > > >> >> > > > release
>>>> > > >> >> > > > > > > first versions of both image and chart, such
>>>> problems
>>>> > > >> >> will be
>>>> > > >> >> > rare
>>>> > > >> >> > > > and
>>>> > > >> >> > > > > easy
>>>> > > >> >> > > > > > > to fix.
>>>> > > >> >> > > > > > >
>>>> > > >> >> > > > > > > I personally think such split is inevitable
>>>> eventually,
>>>> > > >> it's
>>>> > > >> >> > just a
>>>> > > >> >> > > > > matter
>>>> > > >> >> > > > > > > when to do it. If we decide to make this happen
>>>> soon -
>>>> > I
>>>> > > am
>>>> > > >> >> more
>>>> > > >> >> > > than
>>>> > > >> >> > > > > happy
>>>> > > >> >> > > > > > > to work on making the split reality.
>>>> > > >> >> > > > > > >
>>>> > > >> >> > > > > > > One prerequisite to that is that all those - Helm
>>>> > Chart,
>>>> > > >> Prod
>>>> > > >> >> > Image
>>>> > > >> >> > > > and
>>>> > > >> >> > > > > > > Airflow are released in stable versions separately
>>>> > > >> >> "officially" -
>>>> > > >> >> > > > from
>>>> > > >> >> > > > > the
>>>> > > >> >> > > > > > > current sources (otherwise there will be no way
>>>> to test
>>>> > > >> >> > > cross-repo).
>>>> > > >> >> > > > > > >
>>>> > > >> >> > > > > > > I think for that we will need to agree on the
>>>> > versioning
>>>> > > >> scheme
>>>> > > >> >> > and
>>>> > > >> >> > > > > cadence
>>>> > > >> >> > > > > > > for the Image and Helm Chart, then copy sources
>>>> from
>>>> > > >> airflow
>>>> > > >> >> and
>>>> > > >> >> > > > > release
>>>> > > >> >> > > > > > > them as "baseline" including setup the tests for
>>>> all of
>>>> > > >> >> those -
>>>> > > >> >> > > then
>>>> > > >> >> > > > we
>>>> > > >> >> > > > > > > can remove both Helm and Dockerfile from the
>>>> airflow
>>>> > > repo.
>>>> > > >> >> Happy
>>>> > > >> >> > to
>>>> > > >> >> > > > > help
>>>> > > >> >> > > > > > > with that if that's the direction we choose as a
>>>> > > >> >> community. It
>>>> > > >> >> is
>>>> > > >> >> > > > > important
>>>> > > >> >> > > > > > > though that we keep the cross-repo testing
>>>> working. We
>>>> > > >> >> have it
>>>> > > >> >> > > > working
>>>> > > >> >> > > > > as
>>>> > > >> >> > > > > > > of yesterday, so now the matter is - whatever we
>>>> do we
>>>> > > >> >> keep it
>>>> > > >> >> > > > running
>>>> > > >> >> > > > > and
>>>> > > >> >> > > > > > > have development environment support easy
>>>> development
>>>> > and
>>>> > > >> >> testing
>>>> > > >> >> > > of
>>>> > > >> >> > > > > > > either of the three (including CI testing
>>>> cross-repos).
>>>> > > >> That's
>>>> > > >> >> > the
>>>> > > >> >> > > > > only
>>>> > > >> >> > > > > > > really important thing to me - the rest is more of
>>>> > > >> technicality
>>>> > > >> >> > how
>>>> > > >> >> > > > we
>>>> > > >> >> > > > > link
>>>> > > >> >> > > > > > > the repos, but principle remains.
>>>> > > >> >> > > > > > >
>>>> > > >> >> > > > > > > Do we have an idea for the versioning scheme that
>>>> we
>>>> > > >> >> would like
>>>> > > >> >> > to
>>>> > > >> >> > > > use
>>>> > > >> >> > > > > for
>>>> > > >> >> > > > > > > the Helm Chart and prod image ?
>>>> > > >> >> > > > > > >
>>>> > > >> >> > > > > > > Should we make it CalVer
>>>> > > >> >> <https://calver.org/overview.html> or
>>>> > > >> >> > > > SemVer
>>>> > > >> >> > > > > > > <https://semver.org/> (or some other scheme)?
>>>> And how
>>>> > > >> should
>>>> > > >> >> we
>>>> > > >> >> > > > treat
>>>> > > >> >> > > > > the
>>>> > > >> >> > > > > > > combinations with Airflow?
>>>> > > >> >> > > > > > >
>>>> > > >> >> > > > > > > My thoughts (but I have no strong opinions as
>>>> long as
>>>> > > >> someone
>>>> > > >> >> > > > proposes
>>>> > > >> >> > > > > more
>>>> > > >> >> > > > > > > sensible versioning schemes):
>>>> > > >> >> > > > > > >
>>>> > > >> >> > > > > > > 1) Airflow code - we continue the release scheme
>>>> we
>>>> > have
>>>> > > >> (with
>>>> > > >> >> > > > > deciding on
>>>> > > >> >> > > > > > > 2.* scheme for the release). I expect in the
>>>> future we
>>>> > > >> might
>>>> > > >> >> > decide
>>>> > > >> >> > > > on
>>>> > > >> >> > > > > > > doing branches or patches so for 2.* I'd opt for
>>>> going
>>>> > > full
>>>> > > >> >> > SemVer
>>>> > > >> >> > > > > approach
>>>> > > >> >> > > > > > > and patches released from branches.
>>>> > > >> >> > > > > > >
>>>> > > >> >> > > > > > > 2) I believe that Helm Chart can be versioned
>>>> with its
>>>> > > own
>>>> > > >> >> > version
>>>> > > >> >> > > > > (then
>>>> > > >> >> > > > > > > you specify the image version as helm parameter).
>>>> For
>>>> > the
>>>> > > >> Helm
>>>> > > >> >> > > Chart
>>>> > > >> >> > > > I
>>>> > > >> >> > > > > > > think CalVer might be OK as I do not expect any
>>>> > > >> >> branching/patches
>>>> > > >> >> > > in
>>>> > > >> >> > > > > the
>>>> > > >> >> > > > > > > future - I'd expect that there will be a single
>>>> stream
>>>> > of
>>>> > > >> >> > releases.
>>>> > > >> >> > > > > > >
>>>> > > >> >> > > > > > > 3) Dockerfile (+ related files such as
>>>> .dockerignore,
>>>> > > empty
>>>> > > >> >> dir,
>>>> > > >> >> > > > > > > entrypoints etc). I do not imagine a lot of
>>>> branching
>>>> > for
>>>> > > >> >> those -
>>>> > > >> >> > > we
>>>> > > >> >> > > > > > > should be able to release a new version of a
>>>> Dockerfile
>>>> > > (+
>>>> > > >> >> > related
>>>> > > >> >> > > > > files)
>>>> > > >> >> > > > > > > working with nearly any earlier Airflow release,
>>>> so
>>>> > > CalVer
>>>> > > >> >> seems
>>>> > > >> >> > > > like a
>>>> > > >> >> > > > > > > good choice.
>>>> > > >> >> > > > > > >
>>>> > > >> >> > > > > > > 4) Image versioning becomes a bit more complex
>>>> because
>>>> > > the
>>>> > > >> >> image
>>>> > > >> >> > > tag
>>>> > > >> >> > > > is
>>>> > > >> >> > > > > > > always combination of:
>>>> > > >> >> > > > > > > * Dockerfile (+ related files) version
>>>> > > >> >> > > > > > > * Airflow Version
>>>> > > >> >> > > > > > > * Python Version
>>>> > > >> >> > > > > > >
>>>> > > >> >> > > > > > > An example versioning I can imagine:
>>>> > > >> >> > > > > > >
>>>> > > >> >> > > > > > > *Airflow*: 1.10.11, 1.10.12, 2.0.0, 2.1.0, 2.1.1 -
>>>> > patch
>>>> > > >> level
>>>> > > >> >> > (if
>>>> > > >> >> > > we
>>>> > > >> >> > > > > > > decide to have patches).
>>>> > > >> >> > > > > > > *Dockerfile: *2020.07.12, 2020.08.20...... ->
>>>> depending
>>>> > > >> >> when we
>>>> > > >> >> > > > release
>>>> > > >> >> > > > > > > them
>>>> > > >> >> > > > > > > *Helm Chart*: 2020.07.10, 2020.08.09 ...... Each
>>>> Helm
>>>> > > Chart
>>>> > > >> >> has a
>>>> > > >> >> > > > > minimum
>>>> > > >> >> > > > > > > version of both Dockerfile and Airflow versions it
>>>> > works
>>>> > > >> with.
>>>> > > >> >> > > > > > >
>>>> > > >> >> > > > > > > *Example Docker Image tags:*
>>>> > > >> >> > > > > > >
>>>> > > >> apache/airflow:dockerfile2020.07.10-airflow1.10.10-python3.6
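To make the proposed tag scheme concrete, here is a small sketch that composes and parses a tag of the form shown above. The function names and the regular expression are illustrative assumptions; only the tag layout (Dockerfile version, Airflow version, Python version) comes from the proposal.

```python
import re

# Layout from the proposal: <repo>:dockerfile<CalVer>-airflow<ver>-python<ver>
TAG_RE = re.compile(
    r"^(?P<repo>[^:]+):dockerfile(?P<dockerfile>[\d.]+)"
    r"-airflow(?P<airflow>[\d.]+)-python(?P<python>[\d.]+)$"
)


def image_tag(dockerfile_version, airflow_version, python_version,
              repo="apache/airflow"):
    """Compose an image tag from its three version components."""
    return (f"{repo}:dockerfile{dockerfile_version}"
            f"-airflow{airflow_version}-python{python_version}")


def parse_image_tag(tag):
    """Split a tag back into repo and version components."""
    match = TAG_RE.match(tag)
    if match is None:
        raise ValueError(f"not a recognised image tag: {tag!r}")
    return match.groupdict()


tag = image_tag("2020.07.10", "1.10.10", "3.6")
print(tag)
print(parse_image_tag(tag))
```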
>>>> > > >> >> > > > > > >
>>>> > > >> >> > > > > > > WDYT?
>>>> > > >> >> > > > > > >
>>>> > > >> >> > > > > > > J,
>>>> > > >> >> > > > > > >
>>>> > > >> >> > > > > > >
>>>> > > >> >> > > > > > > On Wed, Jul 1, 2020 at 11:12 PM Kaxil Naik <
>>>> > > >> >> kaxilnaik@gmail.com>
>>>> > > >> >> > > > > wrote:
>>>> > > >> >> > > > > > >
>>>> > > >> >> > > > > > > > I think we should have "separate repos for
>>>> > development"
>>>> > > >> too.
>>>> > > >> >> > > > > > > >
>>>> > > >> >> > > > > > > > 3 Repos in total:
>>>> > > >> >> > > > > > > >
>>>> > > >> >> > > > > > > > 1) apache/airflow
>>>> > > >> >> > > > > > > > 2) apache/airflow-docker-image
>>>> > > >> >> > > > > > > > 3) apache/airflow-helm-chart
>>>> > > >> >> > > > > > > >
>>>> > > >> >> > > > > > > >
>>>> > > >> >> > > > > > > > (1) *apache/airflow* should use a pinned stable
>>>> > version
>>>> > > >> of
>>>> > > >> >> > > Airflow
>>>> > > >> >> > > > > Helm
>>>> > > >> >> > > > > > > > chart to run Kubernetes tests
>>>> > > >> >> > > > > > > > (2) *apache/airflow* already has
>>>> *Dockerfile.ci* file
>>>> > > >> which
>>>> > > >> >> it
>>>> > > >> >> > > can
>>>> > > >> >> > > > > use to
>>>> > > >> >> > > > > > > > run airflow tests on docker images.
>>>> > > >> >> > > > > > > > (3) *apache/airflow-docker-image *should use the
>>>> > latest
>>>> > > >> >> > available
>>>> > > >> >> > > > > stable
>>>> > > >> >> > > > > > > > version of airflow
>>>> > > >> >> > > > > > > > (4) *apache/airflow-helm-chart *should use the
>>>> latest
>>>> > > >> >> available
>>>> > > >> >> > > > > stable
>>>> > > >> >> > > > > > > > version of airflow
>>>> > > >> >> > > > > > > >
>>>> > > >> >> > > > > > > > Having such split also makes some updates more
>>>> > > >> >> difficult -
>>>> > > >> >> for
>>>> > > >> >> > > > > example if
>>>> > > >> >> > > > > > > > > we add new "extra" to Airflow that will
>>>> require to
>>>> > > >> install
>>>> > > >> >> > > "apt"
>>>> > > >> >> > > > > > > > dependency
>>>> > > >> >> > > > > > > > > in Dockerfile, we will have to split it into
>>>> first
>>>> > > >> adding
>>>> > > >> >> the
>>>> > > >> >> > > > > > > dependency
>>>> > > >> >> > > > > > > > to
>>>> > > >> >> > > > > > > > > Dockerfile, and once it is merged, we can add
>>>> the
>>>> > > >> >> extra to
>>>> > > >> >> > > > airflow
>>>> > > >> >> > > > > with
>>>> > > >> >> > > > > > > > > setup.py.
>>>> > > >> >> > > > > > > >
>>>> > > >> >> > > > > > > >
>>>> > > >> >> > > > > > > > Adding a new extra to setup.py would not (and
>>>> should
>>>> > > not)
>>>> > > >> >> > impact
>>>> > > >> >> > > > the
>>>> > > >> >> > > > > > > > development of *apache/airflow-docker-image*
>>>> > > >> >> > > > > > > > Once an RC is cut for apache/airflow or after a
>>>> new
>>>> > > >> version
>>>> > > >> >> is
>>>> > > >> >> > > > > released
>>>> > > >> >> > > > > > > for
>>>> > > >> >> > > > > > > > apache/airflow, we can work on supporting the
>>>> new
>>>> > > airflow
>>>> > > >> >> > version
>>>> > > >> >> > > > in
>>>> > > >> >> > > > > the
>>>> > > >> >> > > > > > > > Production Docker Image.
>>>> > > >> >> > > > > > > > While doing that we can add all the libraries
>>>> that
>>>> > are
>>>> > > >> needed
>>>> > > >> >> > by
>>>> > > >> >> > > > the
>>>> > > >> >> > > > > new
>>>> > > >> >> > > > > > > > Airflow Version and we will have a clean commit
>>>> > history
>>>> > > >> and
>>>> > > >> >> > > > > changelog for
>>>> > > >> >> > > > > > > > Docker image.
>>>> > > >> >> > > > > > > >
>>>> > > >> >> > > > > > > > We definitely do not need to work in parallel on
>>>> both
>>>> > > the
>>>> > > >> >> repos.
>>>> > > >> >> > > By
>>>> > > >> >> > > > > doing
>>>> > > >> >> > > > > > > > development in a separate repo we keep
>>>> consistent
>>>> > > >> "source"
>>>> > > >> >> > files
>>>> > > >> >> > > > and
>>>> > > >> >> > > > > we
>>>> > > >> >> > > > > > > can
>>>> > > >> >> > > > > > > > release each artifact with a
>>>> > > >> >> > > > > > > > separate cadence. If someone discovers bug in
>>>> newly
>>>> > > >> released
>>>> > > >> >> > > > > Dockerimage,
>>>> > > >> >> > > > > > > > we should be easily able to cut out a new
>>>> release
>>>> > with
>>>> > > >> the
>>>> > > >> >> > patch
>>>> > > >> >> > > > > without
>>>> > > >> >> > > > > > > > worrying about how development is
>>>> > > >> >> > > > > > > > going in the apache/airflow repo.
>>>> > > >> >> > > > > > > >
>>>> > > >> >> > > > > > > >
>>>> > > >> >> > > > > > > > *Apache Flink & Apache CouchDB *do it in a
>>>> > similar
>>>> > > >> >> manner:
>>>> > > >> >> > > > > > > >
>>>> > > >> >> > > > > > > > https://github.com/apache/flink &
>>>> > > >> >> > > > > https://github.com/apache/flink-docker
>>>> > > >> >> > > > > > > > https://github.com/apache/couchdb &
>>>> > > >> >> > > > > > > > https://github.com/apache/couchdb-docker
>>>> > > >> >> > > > > > > >
>>>> > > >> >> > > > > > > > Regards,
>>>> > > >> >> > > > > > > > Kaxil
>>>> > > >> >> > > > > > > >
>>>> > > >> >> > > > > > > >
>>>> > > >> >> > > > > > > >
>>>> > > >> >> > > > > > > >
>>>> > > >> >> > > > > > > >
>>>> > > >> >> > > > > > > >
>>>> > > >> >> > > > > > > > On Wed, Jul 1, 2020 at 9:50 PM Jarek Potiuk <
>>>> > > >> >> > > > > Jarek.Potiuk@polidea.com>
>>>> > > >> >> > > > > > > > wrote:
>>>> > > >> >> > > > > > > >
>>>> > > >> >> > > > > > > > > I do not think it's only the question of
>>>> Mono/Multi
>>>> > > >> repos.
>>>> > > >> >> > > While
>>>> > > >> >> > > > I
>>>> > > >> >> > > > > > > > clearly
>>>> > > >> >> > > > > > > > > see the benefit of separate repos I also see
>>>> some
>>>> > > >> >> drawbacks.
>>>> > > >> >> > > > > > > > >
>>>> > > >> >> > > > > > > > > And if it bothers others, I am happy to
>>>> follow the
>>>> > > >> >> majority.
>>>> > > >> >> > If
>>>> > > >> >> > > > we
>>>> > > >> >> > > > > > > think
>>>> > > >> >> > > > > > > > > that a bit more complexity in testing
>>>> justifies
>>>> > > >> separating
>>>> > > >> >> > > those
>>>> > > >> >> > > > > three
>>>> > > >> >> > > > > > > > > completely and having more "clean"- it's also
>>>> > > >> >> workable but
>>>> > > >> >> > IMHO
>>>> > > >> >> > > > > > > > introduces
>>>> > > >> >> > > > > > > > > certain complexity in development.
>>>> > > >> >> > > > > > > > >
>>>> > > >> >> > > > > > > > > However, I think this is not 0/1 - a kind of
>>>> Hybrid
>>>> > > >> approach
>>>> > > >> >> in
>>>> > > >> >> > my
>>>> > > >> >> > > > > opinion
>>>> > > >> >> > > > > > > > > might be best of both worlds - development and
>>>> > > >> >> releases.
>>>> > > >> >> > > > > > > > >
>>>> > > >> >> > > > > > > > > Let me explain what I mean by "Hybrid":
>>>> > > >> >> > > > > > > > >
>>>> > > >> >> > > > > > > > > I think we definitely should have separate
>>>> > > >> >> repositories to
>>>> > > >> >> > > > release
>>>> > > >> >> > > > > > > those
>>>> > > >> >> > > > > > > > > artifacts and I think there is no doubt about
>>>> it:
>>>> > > >> >> > > > > > > > >
>>>> > > >> >> > > > > > > > > * airflow (apache/airflow)
>>>> > > >> >> > > > > > > > > * prod docker image (apache/airflow-docker)
>>>> > > >> >> > > > > > > > > * helm chart (apache/airflow-helm)
>>>> > > >> >> > > > > > > > > * api clients (we already have separate repos
>>>> for
>>>> > > >> those)
>>>> > > >> >> > > > > > > > > (apache/airflow-client-*)
>>>> > > >> >> > > > > > > > >
>>>> > > >> >> > > > > > > > > I think the only question is where we develop all those (develop !=
>>>> > > >> >> > > > > > > > > release). There are certain benefits of having a single "master" (let's
>>>> > > >> >> > > > > > > > > call it "development" further) for all those artifacts. Currently the
>>>> > > >> >> > > > > > > > > "development" version for all of those is in one repo - and while
>>>> > > >> >> > > > > > > > > developing one depends on the other, we also test all of those together,
>>>> > > >> >> > > > > > > > > and this means that the "current best" set of airflow sources (including
>>>> > > >> >> > > > > > > > > dependencies in setup.py), Dockerfile and Helm chart work together. This
>>>> > > >> >> > > > > > > > > means, for example, that you will not be able to break the Helm Chart by
>>>> > > >> >> > > > > > > > > changing anything that the helm chart depends on in airflow. For example,
>>>> > > >> >> > > > > > > > > if you change "airflow webserver" into "airflow server", the current helm
>>>> > > >> >> > > > > > > > > chart will break. Similarly, if you change entrypoint.sh in the Docker
>>>> > > >> >> > > > > > > > > image in a way that is not compatible with the Helm chart, we will not let
>>>> > > >> >> > > > > > > > > that happen - the CI tests will break if either of those changes in an
>>>> > > >> >> > > > > > > > > incompatible way. And we can have dependencies in any direction between
>>>> > > >> >> > > > > > > > > those three. When we see a commit break any of the three - we can make a
>>>> > > >> >> > > > > > > > > decision about what to do - either accept and document the
>>>> > > >> >> > > > > > > > > incompatibility or fix it.
>>>> > > >> >> > > > > > > > >
>>>> > > >> >> > > > > > > > > Of course keeping that property (testing it all together) is also
>>>> > > >> >> > > > > > > > > possible if they are in completely separate repos. There are several
>>>> > > >> >> > > > > > > > > cross-dependencies - Docker image building depends on dependencies in
>>>> > > >> >> > > > > > > > > setup.py for example, you cannot build Docker image from only Dockerfile
>>>> > > >> >> > > > > > > > > without the sources of airflow nor build and test helm charts without the
>>>> > > >> >> > > > > > > > > image (and sources - because that's where the current kubernetes tests
>>>> > > >> >> > > > > > > > > are). If we want to continue doing it for both Helm and Dockerfile, we
>>>> > > >> >> > > > > > > > > would have to basically check out the latest sources of Airflow and run
>>>> > > >> >> > > > > > > > > the CI tests before merging any Docker or Helm Chart changes and the
>>>> > > >> >> > > > > > > > > opposite - we will have to download Dockerfile/Helm chart and build
>>>> > > >> >> > > > > > > > > image/install Helm chart when we are running CI tests for Airflow. This is
>>>> > > >> >> > > > > > > > > possible and we could do it, but it adds complexity to the build/CI
>>>> > > >> >> > > > > > > > > process.
>>>> > > >> >> > > > > > > > >
>>>> > > >> >> > > > > > > > > Having such a split also makes some updates more difficult - for example,
>>>> > > >> >> > > > > > > > > if we add a new "extra" to Airflow that requires installing an "apt"
>>>> > > >> >> > > > > > > > > dependency in the Dockerfile, we will have to split it into first adding
>>>> > > >> >> > > > > > > > > the dependency to the Dockerfile, and once it is merged, we can add the
>>>> > > >> >> > > > > > > > > extra to airflow with setup.py. This makes it quite difficult to test it
>>>> > > >> >> > > > > > > > > together though (the Dockerfile change can only be tested fully after
>>>> > > >> >> > > > > > > > > merging it to master). Not to mention the complexity of managing different
>>>> > > >> >> > > > > > > > > versions - your local development Dockerfile version vs sources of
>>>> > > >> >> > > > > > > > > Airflow, for example. Imagine switching between branches where you add two
>>>> > > >> >> > > > > > > > > different apt dependencies to the Dockerfile. There are more similar
>>>> > > >> >> > > > > > > > > scenarios I can imagine - especially for parallel changes in those repos.
>>>> > > >> >> > > > > > > > >
>>>> > > >> >> > > > > > > > > It is of course doable to keep them separate, but it is quite a bit more
>>>> > > >> >> > > > > > > > > complex to set up (especially for a consistent development environment)
>>>> > > >> >> > > > > > > > > when you have separate repos, and preventing cross-breaking changes might
>>>> > > >> >> > > > > > > > > be more difficult.
>>>> > > >> >> > > > > > > > >
>>>> > > >> >> > > > > > > > > I believe that the best way is to continue developing airflow + image +
>>>> > > >> >> > > > > > > > > chart in one repo - airflow, but release them from those separate repos.
>>>> > > >> >> > > > > > > > >
>>>> > > >> >> > > > > > > > > The Airflow source release does not have to contain either the chart or
>>>> > > >> >> > > > > > > > > the image. And even if it contains sources for those, they are not the
>>>> > > >> >> > > > > > > > > final "artifacts" (installable image and installable helm chart).
>>>> > > >> >> > > > > > > > > Whenever we decide to release either of them - we test it in
>>>> > > >> >> > > > > > > > > "development". Then only when it is tested, we copy the sources to those
>>>> > > >> >> > > > > > > > > separate repos and release them.
>>>> > > >> >> > > > > > > > >
>>>> > > >> >> > > > > > > > > With git - we can even do it very easily while preserving the history of
>>>> > > >> >> > > > > > > > > commits (been there, done that). And then we could release Helm and Docker
>>>> > > >> >> > > > > > > > > image separately based on the commits and tags in those separate
>>>> > > >> >> > > > > > > > > repositories.
>>>> > > >> >> > > > > > > > >
>>>> > > >> >> > > > > > > > > I agree that separate repos is a more "clean" approach. But I think it is
>>>> > > >> >> > > > > > > > > less convenient for development consistency.
>>>> > > >> >> > > > > > > > >
>>>> > > >> >> > > > > > > > > J,
>>>> > > >> >> > > > > > > > >
>>>> > > >> >> > > > > > > > >
>>>> > > >> >> > > > > > > > >
>>>> > > >> >> > > > > > > > > On Wed, Jul 1, 2020 at 9:35 PM Kaxil Naik <kaxilnaik@gmail.com> wrote:
>>>> > > >> >> > > > > > > > >
>>>> > > >> >> > > > > > > > > > Forgot to mention, having them in separate repos also helps in better
>>>> > > >> >> > > > > > > > > > managing each individual artifact.
>>>> > > >> >> > > > > > > > > >
>>>> > > >> >> > > > > > > > > > Each repo would have its own GitHub issue tracker where we can track the
>>>> > > >> >> > > > > > > > > > issues specific to the Helm chart or the Dockerfile.
>>>> > > >> >> > > > > > > > > >
>>>> > > >> >> > > > > > > > > > Regards,
>>>> > > >> >> > > > > > > > > > Kaxil
>>>> > > >> >> > > > > > > > > >
>>>> > > >> >> > > > > > > > > > On Wed, Jul 1, 2020 at 8:30 PM Kaxil Naik <kaxilnaik@gmail.com> wrote:
>>>> > > >> >> > > > > > > > > >
>>>> > > >> >> > > > > > > > > > > The PMC also needs to agree if we want separate VOTING for Docker Image
>>>> > > >> >> > > > > > > > > > > and Helm chart, I think we do.
>>>> > > >> >> > > > > > > > > > >
>>>> > > >> >> > > > > > > > > > > Regards,
>>>> > > >> >> > > > > > > > > > > Kaxil
>>>> > > >> >> > > > > > > > > > >
>>>> > > >> >> > > > > > > > > > > On Wed, Jul 1, 2020 at 8:06 PM Kaxil Naik <kaxilnaik@gmail.com> wrote:
>>>> > > >> >> > > > > > > > > > >
>>>> > > >> >> > > > > > > > > > >> Hi all,
>>>> > > >> >> > > > > > > > > > >>
>>>> > > >> >> > > > > > > > > > >> What do you all think about having the Dockerfile and Helm chart in the
>>>> > > >> >> > > > > > > > > > >> same "Airflow" repo vs separate ones?
>>>> > > >> >> > > > > > > > > > >>
>>>> > > >> >> > > > > > > > > > >> I feel having a separate repo for the Airflow Dockerfile and Helm chart
>>>> > > >> >> > > > > > > > > > >> has more benefits, like changes that are easy to track (via Changelog),
>>>> > > >> >> > > > > > > > > > >> an easier start for new contributors, and a separate release cadence.
>>>> > > >> >> > > > > > > > > > >>
>>>> > > >> >> > > > > > > > > > >> Currently, the Dockerfile and Helm Chart are inside the same repo, and
>>>> > > >> >> > > > > > > > > > >> when we release the changelog for a new Airflow version, it would include
>>>> > > >> >> > > > > > > > > > >> all changes (Airflow + Dockerfile + Helm chart), which I think is not
>>>> > > >> >> > > > > > > > > > >> that great.
>>>> > > >> >> > > > > > > > > > >>
>>>> > > >> >> > > > > > > > > > >> Also having them all inside a single repo means changes in Helm Chart
>>>> > > >> >> > > > > > > > > > >> and Dockerfile can block Airflow release. We could use stable Helm Chart
>>>> > > >> >> > > > > > > > > > >> version and Dockerfile version to test Airflow so that they are blockers
>>>> > > >> >> > > > > > > > > > >> to release too.
>>>> > > >> >> > > > > > > > > > >>
>>>> > > >> >> > > > > > > > > > >> Happy to hear the thoughts from the community.
>>>> > > >> >> > > > > > > > > > >>
>>>> > > >> >> > > > > > > > > > >> Regards,
>>>> > > >> >> > > > > > > > > > >> Kaxil
>>>> > > >> >> > > > > > > > > > >>
>>>> > > >> >> > > > > > > > > > >
>>>> > > >> >> > > > > > > > > >
>>>> > > >> >> > > > > > > > >
>>>> > > >> >> > > > > > > > >
>>>> > > >> >> > > > > > > > > --
>>>> > > >> >> > > > > > > > >
>>>> > > >> >> > > > > > > > > Jarek Potiuk
>>>> > > >> >> > > > > > > > > Polidea <https://www.polidea.com/> |
>>>> Principal
>>>> > > >> Software
>>>> > > >> >> > > Engineer
>>>> > > >> >> > > > > > > > >
>>>> > > >> >> > > > > > > > > M: +48 660 796 129 <+48660796129>
>>>> > > >> >> > > > > > > > > [image: Polidea] <https://www.polidea.com/>
>>>> > > >> >> > > > > > > > >

-- 

Jarek Potiuk
Polidea <https://www.polidea.com/> | Principal Software Engineer

M: +48 660 796 129 <+48660796129>
[image: Polidea] <https://www.polidea.com/>

Re: Separate Repo vs MonoRepo for Dockerfile & Helm Chart

Posted by Daniel Imberman <da...@gmail.com>.
I am all for this. This is how kubernetes does it and it has worked out really well for them.

On Sun, Oct 25, 2020 at 10:23 PM, Jarek Potiuk <Ja...@polidea.com> wrote:
Yep, that would be nice. I agree that it is not obvious where some files come from.
I agree this could be done if everyone thinks it's a good idea. It would be perfectly doable; we could even make it work with the whole history maintained (we'd just need to include historical paths in the script).
And if we make it in time before 1.10.13, we could even release it within 1.10.13.
J

On Sun, Oct 25, 2020 at 10:03 PM Kamil Breguła <kamil.bregula@polidea.com> wrote:
I took a quick look and I like the overall concept, but I'm just wondering if it will be clear enough for users. Currently, these scripts copy different files from different directories and the mapping of the source to the destination is written in the scripts. This will make it difficult to contribute to this "sub-project". In my opinion, if we want to create new repositories from some files, we should only do it for one directory. If this directory has dependencies, we should try to break them down. The end-user should not get the impression that they are in contact with the copied repository at the first glance. Otherwise, we will not achieve our primary goal - to facilitate end-user use.

In this case, it means that we should create a new directory in apache/airflow named "prod-docker-image" or similar and move to it the necessary Dockerfiles, documentation, scripts, and all other assets. In particular, this directory should contain README.md which actually describes the contents of that directory.

A good example is the /chart directory. It has only one dependency that is not in the "/chart" directory - the "Contributing" section in README.md refers to a file in the root directory of the repository. This link will stop working if we create a new repository from the entire directory, but that will be trivial to fix.
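
The "dependency outside the directory" check described above could be automated. Below is a minimal Python sketch, assuming the README uses Markdown inline links; the function name, the sample text and the "../CONTRIBUTING.rst" target are made up for illustration and are not taken from the Airflow repo.

```python
import posixpath
import re

# Matches the target of a Markdown inline link: [text](target)
LINK_TARGET = re.compile(r"\[[^\]]*\]\(([^)\s]+)\)")


def links_escaping_directory(readme_text, readme_dir):
    """Return relative link targets that resolve outside readme_dir."""
    escaping = []
    for target in LINK_TARGET.findall(readme_text):
        # Absolute URLs and in-page anchors survive a repository split.
        if "://" in target or target.startswith(("#", "mailto:")):
            continue
        path = target.split("#", 1)[0]
        resolved = posixpath.normpath(posixpath.join(readme_dir, path))
        if resolved != readme_dir and not resolved.startswith(readme_dir + "/"):
            escaping.append(target)
    return escaping


# Hypothetical README content, for illustration only.
sample = (
    "See [values](values.yaml) and the "
    "[contributing guide](../CONTRIBUTING.rst#setup) "
    "or the [docs](https://airflow.apache.org/)."
)
print(links_escaping_directory(sample, "chart"))
# -> ['../CONTRIBUTING.rst#setup']
```

Running something like this in CI of the monorepo would flag links that break once the directory becomes its own repository.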

On Sun, Oct 25, 2020 at 9:18 PM Jarek Potiuk <Jarek.Potiuk@polidea.com> wrote:
Hello Everyone,
I would like to come back to the discussion as I have *JUST* implemented the solution (very simple but 100% working) to this monorepo vs. separate repos.
You can take a look at this repo of mine: https://github.com/potiuk/airflow-docker. It is very simple and works like a charm. I implemented it to solve the issue https://github.com/apache/airflow/issues/11740
This is a separate repo that people can use to have a separate "read-only" repository that **only** keeps our Dockerfile-related stuff - including the full history of changes related (and only those), full traceability, and incremental, automated synchronization from our "airflow" repo.

I can - any time - set it up as "apache/airflow-docker" and get it to synchronize every day or every hour.
Here is how it works:

* The "master" and "v1-10-stable" branches are filtered to only contain files that are needed to build Prod Docker image
* We keep history of all relevant commits in those branches
* In the "main" branch we only keep the "scheduled" Github Actions workflow that does the synchronization and README.md which explains what needs to be done to build the docker image
* I am using the excellent "git-filter-repo" tool which does the job really well and fast. Git-filter-repo is recommended by Git maintainers over the old, slow and much worse built-in git-filter-branch: https://git-scm.com/docs/git-filter-branch#_warning
* the job to synchronize the repo takes 1m30s to run - it is rather fast despite analyzing 13500 commits :)
* it runs incrementally - just adding new commits when they appear
* it is very simple, few lines script + few steps in Github Action to checkout/push the right branches
* we keep all the commit mapping in the repo as well, so we have 1-1 relationship between the commits in the "docker repo" and the original ones in Airflow repo
* synchronization is 1-way - airflow -> airflow-docker
* we can use a very similar approach for synchronizing:
  * Helm chart
  * Open API clients
  * other stuff
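
To illustrate the bookkeeping behind the list above (path filtering, incremental runs and the 1-1 commit mapping), here is a toy Python model. It is only a sketch under stated assumptions: it does not call git or git-filter-repo, the paths and commit ids are invented, and the hashing is a stand-in for how rewritten commits get new ids.

```python
import hashlib


def sync_filtered(source_commits, keep_prefixes, commit_map):
    """Add not-yet-synced commits that touch kept paths; update commit_map in place.

    source_commits: list of (source_sha, message, changed_paths), oldest first.
    commit_map: dict source_sha -> filtered_sha, persisted between runs.
    Returns the list of newly created filtered commits.
    """
    new_commits = []
    for sha, message, paths in source_commits:
        if sha in commit_map:
            continue  # incremental: already synchronized in an earlier run
        kept = [p for p in paths if p.startswith(tuple(keep_prefixes))]
        if not kept:
            continue  # commit does not touch the Docker-related files
        # Stand-in for the rewritten commit id; real git ids are computed differently.
        new_sha = hashlib.sha1(f"{sha}:{','.join(kept)}".encode()).hexdigest()
        commit_map[sha] = new_sha
        new_commits.append((new_sha, message, kept))
    return new_commits


# Hypothetical history and paths, for illustration only.
history = [
    ("a1", "Add Dockerfile", ["Dockerfile", "setup.py"]),
    ("b2", "Fix scheduler", ["airflow/jobs/scheduler_job.py"]),
]
mapping = {}
first = sync_filtered(history, ["Dockerfile", "scripts/docker/"], mapping)
history.append(("c3", "Update entrypoint", ["scripts/docker/entrypoint.sh"]))
second = sync_filtered(history, ["Dockerfile", "scripts/docker/"], mapping)
print(len(first), len(second), sorted(mapping))
# -> 1 1 ['a1', 'c3']
```

The second run only adds the new commit, and the mapping lets you trace each filtered commit back to its original in the source repo.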
It also follows our source release strategy - it has the same "properties" as our main repo - so it is merely a "convenience" way of accessing the Docker customization options, but the same functionality is available in our officially released sources.
Do you think we should turn it into the "apache/airflow-docker" repo?
J.


On Sun, Jul 5, 2020 at 8:12 PM Daniel Imberman <daniel.imberman@gmail.com> wrote:
Worth noting that git has the ability to cherry-pick only specific directories. If we keep all of helm + tests in one directory, docker + tests in another, and core + tests in a third directory it would be pretty simple to automate splitting them.

https://stackoverflow.com/questions/19821749/git-cherry-pick-or-merge-specific-directory-from-another-branch
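
As a rough sketch of why a one-directory-per-component layout makes the split easy to automate: routing a commit's changed files then only needs the top-level directory. The directory and component names below are assumptions for illustration, not the actual Airflow layout.

```python
from collections import defaultdict

# Hypothetical layout: each component lives in its own top-level directory.
COMPONENT_DIRS = {"chart": "helm", "docker": "docker", "airflow": "core"}


def split_by_component(changed_paths):
    """Group the paths changed by a commit by the component that owns them."""
    groups = defaultdict(list)
    for path in changed_paths:
        top = path.split("/", 1)[0]
        groups[COMPONENT_DIRS.get(top, "unassigned")].append(path)
    return dict(groups)


print(split_by_component(
    ["chart/values.yaml", "docker/Dockerfile", "airflow/models/dag.py", "README.md"]
))
```

Anything that lands in "unassigned" is a file that would need an explicit decision before the components could be split automatically.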

On Sun, Jul 5, 2020 at 9:57 AM, Daniel Imberman <daniel.imberman@gmail.com> wrote:
I can’t agree with this enough :). I think writing a few bots to separate out sections will be MUCH easier in the long run than maintaining multiple repos. Will also prevent the difficulty of setting up a proper dev environment for new contributors.
On Sun, Jul 5, 2020 at 9:53 AM, Jarek Potiuk <Jarek.Potiuk@polidea.com> wrote:
Yeah. I think that the "monorepo" is the only way for now - until (or if)
we reach the size (and maturity) that different teams take care of the
different projects. Which might even not happen.

But I would still love to try the separate repos to publish/release (maybe
not immediately, but it is a nice concept). I think it should be rather
easy (I will try it on my own repo first). Also, I think it has another
advantage - those separate repos might actually run other kinds of tests -
for example, to test if there is "everything" in that repo to release it
(for example build helm chart) and whether there is no accidental use of
stuff from outside of those dirs.

I already thought about how to do it - it should be rather easy. Of course
- like most of the time - there is a ready-to-use git command doing it for
us. We simply need a bot running for that repo, executing a variant of this
command:
https://docs.github.com/en/github/using-git/splitting-a-subfolder-out-into-a-new-repository
(it should only take commits starting from the commit merged last time). So
the level of automation needed here is rather minimal.

And if we have those repos and at some point in time we decide to split
eventually - we will already have repos with all history as a starting
point.

J.


On Sun, Jul 5, 2020 at 4:42 PM Kaxil Naik <kaxilnaik@gmail.com> wrote:

> Hmm.. I agree the git-sync would have been a difficult one to solve if we
> had separate repositories.
>
> Well, in that case, the mono repo approach (like we have now) indeed makes
> more sense.
>
> Regarding the Kubernetes approach, I feel the ones in staging
> (https://github.com/kubernetes/kubernetes/tree/master/staging) are part of
> the actual product itself but in our case we were discussing between Helm
> chart and Dockerfile which are not actually part of the product. And we
> will need a good deal of automation if we go down that route.
> I think the plain mono-repo approach is better than that one.
>
> Regards,
> Kaxil
>
>
> On Sun, Jul 5, 2020 at 9:19 AM Jarek Potiuk <Jarek.Potiuk@polidea.com> wrote:
>
> > And one more perfect illustration of what I am talking about.
> >
> > A very good thing just happened. I was running the PR while writing the
> > email (long time as you might imagine) and the new K8S tests with 1.10.11
> > just failed. https://github.com/apache/airflow/pull/9663
> >
> > If we had released the helm chart before, we would have had a clear (small)
> > incompatibility here. And by seeing the test failing we can make a decision
> > about what to do:
> >
> > 1) fix it differently
> > 2) document it as a breaking Helm change, "1.10.12+ image" and make test
> > work in both cases
> > 3) revert ...
> >
> > But at least we have an early warning that something is wrong. This is the
> > clear value of running the tests at every commit.
> >
> > J.
> >
> > On Sun, Jul 5, 2020 at 10:08 AM Jarek Potiuk <Jarek.Potiuk@polidea.com> wrote:
> >
> > > > I just have another example of a case where splitting the repos and using
> > > > only "released versions" across repositories might be a complete overkill
> > > > when it comes to development complexity.
> > >
> > > > We have this change from Aneesh:
> > > > https://github.com/apache/airflow/pull/9371 about adding a git-sync
> > > > option to the helm chart.
> > >
> > > > That's a new feature, but we would like to test both 1.10 and the master
> > > > version of KubernetesExecutor with that. It should work for both of them -
> > > > there is no coupling/dependency in the "airflow" code for it.
> > >
> > > > However, there is a strong coupling in the tests. We have the
> > > > "kubernetes_tests" running tests using all three: chart, production
> > > > docker, and Airflow. Those tests will likely have to be adapted to work
> > > > with the new git-sync option. They were disabled previously as we had
> > > > problems with them before the helm chart was used for tests, but we can
> > > > turn them back on now that git-sync is added to the helm chart. Those
> > > > tests are part of the airflow test suite and we discussed with Daniel that
> > > > they should stay there - those tests are importing airflow code, and they
> > > > are using the latest example dags, which are also in the airflow code.
> > >
> > > So we have two ways how we can develop this -
> > > A) monorepo (current)
> > > B) separate repos.
> > >
> > > > Just to remind - the goal is that our change is tested against:
> > > >
> > > > 1) Released Airflow version (say 1.10.11).
> > > > 2) Development airflow version (master - soon possibly development)
> > > > 3) Development docker image built with either "development" or "1.10.11"
> > > > (we can release the Docker image for 1.10.11 independently from the
> > > > current development HEAD). The docker image is supposed to work with any
> > > > version of airflow
> > >
> > > In the case of A) Monorepo we have all that as a given.
> > >
> > > > I just sent this really small PR that should do the job:
> > > > https://github.com/apache/airflow/pull/9663 . What it does: it takes the
> > > > latest "development" docker image and "development" chart, and bakes in
> > > > the latest "example dags" from the "development" branch. The image uses
> > > > either the "development" or the released (from PyPI) "1.10.11" Airflow
> > > > version - and runs the "development" tests against it. This is exactly
> > > > what we want. If we add new features to the helm chart, the Kubernetes
> > > > tests will have to be updated to include that - and this will happen in
> > > > the airflow "development" branch. The REALLY good thing in it - since we
> > > > are running those tests in the CI build of the airflow development branch
> > > > - we prevent anyone from making breaking changes. It is a given that both
> > > > - the "development" of airflow and the "1.10.11" version of airflow will
> > > > continue to work with the image and chart.
> > >
> > >
> > > In the case of B) where we split the repos:
> > >
> > > > We have to decide where to keep the "kubernetes_tests" - should they be in
> > > > "Airflow" or in "Helm". They are testing BOTH so we can choose either way.
> > > > Together with Daniel we plan to expand those tests to cover all the
> > > > different options we have in the Chart - testing all of it - Kubernetes
> > > > Executor, Celery Executor running on Kubernetes, MySQL (once we add it),
> > > > etc. etc. So we want to make sure we have a matrix of tests covering a
> > > > number of deployment options. Those tests do not exist yet, and they will
> > > > have to be written. In principle - they can be moved to the "Helm"
> > > > repository. That's where they conceptually belong. However - there is a
> > > > Huge value in running the tests in airflow "development" - the value is
> > > > that no-one will be able to break the "development" airflow, because those
> > > > tests are run with every PR. I think we have no choice but to run those
> > > > tests always in development. Otherwise, people maintaining the helm chart
> > > > will have to fix the problems introduced by people changing Airflow code.
> > > > I think this is a pretty bad idea to allow that. So if we move those tests
> > > > to Helm Chart repo we have to figure out how to run those "kubernetes"
> > > > tests in CI for every build. This is quite possible - by getting the
> > > > latest master from helm chart and running the build, but it has several
> > > > problems:
> > >
> > > > 1) The test code for CI will have to continue to stay in Airflow (to run
> > > > CI builds) - this means that we already have coupling, and some code
> > > > related to the execution of the helm tests has to be in Airflow anyway.
> > >
> > > > 2) Bigger problem. What happens if as "Airflow developer" you DO introduce
> > > > a change that breaks the helm chart? You will see a CI error and..... You
> > > > will not know what to do. Do you involve people who maintain the helm
> > > > chart and wait for them? I think not. You should be able to reproduce the
> > > > problem locally and fix it yourself (maybe with the help of others - but
> > > > you should be able to fix your own commit). We would have to teach people
> > > > how to bring the docker image and helm chart code from the latest version
> > > > and run the tests. We could do it automatically with Breeze (similarly as
> > > > we do with other integrations - where we bring in Kerberos, Mongo, and a
> > > > multitude of others) without them even knowing it, but this might be
> > > > fairly complex and prone to errors. In Monorepo - we already have a simple
> > > > way of reproducing and running the tests locally and everything is in one
> > > > place.
> > >
> > > > 3) There is a chance that someone makes a change in Helm in parallel to a
> > > > change in Airflow that breaks it. This could easily happen in the
> > > > "git-sync case" or when we add "MySQL" for example in the future. And
> > > > there is no way to prevent it.
> > >
> > > 4) If we only test against "released" Helm and Airflow (that was one of
> > > the suggestions), the problem is even bigger. How do you know that you
> do
> > > not break the currently "developed" helm chart? Or how do you know that
> > the
> > > currently "developed" helm chart works with latest Airflow release? If
> > you
> > > do not do those checks at the "commit" time, then you defer this to
> > > "release time" and only then you might find out that decisions you made
> > > during development have to be reverted. This is a very, very bad idea
> > IMHO
> > > again leading to the case that the release manager will have to fix
> > > problems introduced by others.
> > >
> > > J,
> > >
> > >
> > >
> > > On Fri, Jul 3, 2020 at 10:28 PM Ash Berlin-Taylor <ash@apache.org>
> > wrote:
> > >
> > >> Monorepo FTW.
> > >>
> > >> Yes, it gets a little bit messier around release, but the approach of
> > >> automatically extracting out the commits (or parts of commits) to a
> > >> separate repo for releasing may be the solution to that problem
> > >>
> > >>
> > >> -ash
> > >>
> > >> On Jul 3 2020, at 7:51 pm, Kaxil Naik <kaxilnaik@gmail.com> wrote:
> > >>
> > >> > I will take a look at the Kubernetes approach and get back to this
> > >> thread.
> > >> >
> > >> > We had a discussion with Daniel yesterday and we are both concerned
> > >> about
> > >> >> all the overhead for people like us who work on all three
> "entities"
> > >> >> at the
> > >> >> same time. Even just explaining how to work with Pull Requests and
> in
> > >> what
> > >> >> sequence those PRs would have to be opened and merged in case of
> > >> changes
> > >> >> that are spanning across several "entities" - was a challenge. I
> was
> > >> unable
> > >> >> to clearly explain the sequence and way of reviewing/merging the
> PRs
> > >> that
> > >> >> will have to be made if we have submodules. This is a bad sign as I
> > was
> > >> >> using submodules in the past and know how it works but I was unable
> > to
> > >> >> explain it clearly.
> > >> >
> > >> >
> > >> > We don't even need submodules tbh. We can just use a Bash script that
> > >> > pulls a
> > >> > pinned Helm Chart version.
> > >> > We only need the Helm chart to run integration tests for k8s (at least for
> > >> now).
> > >> > We already use tons of Bash scripts.
> > >> >
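The "pull a pinned Helm Chart version" idea above could look roughly like the sketch below. It is written in Python rather than Bash only to keep it easy to test. The repository URL and the pinned ref are placeholders (the thread proposes a separate chart repo but fixes neither), and the script only builds the command unless dry_run is switched off.

```python
import subprocess
from typing import List

# Hypothetical values: neither the URL nor the tag is confirmed by the thread.
CHART_REPO_URL = "https://github.com/apache/airflow-helm-chart.git"
PINNED_CHART_REF = "1.0.0"


def pinned_chart_clone_command(repo_url: str, ref: str, dest: str) -> List[str]:
    """Build a shallow `git clone` that fetches exactly one tag or branch."""
    return ["git", "clone", "--depth", "1", "--branch", ref, repo_url, dest]


def fetch_pinned_chart(dest: str = "chart", dry_run: bool = True) -> List[str]:
    """Return the clone command; execute it only when dry_run is False."""
    cmd = pinned_chart_clone_command(CHART_REPO_URL, PINNED_CHART_REF, dest)
    if not dry_run:
        subprocess.run(cmd, check=True)
    return cmd


if __name__ == "__main__":
    print(" ".join(fetch_pinned_chart()))
```

Bumping the chart used by the Kubernetes tests would then be a one-line change to the pinned ref, reviewed like any other commit.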
> > >> > One of the important benefits of separation is that changes in one
> > >> component
> > >> > should not need a change in the other component, at least
> > >> > not immediately.
> > >> >
> > >> > Changes in Helm chart and Docker file should never need changes in
> > >> Airflow.
> > >> > Changes in Airflow should only ever need a change in Dockerfile and
> > Helm
> > >> > Chart after a new version is released.
> > >> >
> > >> > I just had a talk with Daniel too and still didn't find a good
> enough
> > >> > reason to have them in the same repo.
> > >> >
> > >> > I will definitely look at the Kubernetes approach (maybe it is
> better)
> > >> and
> > >> > get back to this thread. But as of now I don't see any major PROs
> > >> > for having them in the same repo.
> > >> >
> > >> > Regards,
> > >> > Kaxil
> > >> >
> > >> >
> > >> >
> > >> > On Fri, Jul 3, 2020 at 5:00 PM Jarek Potiuk <
> Jarek.Potiuk@polidea.com
> > >
> > >> > wrote:
> > >> >
> > >> >> I think Ry's point is an important one - I thought about writing a
> > >> longer
> > >> >> post but I looked at the Kubernetes structure and I really like it
> so
> > >> just
> > >> >> wanted to comment on this last one.
> > >> >>
> > >> >> Seems that it is simply one "authoritative" (or source of truth)
> repo
> > >> where
> > >> >> everything is developed in monorepo fashion but then there is a bot
> > >> >> that moves every commit related to subdirectories to those
> > "split-out"
> > >> >> repos. There are never direct commits of people or PRs in the
> > >> "split-out"
> > >> >> repositories. This is very similar to my original proposal to have
> > >> >> dedicated repos used for releases - but with an automated way of
> > >> publishing
> > >> >> the commits to the "separated" repos at the moment they are merged
> > to
> > >> >> master in the main repo. I love it.
> > >> >>
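The selection step such a publishing bot performs can be illustrated with a small sketch: given the monorepo history, keep only the commits that touch a given subdirectory. This is not the actual Kubernetes tooling; the Commit type, the sample SHAs and the "chart" prefix are made up for illustration.

```python
from dataclasses import dataclass
from typing import List, Tuple


@dataclass(frozen=True)
class Commit:
    sha: str
    paths: Tuple[str, ...]  # files touched by the commit


def touches(commit: Commit, prefix: str) -> bool:
    """True if any changed file lives under the given directory prefix."""
    directory = prefix.rstrip("/") + "/"
    return any(path.startswith(directory) for path in commit.paths)


def commits_for_split_repo(history: List[Commit], prefix: str) -> List[Commit]:
    """Keep, in order, only the commits relevant to one split-out repo."""
    return [commit for commit in history if touches(commit, prefix)]


history = [
    Commit("a1", ("airflow/models/dag.py",)),
    Commit("b2", ("chart/values.yaml", "airflow/cli.py")),
    Commit("c3", ("chart/templates/scheduler.yaml",)),
    Commit("d4", ("charts_old/README.md",)),
]

if __name__ == "__main__":
    print([c.sha for c in commits_for_split_repo(history, "chart")])
```

A commit that spans both the chart and Airflow code (b2 here) still shows up in the split-out history, but only its chart-related part would be carried over by a real history-filtering tool.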
> > >> >> I think it's a really good and "pragmatic" solution. The code is
> > >> >> available in
> > >> >> separate repos, including the history of commits related to each
> > >> "entity"
> > >> >> (so only chart-related commits in chart repo). Issues for
> particular
> > >> >> "entities" are in those separate repos as well (something that
> Kaxil
> > >> >> mentioned). Users (not developers!) who are interested only in
> > >> Dockerfile
> > >> >> or Helm Chart have separate repos they can look at - with only
> > relevant
> > >> >> changes and history of releases for that particular entity. They
> can
> > >> raise
> > >> >> issues there (and in GitHub, we can easily refer to those issues
> from
> > >> the
> > >> >> main "airflow" repo). All the discussions from "user issues" are
> kept
> > >> >> in the
> > >> >> relevant repositories. Still - comments about development changes
> > (and
> > >> >> related issues) might still be kept in the main "airflow" repo -
> next
> > >> to
> > >> >> other "development" changes.
> > >> >>
> > >> >> We can run separate releases from those linked repositories and
> even
> > >> >> publish sources directly from those repositories rather than from
> the
> > >> main
> > >> >> one. At the same time - we avoid all the hassle of submodules.
> > >> >>
> > >> >> We had a discussion with Daniel yesterday and we are both concerned
> > >> about
> > >> >> all the overhead for people like us who work on all three
> "entities"
> > >> >> at the
> > >> >> same time. Even just explaining how to work with Pull Requests and
> in
> > >> what
> > >> >> sequence those PRs would have to be opened and merged in case of
> > >> changes
> > >> >> that are spanning across several "entities" - was a challenge. I
> was
> > >> unable
> > >> >> to clearly explain the sequence and way of reviewing/merging the
> PRs
> > >> that
> > >> >> will have to be made if we have submodules. This is a bad sign as I
> > was
> > >> >> using submodules in the past and know how it works but I was unable
> > to
> > >> >> explain it clearly.
> > >> >>
> > >> >> I really, really like the Kubernetes approach - seems that it's one of
> > the
> > >> >> cases where we can "eat cake and have it too".
> > >> >>
> > >> >> J.
> > >> >>
> > >> >>
> > >> >> On Thu, Jul 2, 2020 at 5:59 PM Ry Walker <ry@rywalker.com> wrote:
> > >> >>
> > >> >> > One reason to have a monorepo is for project branding, and end
> user
> > >> >> > experience. But for component development experience, it's nice
> to
> > >> >> have a
> > >> >> > small, dedicated repo.
> > >> >> >
> > >> >> > I think the git submodule approach is technically sound, but is
> at
> > >> odds
> > >> >> > with making the project easy to consume/understand from the end
> > user
> > >> >> > perspective, especially if we expand the use of subprojects. And
> > >> >> the main
> > >> >> > Airflow commit graph would appear to be slowing down which is bad
> > for
> > >> >> > Airflow brand perception.
> > >> >> >
> > >> >> > Kubernetes has many sub-repos that are integrated into the main
> > >> >> repo -
> > >> >> > which I think could be the best of both worlds:
> > >> >> > Example:
> > >> https://github.com/kubernetes/kubernetes/tree/master/staging
> > >> >> >
> > >> >> > I haven't dug in very deeply, and I won't pretend to understand
> how
> > >> >> > challenging it may be to maintain this structure, but I'd support
> > >> >> breaking
> > >> >> > more components out of the main Airflow repo for dev purposes
> (for
> > >> >> example,
> > >> >> > in the future, it'd be nice to have airflow-cli, airflow-api,
> > >> >> > airflow-scheduler, individual provider repos that are cleanly
> > >> separated)
> > >> >> as
> > >> >> > long as we bring the commits/contributions back into the monorepo
> > >> with
> > >> >> > automation.
> > >> >> >
> > >> >> > Maybe we could dive a little deeper into how K8s is operating,
> > before
> > >> >> going
> > >> >> > with submodules?
> > >> >> >
> > >> >> > -Ry
> > >> >> >
> > >> >> >
> > >> >> >
> > >> >> >
> > >> >> > On Thu, Jul 2, 2020 at 11:24 AM Kaxil Naik <kaxilnaik@gmail.com>
> > >> wrote:
> > >> >> >
> > >> >> > > Let's come to a consensus first before we do anything :-)
> > >> >> > >
> > >> >> > > Is everyone happy with separate repo approach? Let's wait for
> 72
> > >> hours
> > >> >> to
> > >> >> > > hear from all and then have a plan on how we do it? WDYT?
> > >> >> > >
> > >> >> > > But indeed the git submodules approach sounds good. We do it for
> > >> >> *Airflow
> > >> >> > > Site *(
> > >> >> > >
> > >> >> > >
> > >> >> >
> > >> >>
> > >>
> >
> https://github.com/apache/airflow-site/tree/master/landing-pages/site/themes
> > >> >> > > )
> > >> >> > > too.
> > >> >> > >
> > >> >> > > Regards,
> > >> >> > > Kaxil
> > >> >> > >
> > >> >> > > On Thu, Jul 2, 2020 at 4:15 PM Jarek Potiuk <
> > >> Jarek.Potiuk@polidea.com>
> > >> >> > > wrote:
> > >> >> > >
> > >> >> > > > Absolutely - I am happy to add "best practices" and short
> > >> >> "howto do
> > >> >> > stuff
> > >> >> > > > with git submodules" - and this knowledge will only be
> needed
> > >> for
> > >> >> > > > interacting with prod image/helmchart/running kubernetes
> tests.
> > >> For
> > >> >> all
> > >> >> > > the
> > >> >> > > > other purposes it should be "business as usual".
> > >> >> > > >
> > >> >> > > > On Thu, Jul 2, 2020 at 4:53 PM Daniel Imberman <
> > >> >> > > daniel.imberman@gmail.com>
> > >> >> > > > wrote:
> > >> >> > > >
> > >> >> > > > > I think git submodules sounds like a great idea. We would
> > >> >> need to
> > >> >> > write
> > >> >> > > > > this into the CONTRIBUTING.md to let people know how to do
> it
> > >> but
> > >> >> > It’s
> > >> >> > > a
> > >> >> > > > > “teach once” situation.
> > >> >> > > > >
> > >> >> > > > > On Thu, Jul 2, 2020 at 2:44 AM, Tomasz Urbaszek <
> > >> >> > turbaszek@apache.org>
> > >> >> > > > > wrote:
> > >> >> > > > > I support the idea of separate repos. The git submodules
> > >> mentioned
> > >> >> by
> > >> >> > > > > Jarek sounds like an interesting solution. It may add some
> > >> >> complexity
> > >> >> > > > > for new contributors but it's not rocket science. If we
> agree
> > >> on
> > >> >> > using
> > >> >> > > > > this we should add small how-to in contributing.rst I think
> > >> (i.e.
> > >> >> do
> > >> >> > I
> > >> >> > > > > have to have fork of each repo?).
> > >> >> > > > >
> > >> >> > > > > As stressed previously if we go this route we should make
> > >> >> sure we
> > >> >> > have
> > >> >> > > > > nice testing of all those three components. Regarding the
> > >> >> versioning,
> > >> >> > > > > I have no strong opinion but I fully support using separate
> > >> issues
> > >> >> > for
> > >> >> > > > > airflow, docker, and helm.
> > >> >> > > > >
> > >> >> > > > > Tomek
> > >> >> > > > >
> > >> >> > > > >
> > >> >> > > > > On Thu, Jul 2, 2020 at 9:26 AM Jarek Potiuk <
> > >> >> > Jarek.Potiuk@polidea.com>
> > >> >> > > > > wrote:
> > >> >> > > > > >
> > >> >> > > > > > On Thu, Jul 2, 2020 at 3:16 AM Daniel Imberman <
> > >> >> > > > > daniel.imberman@gmail.com>
> > >> >> > > > > > wrote:
> > >> >> > > > > >
> > >> >> > > > > > I’m fine with keeping it as three separate repos but
> > merging
> > >> >> > testing
> > >> >> > > > > > > somehow (e.g. the source code chart would pull the
> > >> helm/docker
> > >> >> > > chart
> > >> >> > > > > into
> > >> >> > > > > > > .build) but we need to do it in a way that doesn’t make
> > >> testing
> > >> >> > too
> > >> >> > > > > > > difficult.
> > >> >> > > > > > >
> > >> >> > > > > > > So for example: How do I test/integration test a change
> > >> that
> > >> >> > > > involves a
> > >> >> > > > > > > change to all three and has to be done at the same
> time?
> > >> >> Perhaps
> > >> >> > a
> > >> >> > > > > user can
> > >> >> > > > > > > “register” a branch of helm and docker when they start
> up
> > >> >> breeze?
> > >> >> > > Or
> > >> >> > > > > > > perhaps we create a “parent” integration test that uses
> > the
> > >> >> three
> > >> >> > > > > together?
> > >> >> > > > > > >
> > >> >> > > > > >
> > >> >> > > > > > Yes, those are exactly my concerns when splitting the
> > repos.
> > >> >> > > > > >
> > >> >> > > > > > I think testing for development should remain in the
> > >> "airflow"
> > >> >> > repo.
> > >> >> > > It
> > >> >> > > > > is
> > >> >> > > > > > the "central one" in fact. I slept on it and I think
> > using
> > >> >> > > "released"
> > >> >> > > > > > versions for development testing will suffer from this
> "we
> > >> >> need a
> > >> >> > > > change
> > >> >> > > > > in
> > >> >> > > > > > all three of those".
> > >> >> > > > > >
> > >> >> > > > > > But we have an easy solution I think.
> > >> >> > > > > >
> > >> >> > > > > > I think that simply setting submodules properly should do
> > >> >> the
> > >> >> > job:
> > >> >> > > > > > https://git-scm.com/book/en/v2/Git-Tools-Submodules .
> They
> > >> seem
> > >> >> to
> > >> >> > be
> > >> >> > > > > > perfect for our case.
> > >> >> > > > > >
> > >> >> > > > > > For those who have not used it - in short - submodules
> work
> > >> in
> > >> >> the
> > >> >> > > way
> > >> >> > > > > that
> > >> >> > > > > > they register the "linked repos" and store related "hash"
> > >> >> of the
> > >> >> > > commit
> > >> >> > > > > > from that linked repo. For example, the "chart" folder
> will
> > >> >> be a
> > >> >> > link
> > >> >> > > > to
> > >> >> > > > > > "apache/airflow-helm-chart". We can also move the prod
> > >> Dockerfile
> > >> >> > to
> > >> >> > > a
> > >> >> > > > > > subfolder and link it to the separate repo. Git submodule
> > >> >> has a
> > >> >> > > > > > built-in mechanism to a) update to the latest version of
> > the
> > >> >> repo,
> > >> >> > b)
> > >> >> > > > > > commit your changes to the linked repo from there which
> is
> > >> >> all we
> > >> >> > > > need. I
> > >> >> > > > > > used those a few times - I never liked submodules for
> sharing
> > >> >> > "library"
> > >> >> > > > > code,
> > >> >> > > > > > but for sharing helm/Docker it seems perfect.
> > >> >> > > > > >
> > >> >> > > > > > From the "regular" developer point of view - you do not
> > >> >> need to
> > >> >> > > > > get/update
> > >> >> > > > > > submodules if you do not need to use them - so for all
> the
> > >> >> > > development
> > >> >> > > > > > purposes if you only change the "airflow" code, you would
> > not
> > >> >> even
> > >> >> > > need
> > >> >> > > > > to
> > >> >> > > > > > sync chart or Dockerfile. You do "git checkout" as usual
> > >> >> and it
> > >> >> > > should
> > >> >> > > > > > work. So basically - no change for "regular" airflow
> > >> development.
> > >> >> > > > > >
> > >> >> > > > > > However, if you do need to work on helm + Docker + code,
> > >> >> then you
> > >> >> > > > simply
> > >> >> > > > > do
> > >> >> > > > > > "git submodule update", go to the linked "helm" or
> "docker"
> > >> >> folder,
> > >> >> > > > > > checkout the "master" version and you start making
> changes.
> > >> The
> > >> >> > only
> > >> >> > > > > thing
> > >> >> > > > > > to remember when you want to push your changes is to do
> > >> >> `git push
> > >> >> > > > > > --recurse-submodules=check` and it will make sure that
> > >> >> all the
> > >> >> > > repos
> > >> >> > > > > are
> > >> >> > > > > > updated. It is a bit involved, but the latest git versions
> > have
> > >> >> very
> > >> >> > > good
> > >> >> > > > > > support and it must only be used by people who work on
> > >> >> airflow +
> > >> >> > > > docker +
> > >> >> > > > > > helm - all the others are unaffected.
> > >> >> > > > > >
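The submodule steps described above can be lined up as in the sketch below. It only assembles the git command lines and does not run them unless asked. The "chart" directory name follows the example in the message, while the helper names are hypothetical. Note that `--recurse-submodules=check` makes git abort the push if a submodule commit referenced by the main repo has not been pushed yet.

```python
import subprocess
from typing import List


def prepare_submodule_commands(submodule_dir: str, branch: str = "master") -> List[List[str]]:
    """Commands to fetch a linked repo and switch it to a branch for editing."""
    return [
        ["git", "submodule", "update", "--init", submodule_dir],
        ["git", "-C", submodule_dir, "checkout", branch],
    ]


def push_command() -> List[str]:
    """Push the main repo, refusing if submodule commits are not pushed yet."""
    return ["git", "push", "--recurse-submodules=check"]


def run_all(commands: List[List[str]], dry_run: bool = True) -> None:
    """Print each command and execute it only when dry_run is False."""
    for cmd in commands:
        print(" ".join(cmd))
        if not dry_run:
            subprocess.run(cmd, check=True)


if __name__ == "__main__":
    run_all(prepare_submodule_commands("chart") + [push_command()])
```

Someone who only changes Airflow code never needs the first two commands, which matches the "business as usual" point made in the message.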
> > >> >> > > > > > From the CI perspective also nothing changes - when we
> > >> checkout
> > >> >> the
> > >> >> > > > code
> > >> >> > > > > we
> > >> >> > > > > > will include submodules and our test harness will be
> > largely
> > >> >> > > unchanged.
> > >> >> > > > > > Submodule provides us with the right mechanism for cross
> > >> >> dependency
> > >> >> > > > even
> > >> >> > > > > if
> > >> >> > > > > > we use branches.
> > >> >> > > > > >
> > >> >> > > > > > If everyone will be ok with that - I am happy to set it
> > up.
> > >> With
> > >> >> > > > > submodules
> > >> >> > > > > > - we can switch to separate repos even without releasing
> > >> >> helm and
> > >> >> > > Prod
> > >> >> > > > > > chart "officially".
> > >> >> > > > > >
> > >> >> > > > > > J.
> > >> >> > > > > >
> > >> >> > > > > >
> > >> >> > > > > >
> > >> >> > > > > > >
> > >> >> > > > > > > On Wed, Jul 1, 2020 at 3:20 PM, Jarek Potiuk <
> > >> >> > > > Jarek.Potiuk@polidea.com
> > >> >> > > > > >
> > >> >> > > > > > > wrote:
> > >> >> > > > > > > Sure. We can work with such an approach. There will be
> > some
> > >> >> > > > > dependencies
> > >> >> > > > > > > that we might find are problematic, but if we all see
> > >> >> that it's
> > >> >> > > > > > > worth trying, there is a clear benefit that it makes
> for
> > a
> > >> >> > "clean"
> > >> >> > > > > > > split between those different "entities". And possibly
> > >> >> once we
> > >> >> > > > release
> > >> >> > > > > > > first versions of both image and chart, such problems
> > >> >> will be
> > >> >> > rare
> > >> >> > > > and
> > >> >> > > > > easy
> > >> >> > > > > > > to fix.
> > >> >> > > > > > >
> > >> >> > > > > > > I personally think such a split is inevitable eventually,
> > >> it's
> > >> >> > just a
> > >> >> > > > > matter
> > >> >> > > > > > > when to do it. If we decide to make this happen soon -
> I
> > am
> > >> >> more
> > >> >> > > than
> > >> >> > > > > happy
> > >> >> > > > > > > to work on making the split reality.
> > >> >> > > > > > >
> > >> >> > > > > > > One prerequisite to that is that all those - Helm
> Chart,
> > >> Prod
> > >> >> > Image
> > >> >> > > > and
> > >> >> > > > > > > Airflow are released in stable versions separately
> > >> >> "officially" -
> > >> >> > > > from
> > >> >> > > > > the
> > >> >> > > > > > > current sources (otherwise there will be no way to test
> > >> >> > > cross-repo).
> > >> >> > > > > > >
> > >> >> > > > > > > I think for that we will need to agree on the
> versioning
> > >> scheme
> > >> >> > and
> > >> >> > > > > cadence
> > >> >> > > > > > > for the Image and Helm Chart, then copy sources from
> > >> airflow
> > >> >> and
> > >> >> > > > > release
> > >> >> > > > > > > them as "baseline" including setup the tests for all of
> > >> >> those -
> > >> >> > > then
> > >> >> > > > we
> > >> >> > > > > > > can remove both Helm and Dockerfile from the airflow
> > repo.
> > >> >> Happy
> > >> >> > to
> > >> >> > > > > help
> > >> >> > > > > > > with that if that's the direction we choose as a
> > >> >> community. It
> > >> >> is
> > >> >> > > > > important
> > >> >> > > > > > > though that we keep the cross-repo testing working. We
> > >> >> have it
> > >> >> > > > working
> > >> >> > > > > as
> > >> >> > > > > > > of yesterday, so now the matter is - whatever we do we
> > >> >> keep it
> > >> >> > > > running
> > >> >> > > > > and
> > >> >> > > > > > > have development environment support easy development
> and
> > >> >> testing
> > >> >> > > of
> > >> >> > > > > > > either of the three (including CI testing cross-repos).
> > >> That's
> > >> >> > the
> > >> >> > > > > only
> > >> >> > > > > > > really important thing to me - the rest is more of
> > >> technicality
> > >> >> > how
> > >> >> > > > we
> > >> >> > > > > link
> > >> >> > > > > > > the repos, but principle remains.
> > >> >> > > > > > >
> > >> >> > > > > > > Do we have an idea for the versioning scheme that we
> > >> >> would like
> > >> >> > to
> > >> >> > > > use
> > >> >> > > > > for
> > >> >> > > > > > > the Helm Chart and prod image ?
> > >> >> > > > > > >
> > >> >> > > > > > > Should we make it CalVer
> > >> >> <https://calver.org/overview.html> or
> > >> >> > > > SemVer
> > >> >> > > > > > > <https://semver.org/> (or some other scheme)? And how
> > >> should
> > >> >> we
> > >> >> > > > treat
> > >> >> > > > > the
> > >> >> > > > > > > combinations with Airflow?
> > >> >> > > > > > >
> > >> >> > > > > > > My thoughts (but I have no strong opinions as long as
> > >> someone
> > >> >> > > > proposes
> > >> >> > > > > more
> > >> >> > > > > > > sensible versioning schemes):
> > >> >> > > > > > >
> > >> >> > > > > > > 1) Airflow code - we continue the release scheme we
> have
> > >> (with
> > >> >> > > > > deciding on
> > >> >> > > > > > > 2.* scheme for the release). I expect in the future we
> > >> might
> > >> >> > decide
> > >> >> > > > on
> > >> >> > > > > > > doing branches or patches so for 2.* I'd opt for going
> > full
> > >> >> > SemVer
> > >> >> > > > > approach
> > >> >> > > > > > > and patches released from branches.
> > >> >> > > > > > >
> > >> >> > > > > > > 2) I believe that Helm Chart can be versioned with its
> > own
> > >> >> > version
> > >> >> > > > > (then
> > >> >> > > > > > > you specify the image version as helm parameter). For
> the
> > >> Helm
> > >> >> > > Chart
> > >> >> > > > I
> > >> >> > > > > > > think CalVer might be OK as I do not expect any
> > >> >> branching/patches
> > >> >> > > in
> > >> >> > > > > the
> > >> >> > > > > > > future - I'd expect that there will be a single stream
> of
> > >> >> > releases.
> > >> >> > > > > > >
> > >> >> > > > > > > 3) Dockerfile (+ related files such as .dockerignore,
> > empty
> > >> >> dir,
> > >> >> > > > > > > entrypoints etc). I do not imagine a lot of branching
> for
> > >> >> those -
> > >> >> > > we
> > >> >> > > > > > > should be able to release a new version of a Dockerfile
> > (+
> > >> >> > related
> > >> >> > > > > files)
> > >> >> > > > > > > working with nearly any earlier Airflow release, so
> > CalVer
> > >> >> seems
> > >> >> > > > like a
> > >> >> > > > > > > good choice.
> > >> >> > > > > > >
> > >> >> > > > > > > 4) Image versioning becomes a bit more complex because
> > the
> > >> >> image
> > >> >> > > tag
> > >> >> > > > is
> > >> >> > > > > > > always combination of:
> > >> >> > > > > > > * Dockerfile (+ related files) version
> > >> >> > > > > > > * Airflow Version
> > >> >> > > > > > > * Python Version
> > >> >> > > > > > >
> > >> >> > > > > > > An example versioning I can imagine:
> > >> >> > > > > > >
> > >> >> > > > > > > *Airflow*: 1.10.11, 1.10.12, 2.0.0, 2.1.0, 2.1.1 -
> patch
> > >> level
> > >> >> > (if
> > >> >> > > we
> > >> >> > > > > > > decide to have patches).
> > >> >> > > > > > > *Dockerfile: *2020.07.12, 2020.08.20...... -> depending
> > >> >> when we
> > >> >> > > > release
> > >> >> > > > > > > them
> > >> >> > > > > > > *Helm Chart*: 2020.07.10, 2020.08.09 ...... Each Helm
> > Chart
> > >> >> has a
> > >> >> > > > > minimum
> > >> >> > > > > > > version of both Dockerfile and Airflow versions it
> works
> > >> with.
> > >> >> > > > > > >
> > >> >> > > > > > > *Example Docker Image tags:*
> > >> >> > > > > > >
> > >> apache/airflow:dockerfile2020.07.10-airflow1.10.10-python3.6
> > >> >> > > > > > >
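The proposed tag layout and the "minimum version" rule for the Helm Chart can be sketched as below. This is only an illustration of the scheme discussed in the message. The function names are hypothetical, and comparing versions as tuples of integers is an assumption that holds for the purely numeric CalVer and SemVer examples used here.

```python
from typing import Tuple


def image_tag(dockerfile_version: str, airflow_version: str,
              python_version: str, repo: str = "apache/airflow") -> str:
    """Combine the three versions into one image tag, as in the example above."""
    return (f"{repo}:dockerfile{dockerfile_version}"
            f"-airflow{airflow_version}-python{python_version}")


def parse_version(version: str) -> Tuple[int, ...]:
    """Turn '2020.07.10' or '1.10.11' into a comparable tuple of ints."""
    return tuple(int(part) for part in version.split("."))


def chart_supports(min_dockerfile: str, min_airflow: str,
                   dockerfile_version: str, airflow_version: str) -> bool:
    """Check a Dockerfile/Airflow pair against the chart's minimum versions."""
    return (parse_version(dockerfile_version) >= parse_version(min_dockerfile)
            and parse_version(airflow_version) >= parse_version(min_airflow))


if __name__ == "__main__":
    print(image_tag("2020.07.10", "1.10.10", "3.6"))
    print(chart_supports("2020.07.10", "1.10.11", "2020.07.12", "1.10.12"))
```

Pre-release suffixes such as "rc1" would need a real version parser; the tuple comparison is kept deliberately minimal.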
> > >> >> > > > > > > WDYT?
> > >> >> > > > > > >
> > >> >> > > > > > > J,
> > >> >> > > > > > >
> > >> >> > > > > > >
> > >> >> > > > > > > On Wed, Jul 1, 2020 at 11:12 PM Kaxil Naik <
> > >> >> kaxilnaik@gmail.com>
> > >> >> > > > > wrote:
> > >> >> > > > > > >
> > >> >> > > > > > > > I think we should have "separate repos for
> development"
> > >> too.
> > >> >> > > > > > > >
> > >> >> > > > > > > > 3 Repos in total:
> > >> >> > > > > > > >
> > >> >> > > > > > > > 1) apache/airflow
> > >> >> > > > > > > > 2) apache/airflow-docker-image
> > >> >> > > > > > > > 3) apache/airflow-helm-chart
> > >> >> > > > > > > >
> > >> >> > > > > > > >
> > >> >> > > > > > > > (1) *apache/airflow* should use a pinned stable
> version
> > >> of
> > >> >> > > Airflow
> > >> >> > > > > Helm
> > >> >> > > > > > > > chart to run Kubernetes tests
> > >> >> > > > > > > > (2) *apache/airflow* already has *Dockerfile.ci* file
> > >> which
> > >> >> it
> > >> >> > > can
> > >> >> > > > > use to
> > >> >> > > > > > > > run airflow tests on docker images.
> > >> >> > > > > > > > (3) *apache/airflow-docker-image *should use the
> latest
> > >> >> > available
> > >> >> > > > > stable
> > >> >> > > > > > > > version of airflow
> > >> >> > > > > > > > (4) *apache/airflow-helm-chart *should use the latest
> > >> >> available
> > >> >> > > > > stable
> > >> >> > > > > > > > version of airflow
> > >> >> > > > > > > >
> > >> >> > > > > > > > Having such split also makes some updates more
> > >> >> difficult -
> > >> >> for
> > >> >> > > > > example if
> > >> >> > > > > > > > > we add new "extra" to Airflow that will require to
> > >> install
> > >> >> > > "apt"
> > >> >> > > > > > > > dependency
> > >> >> > > > > > > > > in Dockerfile, we will have to split it into first
> > >> adding
> > >> >> the
> > >> >> > > > > > > dependency
> > >> >> > > > > > > > to
> > >> >> > > > > > > > > Dockerfile, and once it is merged, we can add the
> > >> >> extra to
> > >> >> > > > airflow
> > >> >> > > > > with
> > >> >> > > > > > > > > setup.py.
> > >> >> > > > > > > >
> > >> >> > > > > > > >
> > >> >> > > > > > > > Adding a new extra to setup.py would not (and should
> > not)
> > >> >> > impact
> > >> >> > > > the
> > >> >> > > > > > > > development of *apache/airflow-docker-image*
> > >> >> > > > > > > > Once an RC is cut for apache/airflow or after a new
> > >> version
> > >> >> is
> > >> >> > > > > released
> > >> >> > > > > > > for
> > >> >> > > > > > > > apache/airflow, we can work on supporting the new
> > airflow
> > >> >> > version
> > >> >> > > > in
> > >> >> > > > > the
> > >> >> > > > > > > > Production Docker Image.
> > >> >> > > > > > > > While doing that we can add all the libraries that
> are
> > >> needed
> > >> >> > by
> > >> >> > > > the
> > >> >> > > > > new
> > >> >> > > > > > > > Airflow Version and we will have a clean commit
> history
> > >> and
> > >> >> > > > > changelog for
> > >> >> > > > > > > > Docker image.
> > >> >> > > > > > > >
> > >> >> > > > > > > > We definitely do not need to work in parallel on both
> > the
> > >> >> repos.
> > >> >> > > By
> > >> >> > > > > doing
> > >> >> > > > > > > > development in a separate repo we keep consistent
> > >> "source"
> > >> >> > files
> > >> >> > > > and
> > >> >> > > > > we
> > >> >> > > > > > > can
> > >> >> > > > > > > > release each artifact with a
> > >> >> > > > > > > > separate cadence. If someone discovers a bug in a newly
> > >> released
> > >> >> > > > > Dockerimage,
> > >> >> > > > > > > > we should be easily able to cut out a new release
> with
> > >> the
> > >> >> > patch
> > >> >> > > > > without
> > >> >> > > > > > > > worrying about how development is
> > >> >> > > > > > > > going in the apache/airflow repo.
> > >> >> > > > > > > >
> > >> >> > > > > > > >
> > >> >> > > > > > > > *Apache Flink & Apache CouchDB* do it in a
> similar
> > >> >> manner:
> > >> >> > > > > > > >
> > >> >> > > > > > > > https://github.com/apache/flink &
> > >> >> > > > > https://github.com/apache/flink-docker
> > >> >> > > > > > > > https://github.com/apache/couchdb &
> > >> >> > > > > > > > https://github.com/apache/couchdb-docker
> > >> >> > > > > > > >
> > >> >> > > > > > > > Regards,
> > >> >> > > > > > > > Kaxil
> > >> >> > > > > > > >
> > >> >> > > > > > > >
> > >> >> > > > > > > >
> > >> >> > > > > > > >
> > >> >> > > > > > > >
> > >> >> > > > > > > >
> > >> >> > > > > > > > On Wed, Jul 1, 2020 at 9:50 PM Jarek Potiuk <
> > >> >> > > > > Jarek.Potiuk@polidea.com>
> > >> >> > > > > > > > wrote:
> > >> >> > > > > > > >
> > >> >> > > > > > > > > I do not think it's only the question of Mono/Multi
> > >> repos.
> > >> >> > > While
> > >> >> > > > I
> > >> >> > > > > > > > clearly
> > >> >> > > > > > > > > see the benefit of separate repos I also see some
> > >> >> drawbacks.
> > >> >> > > > > > > > >
> > >> >> > > > > > > > > And if it bothers others, I am happy to follow the
> > >> >> majority.
> > >> >> > If
> > >> >> > > > we
> > >> >> > > > > > > think
> > >> >> > > > > > > > > that a bit more complexity in testing justifies
> > >> separating
> > >> >> > > those
> > >> >> > > > > three
> > >> >> > > > > > > > > completely and having more "clean"- it's also
> > >> >> workable but
> > >> >> > IMHO
> > >> >> > > > > > > > introduces
> > >> >> > > > > > > > > certain complexity in development.
> > >> >> > > > > > > > >
> > >> >> > > > > > > > > However, I think this is not 0/1: a kind of Hybrid
> > >> approach
> > >> >> in
> > >> >> > my
> > >> >> > > > > opinion
> > >> >> > > > > > > > > might be best of both worlds - development and
> > >> >> releases .
> > >> >> > > > > > > > >
> > >> >> > > > > > > > > Let me explain what I mean by "Hybrid":
> > >> >> > > > > > > > >
> > >> >> > > > > > > > > I think we definitely should have separate
> > >> >> repositories to
> > >> >> > > > release
> > >> >> > > > > > > those
> > >> >> > > > > > > > > artifacts and I think there is no doubt about it:
> > >> >> > > > > > > > >
> > >> >> > > > > > > > > * airflow (apache/airflow)
> > >> >> > > > > > > > > * prod docker image (apache/airflow-docker)
> > >> >> > > > > > > > > * helm chart (apache/airflow-helm)
> > >> >> > > > > > > > > * api clients (we already have separate repos for
> > >> those)
> > >> >> > > > > > > > > (apache/airflow-client-*)
> > >> >> > > > > > > > >
> > >> >> > > > > > > > > I think the only question is where we develop all
> > those
> > >> >> > > (develop
> > >> >> > > > !=
> > >> >> > > > > > > > > release). There are certain benefits of having a
> > single
> > >> >> > > "master"
> > >> >> > > > > (let's
> > >> >> > > > > > > > > call it "development" further) for all those
> > artifacts.
> > >> >> > > Currently
> > >> >> > > > > the
> > >> >> > > > > > > > > "development" version for all of those is in one
> repo
> > >> >> - and
> > >> >> > > while
> > >> >> > > > > > > > > developing one depends on the other, we also test
> all
> > >> of
> > >> >> > those
> > >> >> > > > > together
> > >> >> > > > > > > > and
> > >> >> > > > > > > > > this means that "current best" set of airflow
> sources
> > >> >> > > (including
> > >> >> > > > > > > > > dependencies in setup.py), Dockerfile and Helm
> chart
> > >> work.
> > >> >> > This
> > >> >> > > > > means
> > >> >> > > > > > > for
> > >> >> > > > > > > > > example that you will not be able to break the Helm
> > >> Chart
> > >> >> by
> > >> >> > > > > changing
> > >> >> > > > > > > > > anything that the helm chart depends on in airflow.
> > For
> > >> >> > example
> > >> >> > > > if
> > >> >> > > > > you
> > >> >> > > > > > > > > change "airflow webserver" into "airflow server"
> the
> > >> >> current
> > >> >> > > helm
> > >> >> > > > > chart
> > >> >> > > > > > > > > will break. Similarly if you change entrypoint.sh
> in
> > >> Docker
> > >> >> > > image
> > >> >> > > > > in a
> > >> >> > > > > > > > way
> > >> >> > > > > > > > > that is not compatible with Helm chart, we will not
> > let
> > >> >> that
> > >> >> > > > > happen -
> > >> >> > > > > > > the
> > >> >> > > > > > > > > CI tests will break if either of those changes in
> an
> > >> >> > > incompatible
> > >> >> > > > > way.
> > >> >> > > > > > > > And
> > >> >> > > > > > > > > we can have dependencies in any direction between
> > those
> > >> >> > three.
> > >> >> > > > > When we
> > >> >> > > > > > > > see
> > >> >> > > > > > > > > a commit break either of the three - we can make a
> > >> decision
> > >> >> > > about
> > >> >> > > > > what
> > >> >> > > > > > > to
> > >> >> > > > > > > > > do - either accept and document the incompatibility
> > >> >> or fix
> > >> >> > it.
> > >> >> > > > > > > > >
> > >> >> > > > > > > > > Of course keeping that property (testing it all
> > >> together)
> > >> >> is
> > >> >> > > also
> > >> >> > > > > > > > possible
> > >> >> > > > > > > > > if they are in completely separate repos. There are
> > >> several
> > >> >> > > > > > > > > cross-dependencies - Docker image building depends
> on
> > >> >> > > > dependencies
> > >> >> > > > > in
> > >> >> > > > > > > > > setup.py for example, you cannot build Docker image
> > >> from
> > >> >> only
> > >> >> > > > > > > Dockerfile
> > >> >> > > > > > > > > without the sources of airflow nor build and test
> > helm
> > >> >> charts
> > >> >> > > > > without
> > >> >> > > > > > > the
> > >> >> > > > > > > > > image (and sources - because that's where the
> current
> > >> >> > > kubernetes
> > >> >> > > > > tests
> > >> >> > > > > > > > > are). If we want to continue doing it for both Helm
> > and
> > >> >> > > > > Dockerfile, we
> > >> >> > > > > > > > > would have to basically check out the latest
> sources
> > of
> > >> >> > Airflow
> > >> >> > > > > and run
> > >> >> > > > > > > > the
> > >> >> > > > > > > > > CI tests before merging any Docker or Helm Chart
> > >> changes
> > >> >> and
> > >> >> > > the
> > >> >> > > > > > > > opposite -
> > >> >> > > > > > > > > we will have to download Dockerfile/Helm chart and
> > >> build
> > >> >> > > > > image/install
> > >> >> > > > > > > > Helm
> > >> >> > > > > > > > > chart when we are running CI tests for Airflow.
> This
> > is
> > >> >> > > possible
> > >> >> > > > > and we
> > >> >> > > > > > > > > could do it, but it adds complexity to the build/CI
> > >> >> process.
> > >> >> > > > > > > > >
> > >> >> > > > > > > > > Having such split also makes some updates more
> > >> >> difficult -
> > >> >> > for
> > >> >> > > > > example
> > >> >> > > > > > > if
> > >> >> > > > > > > > > we add new "extra" to Airflow that will require to
> > >> install
> > >> >> > > "apt"
> > >> >> > > > > > > > dependency
> > >> >> > > > > > > > > in Dockerfile, we will have to split it into first
> > >> adding
> > >> >> the
> > >> >> > > > > > > dependency
> > >> >> > > > > > > > to
> > >> >> > > > > > > > > Dockerfile, and once it is merged, we can add the
> > >> >> extra to
> > >> >> > > > airflow
> > >> >> > > > > with
> > >> >> > > > > > > > > setup.py. This makes it quite difficult to test it
> > >> together
> > >> >> > > > though
> > >> >> > > > > (the
> > >> >> > > > > > > > > Dockerfile change can only be tested fully after
> > >> >> merging it
> > >> >> > to
> > >> >> > > > > master).
> > >> >> > > > > > > > Not
> > >> >> > > > > > > > > mentioning complexity of managing different
> versions
> > >> >> - your
> > >> >> > > local
> > >> >> > > > > > > > > development Dockerfile version vs sources of
> Airflow
> > >> for
> > >> >> > > example.
> > >> >> > > > > > > Imagine
> > >> >> > > > > > > > > switching between branches where you add two
> > >> >> different apt
> > >> >> > > > > dependencies
> > >> >> > > > > > > > to
> > >> >> > > > > > > > > the Dockerfile. There are more similar scenarios I
> > can
> > >> >> > imagine
> > >> >> > > -
> > >> >> > > > > > > > especially
> > >> >> > > > > > > > > for parallel changes in those repos.
> > >> >> > > > > > > > >
> > >> >> > > > > > > > > This is of course doable to keep them separate, but
> > >> >> it is
> > >> >> > > quite a
> > >> >> > > > > bit
> > >> >> > > > > > > > more
> > >> >> > > > > > > > > complex to set up (especially for a consistent
> > >> development
> > >> >> > > > > environment)
> > >> >> > > > > > > > > when you have separate repos and prevent
> > cross-breaking
> > >> >> > changes
> > >> >> > > > > might
> > >> >> > > > > > > be
> > >> >> > > > > > > > > more difficult.
> > >> >> > > > > > > > >
> > >> >> > > > > > > > > I believe that the best way is to continue
> developing
> > >> >> > airflow +
> > >> >> > > > > image +
> > >> >> > > > > > > > > chart in one repo - airflow, but release them from
> > >> those
> > >> >> > > separate
> > >> >> > > > > > > repos.
> > >> >> > > > > > > > >
> > >> >> > > > > > > > > Airflow source release does not have to contain
> > neither
> > >> >> > chart,
> > >> >> > > > nor
> > >> >> > > > > > > image.
> > >> >> > > > > > > > > And even if it contains sources for those, they are
> > >> >> not the
> > >> >> > > final
> > >> >> > > > > > > > > "artifacts" (installable image and installable helm
> > >> chart).
> > >> >> > > > > > > > > Whenever we decide to release either of them - we
> > >> >> test it
> > >> >> in
> > >> >> > > > > > > > "development".
> > >> >> > > > > > > > > Then only when it is tested, we copy the sources to
> > >> those
> > >> >> > > > separate
> > >> >> > > > > > > repos
> > >> >> > > > > > > > > and release them.
> > >> >> > > > > > > > >
> > >> >> > > > > > > > > With git - we can even do it very easily while
> > >> preserving
> > >> >> > > history
> > >> >> > > > > of
> > >> >> > > > > > > > > commits easily (been there, done that). And then we
> > >> could
> > >> >> > > release
> > >> >> > > > > Helm
> > >> >> > > > > > > > and
> > >> >> > > > > > > > > Docker image separately based on the commits and
> tags
> > >> in
> > >> >> > those
> > >> >> > > > > separate
> > >> >> > > > > > > > > repositories.
> > >> >> > > > > > > > >
> > >> >> > > > > > > > > I agree that separate repos is a more "clean"
> > approach.
> > >> >> But I
> > >> >> > > > > think it
> > >> >> > > > > > > is
> > >> >> > > > > > > > > less convenient for development consistency.
> > >> >> > > > > > > > >
> > >> >> > > > > > > > > J,
> > >> >> > > > > > > > >
> > >> >> > > > > > > > >
> > >> >> > > > > > > > >
> > >> >> > > > > > > > > On Wed, Jul 1, 2020 at 9:35 PM Kaxil Naik <
> > >> >> > kaxilnaik@gmail.com [kaxilnaik@gmail.com]
> > >> >> > > >
> > >> >> > > > > wrote:
> > >> >> > > > > > > > >
> > >> >> > > > > > > > > > Forgot to mention, having them in separate repo
> > also
> > >> >> helps
> > >> >> > in
> > >> >> > > > > better
> > >> >> > > > > > > > > > managing each individual artifacts.
> > >> >> > > > > > > > > >
> > >> >> > > > > > > > > > Each repo would have a separate Github Issue
> where
> > >> >> we can
> > >> >> > > track
> > >> >> > > > > the
> > >> >> > > > > > > > issue
> > >> >> > > > > > > > > > specific to Helm chart or Dockerfile.
> > >> >> > > > > > > > > >
> > >> >> > > > > > > > > > Regards,
> > >> >> > > > > > > > > > Kaxil
> > >> >> > > > > > > > > >
> > >> >> > > > > > > > > > On Wed, Jul 1, 2020 at 8:30 PM Kaxil Naik <
> > >> >> > > kaxilnaik@gmail.com [kaxilnaik@gmail.com]
> > >> >> > > > >
> > >> >> > > > > > > wrote:
> > >> >> > > > > > > > > >
> > >> >> > > > > > > > > > > The PMC also needs to agree if we want separate
> > >> VOTING
> > >> >> > for
> > >> >> > > > > Docker
> > >> >> > > > > > > > Image
> > >> >> > > > > > > > > > > and Helm chart, I think we do.
> > >> >> > > > > > > > > > >
> > >> >> > > > > > > > > > > Regards,
> > >> >> > > > > > > > > > > Kaxil
> > >> >> > > > > > > > > > >
> > >> >> > > > > > > > > > > On Wed, Jul 1, 2020 at 8:06 PM Kaxil Naik <
> > >> >> > > > kaxilnaik@gmail.com [kaxilnaik@gmail.com]
> > >> >> > > > > >
> > >> >> > > > > > > > wrote:
> > >> >> > > > > > > > > > >
> > >> >> > > > > > > > > > >> Hi all,
> > >> >> > > > > > > > > > >>
> > >> >> > > > > > > > > > >> What do you all think about having Dockerfile
> > >> >> and Helm
> > >> >> > > chart
> > >> >> > > > > in
> > >> >> > > > > > > the
> > >> >> > > > > > > > > same
> > >> >> > > > > > > > > > >> "Airflow" Repo vs separate?
> > >> >> > > > > > > > > > >>
> > >> >> > > > > > > > > > >> I feel having a separate repo for Airflow
> > >> Dockerfile
> > >> >> and
> > >> >> > > > Helm
> > >> >> > > > > > > chart
> > >> >> > > > > > > > > have
> > >> >> > > > > > > > > > >> more benefits like easy to track changes (via
> > >> >> > Changelog),
> > >> >> > > > > easy for
> > >> >> > > > > > > > new
> > >> >> > > > > > > > > > >> contributors, separate release cadence.
> > >> >> > > > > > > > > > >>
> > >> >> > > > > > > > > > >> Currently, docker file and Helm Chart are
> inside
> > >> the
> > >> >> > same
> > >> >> > > > > repo and
> > >> >> > > > > > > > > when
> > >> >> > > > > > > > > > >> we release changelog for a new Airflow
> version,
> > it
> > >> >> would
> > >> >> > > > > include
> > >> >> > > > > > > all
> > >> >> > > > > > > > > > >> changes (Airflow + Dockerfile + Helm chart)
> > >> >> which I
> > >> >> > think
> > >> >> > > is
> > >> >> > > > > not
> > >> >> > > > > > > > that
> > >> >> > > > > > > > > > great.
> > >> >> > > > > > > > > > >>
> > >> >> > > > > > > > > > >> Also having them all inside a single repo
> means
> > >> >> changes
> > >> >> > in
> > >> >> > > > > Helm
> > >> >> > > > > > > > Chart
> > >> >> > > > > > > > > > and
> > >> >> > > > > > > > > > >> Dockerfile can block Airflow release. We could
> > use
> > >> >> > stable
> > >> >> > > > Helm
> > >> >> > > > > > > Chart
> > >> >> > > > > > > > > > >> version and Dockerfile version to test Airflow
> > >> >> so that
> > >> >> > > they
> > >> >> > > > > are
> > >> >> > > > > > > > > > blockers to
> > >> >> > > > > > > > > > >> release too.
> > >> >> > > > > > > > > > >>
> > >> >> > > > > > > > > > >> Happy to hear the thoughts from the community.
> > >> >> > > > > > > > > > >>
> > >> >> > > > > > > > > > >> Regards,
> > >> >> > > > > > > > > > >> Kaxil
> > >> >> > > > > > > > > > >>
> > >> >> > > > > > > > > > >
> > >> >> > > > > > > > > >
> > >> >> > > > > > > > >
> > >> >> > > > > > > > >
> > >> >> > > > > > > > > --
> > >> >> > > > > > > > >
> > >> >> > > > > > > > > Jarek Potiuk
> > >> >> > > > > > > > > Polidea < https://www.polidea.com/ [https://www.polidea.com/] > | Principal
> > >> Software
> > >> >> > > Engineer
> > >> >> > > > > > > > >
> > >> >> > > > > > > > > M: +48 660 796 129 <+48660796129>
> > >> >> > > > > > > > > [image: Polidea] < https://www.polidea.com/ [https://www.polidea.com/] >


--

Jarek Potiuk
Polidea < https://www.polidea.com/ [https://www.polidea.com/] > | Principal Software Engineer

M: +48 660 796 129 <+48660796129>
[image: Polidea] < https://www.polidea.com/ [https://www.polidea.com/] >


Re: Separate Repo vs MonoRepo for Dockerfile & Helm Chart

Posted by Jarek Potiuk <Ja...@polidea.com>.
Yep, that would be nice. I agree that it is not obvious where some files
come from.

I agree this could be done if everyone thinks it's a good idea. It would be
perfectly doable, and we could even make it work with the whole history
maintained (we'd just need to include historical paths in the script).
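
As a rough illustration of what including historical paths could look like
(a sketch only, not the script from the repo mentioned in this thread): the
source-to-destination mapping could live in one declarative list, from which
the git-filter-repo arguments are generated. The paths and the
PathRule / filter_repo_args names below are hypothetical; only the --path and
--path-rename options are real git-filter-repo options, and how filters and
renames interact should be checked against its manual.

```python
from typing import List, NamedTuple, Optional


class PathRule(NamedTuple):
    """One source path to keep, optionally renamed in the split-out repo."""

    source: str  # path as it appears (or once appeared) in the airflow repo
    target: Optional[str] = None  # new path in the split-out repo; None = unchanged


# Hypothetical mapping for illustration only; the real file list may differ.
RULES: List[PathRule] = [
    PathRule("Dockerfile"),
    PathRule("scripts/docker/"),  # assumed current location
    PathRule("scripts/ci/docker/"),  # assumed historical location, kept so old commits survive
    PathRule("docs/production-image.rst", "README.rst"),  # assumed rename
]


def filter_repo_args(rules: List[PathRule]) -> List[str]:
    """Build a git-filter-repo command line from the mapping."""
    args = ["git", "filter-repo"]
    # Keep every listed path, current and historical alike.
    for rule in rules:
        args += ["--path", rule.source]
    # Then move the paths that should live elsewhere in the split-out repo.
    for rule in rules:
        if rule.target is not None and rule.target != rule.source:
            args += ["--path-rename", f"{rule.source}:{rule.target}"]
    return args
```

Passing the result through shlex.join (Python 3.8+) would give a command line
to run in a fresh clone. Keeping the mapping in one list like this might also
make it easier to see where each file comes from.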

And if we make it in time before 1.10.13, we could even release it within
1.10.13.

J


On Sun, Oct 25, 2020 at 10:03 PM Kamil Breguła <ka...@polidea.com>
wrote:

> I took a quick look and I like the overall concept, but I'm just wondering
> if it will be clear enough for users. Currently, these scripts copy
> different files from different directories and the mapping of the source to
> the destination is written in the scripts. This will make it difficult to
> contribute to this "sub-project". In my opinion, if we want to create new
> repositories from some files, we should only do it for one directory. If
> this directory has dependencies, we should try to break them down. The
> end-user should not get the impression that they are in contact with the
> copied repository at the first glance. Otherwise, we will not achieve our
> primary goal - to facilitate end-user use.
>
> In this case, it means that we should create a new directory in
> apache/airflow named "prod-docker-image" or similar and move to it the
> necessary Dockerfiles, documentation, scripts, and all other assets. In
> particular, this directory should contain README.md which actually
> describes the contents of that directory.
>
> A good example is the /chart directory. It only has one dependency which is
> not in the "/chart" directory - the "Contributing" section in README.md refers
> to the file in the root directory of the repository. This link will stop
> working if we create a new repository from the entire directory. It will be
> trivial to fix.
>
> On Sun, Oct 25, 2020 at 9:18 PM Jarek Potiuk <Ja...@polidea.com>
> wrote:
>
>> Hello Everyone,
>>
>> I would like to come back to the discussion as I have *JUST* implemented
>> the solution (very simple but 100% working) to this monorepo vs. separate
>> repos.
>>
>> You can take a look at this repo of mine:
>> https://github.com/potiuk/airflow-docker. It is very simple and works
>> like a charm. I implemented it to solve the issue
>> https://github.com/apache/airflow/issues/11740
>>
>> This is a separate repo that people can use to have a separate
>> "read-only" repository that **only** keeps our Dockerfile-related stuff -
>> including the full history of changes related (and only those), full
>> traceability, and incremental, automated synchronization from our "airflow"
>> repo.
>>
>> I can - any time - set it up as "apache/airflow-docker" and get it to
>> synchronize every day or every hour.
>>
>> Here is how it works:
>>
>> * The "master" and "v1-10-stable" branches are filtered to only contain
>> files that are needed to build Prod Docker image
>> * We keep history of all relevant commits in those branches
>> * In the "main" branch we only keep the "scheduled" Github Actions
>> workflow that does the synchronization and README.md which explains what
>> needs to be done to build the docker image
>> * I am using the excellent "git-filter-repo" tool which does the job
>> really well and fast. Git-filter-repo is recommended by Git maintainers
>> over the old, slow and much worse built-in git-filter-branch:
>> https://git-scm.com/docs/git-filter-branch#_warning
>> * the job to synchronize the repo takes 1m30s to run - it is rather
>> fast despite analyzing 13500 commits :)
>> * it runs incrementally - just adding new commits when they appear
>> * it is very simple: a script of a few lines + a few steps in a GitHub Action to
>> checkout/push the right branches
>> * we keep all the commit mapping in the repo as well, so we have 1-1
>> relationship between the commits in the "docker repo" and the original ones
>> in Airflow repo
>> * synchronization is 1-way - airflow -> airflow-docker
>> * we can use a very similar approach for synchronizing:
>>     * Helm chart
>>     * Open API clients
>>     * other stuff
>>
>> It also follows our source release strategy - it has the same
>> "properties" as our main repo - so it is merely a "convenience" way of
>> accessing the Docker customization options, but the same functionality is
>> available in our officially released sources.
>>
>> Do you think we should turn it into the "apache/airflow-docker" repo?
>>
>> J.
>>
>>
>>
>> On Sun, Jul 5, 2020 at 8:12 PM Daniel Imberman <da...@gmail.com>
>> wrote:
>>
>>> Worth noting that git has the ability to cherry-pick only specific
>>> directories. If we keep all of helm + tests in one directory, docker +
>>> tests in another, and core + tests in a third directory it would be pretty
>>> simple to automate splitting them.
>>>
>>>
>>> https://stackoverflow.com/questions/19821749/git-cherry-pick-or-merge-specific-directory-from-another-branch
>>>
>>> via Newton Mail [
>>> https://cloudmagic.com/k/d/mailapp?ct=dx&cv=10.0.50&pv=10.14.6&source=email_footer_2
>>> ]
>>> On Sun, Jul 5, 2020 at 9:57 AM, Daniel Imberman <
>>> daniel.imberman@gmail.com> wrote:
>>> I can’t agree with this enough :). I think writing a few bots to
>>> separate out sections will be MUCH easier in the long run than maintaining
>>> multiple repos. Will also prevent the difficulty of setting up a proper dev
>>> environment for new contributors.
>>> via Newton Mail [
>>> https://cloudmagic.com/k/d/mailapp?ct=dx&cv=10.0.50&pv=10.14.6&source=email_footer_2
>>> ]
>>> On Sun, Jul 5, 2020 at 9:53 AM, Jarek Potiuk <Ja...@polidea.com>
>>> wrote:
>>> Yeah. I think that the "monorepo" is the only way for now - until (or if)
>>> we reach the size (and maturity) that different teams take care of the
>>> different projects. Which might even not happen.
>>>
>>> But I would love to try the separate repos to publish/release still
>>> (maybe
>>> not immediately, but it is a nice concept). I think it should be rather
>>> easy (I will try it on my own repo first). Also, I think it has another
>>> advantage - those separate repos might actually run other kinds of tests
>>> -
>>> for example, to test if there is "everything" in that repo to release it
>>> (for example build helm chart) and whether there is no accidental use of
>>> stuff from outside of those dirs.
>>>
>>> I already thought about how to do it - it should be rather easy. Of
>>> course
>>> - like most of the time - there is a ready-to-use git command doing it
>>> for
>>> us. We simply need a bot running for that repo executing a variant of this
>>> command:
>>>
>>> https://docs.github.com/en/github/using-git/splitting-a-subfolder-out-into-a-new-repository
>>> (it
>>> should only take commits from the commit merged last time). So level of
>>> automation here is rather minimal.
>>>
>>> And if we have those repos and at some point in time we decide to split
>>> eventually - we will have already repos with all history as a starting
>>> point.
>>>
>>> J.
>>>
>>>
>>> On Sun, Jul 5, 2020 at 4:42 PM Kaxil Naik <ka...@gmail.com> wrote:
>>>
>>> > Hmm.. I agree the git-sync would have been a difficult one to solve if
>>> we
>>> > had separate repositories.
>>> >
>>> > Well, in that case, the mono repo approach (like we have now) indeed
>>> makes
>>> > more sense.
>>> >
>>> > Regarding the Kubernetes approach, I feel the ones in staging (
>>> > https://github.com/kubernetes/kubernetes/tree/master/staging) are
>>> part of
>>> > the actual product itself but in our case we were discussing between
>>> Helm
>>> > chart and Dockerfile which are not actually part of the product. And we
>>> > will need a good deal of automation if we go down that route.
>>> > I think the plain mono-repo approach is better than that one.
>>> >
>>> > Regards,
>>> > Kaxil
>>> >
>>> >
>>> > On Sun, Jul 5, 2020 at 9:19 AM Jarek Potiuk <Ja...@polidea.com>
>>> > wrote:
>>> >
>>> > > And one more perfect illustration of what I am talking about.
>>> > >
>>> > > A very good thing just happened. I was running the PR while writing
>>> the
>>> > > email (long time as you might imagine) and the new K8S tests with
>>> 1.10.11
>>> > > just failed. https://github.com/apache/airflow/pull/9663
>>> > >
>>> > > If we had released the helm chart before, we would've had a clear (small)
>>> > > incompatibility here. And by seeing the test failing we could make
>>> > decision
>>> > > what to do:
>>> > >
>>> > > 1) fix it differently
>>> > > 2) document it as a breaking Helm change, "1.10.12+ image" and make
>>> test
>>> > > work in both cases
>>> > > 3) revert ...
>>> > >
>>> > > But at least we have an early warning that something is wrong. This
>>> is
>>> > the
>>> > > clear value of running the tests at every commit.
>>> > >
>>> > > J.
>>> > >
>>> > > On Sun, Jul 5, 2020 at 10:08 AM Jarek Potiuk <
>>> Jarek.Potiuk@polidea.com>
>>> > > wrote:
>>> > >
>>> > > > I just have another example of a case where splitting the repos and
>>> > using
>>> > > > only "released versions" across repositories might be a complete
>>> > overkill
>>> > > > when it comes to development complexity.
>>> > > >
>>> > > > We have this change from Aneesh:
>>> > > > https://github.com/apache/airflow/pull/9371 about adding a
>>> git-sync
>>> > > > option to the helm chart.
>>> > > >
>>> > > > That's a new feature, but we would like to test both 1.10 and the
>>> > master
>>> > > > version of KubernetesExecutor with that. It should work for both of
>>> > them
>>> > > -
>>> > > > there is no coupling/dependency in the "airflow" code for it.
>>> > > >
>>> > > > However, there is a strong coupling in the tests. We have the
>>> > > > "kubernetes_tests" running tests using all three: chart, production
>>> > > docker,
>>> > > > and Airflow. Those tests will likely have to be adapted to work
>>> with
>>> > the
>>> > > > new git-sync option. They were disabled previously as we had
>>> problems
>>> > > with
>>> > > > them before the helm chart was used for tests but we can turn them
>>> back
>>> > > on
>>> > > > now when git-sync is added to the helm chart. Those tests are part
>>> of
>>> > > > airflow test suite and we discussed with Daniel that they should
>>> stay
>>> > > there
>>> > > > - those tests are importing airflow code, they are using latest
>>> example
>>> > > > dags which are also in the airflow code.
>>> > > >
>>> > > > So we have two ways how we can develop this -
>>> > > > A) monorepo (current)
>>> > > > B) separate repos.
>>> > > >
>>> > > > Just to remind - the goal is that our change is tested against:
>>> > > >
>>> > > > 1) Released Airflow version (say 1.10.11).
>>> > > > 2) Development airflow version (master - soon possibly development)
>>> > > > 3) Development docker image built with either "development" or
>>> > "1.10.11"
>>> > > > (we can release the Docker image for 1.10.11 independently from the
>>> > > current
>>> > > > development HEAD). The docker image is supposed to work with any
>>> > version
>>> > > of
>>> > > > airflow
>>> > > >
>>> > > > In the case of A) Monorepo we have all that as a given.
>>> > > >
>>> > > > I just sent this really small PR that should do the job:
>>> > > > https://github.com/apache/airflow/pull/9663. What it does, it
>>> takes
>>> > the
>>> > > > latest "development" docker image, "development" chart, bakes in
>>> the
>>> > > latest
>>> > > > "example dags" from "development branch". The image uses either
>>> > > > "development" or released (from PyPI) "1.10.11" Airflow version -
>>> and
>>> > run
>>> > > > the "development" tests against it. This is exactly what we want.
>>> If we
>>> > > add
>>> > > > new features to the helm chart, the Kubernetes tests will have to
>>> be
>>> > > > updated to include that - and this will happen in the airflow
>>> > > "development"
>>> > > > branch. The REALLY good thing in it - since we are running those
>>> tests
>>> > in
>>> > > > CI build of airflow development branch - we prevent anyone from
>>> making
>>> > > > breaking changes. It is a given that both - the "development" of
>>> > airflow
>>> > > > and the "1.10.11" version of airflow will continue to work with the
>>> > image
>>> > > > and chart.
>>> > > >
>>> > > >
>>> > > > In the case of B) where we split the repos:
>>> > > >
>>> > > > We have to decide where to keep the "kubernetes_tests" - should
>>> they be
>>> > > in
>>> > > > "Airflow" or in "Helm". They are testing BOTH so we can choose
>>> either
>>> > > way.
>>> > > > Together with Daniel we plan to expand those tests to cover all the
>>> > > > different options we have in the Chart - testing all of it -
>>> Kubernetes
>>> > > > Executor, Celery Executor running on Kubernetes, MySQL (once we add
>>> > it),
>>> > > > etc. etc. So we want to make sure we have a matrix of tests
>>> covering a
>>> > > > number of deployment options. Those tests do not exist yet, and
>>> they
>>> > will
>>> > > > have to be written. In principle - they can be moved to the "Helm"
>>> > > > repository. That's where they conceptually belong. However - there
>>> is a
>>> > > > Huge value in running the tests in airflow "development" - the
>>> value is
>>> > > > that no-one will be able to break the "development" airflow,
>>> because
>>> > > those
>>> > > > tests are run with every PR. I think we have no choice but to run
>>> those
>>> > > > tests always in development. Otherwise, people maintaining the helm
>>> > chart
>>> > > > will have to fix the problems introduced by people changing Airflow
>>> > > code. I
>>> > > > think this is a pretty bad idea to allow that. So if we move those
>>> > tests
>>> > > to
>>> > > > Helm Chart repo we have to figure out how to run those "kubernetes"
>>> > tests
>>> > > > in CI for every build. This is quite possible - by getting the
>>> latest
>>> > > > master from helm chart and running the build, but it has several
>>> > > problems:
>>> > > >
>>> > > > 1) The test code for CI will have to continue to stay in Airflow
>>> (to
>>> > run
>>> > > > CI builds) - this means that we already have coupling and some code
>>> > > related
>>> > > > to the execution of the helm tests has to be in Airflow anyway.
>>> > > >
>>> > > > 2) Bigger problem. What happens if as "Airflow developer" you DO
>>> > > introduce
>>> > > > a change that breaks the helm chart? You will see a CI error
>>> and.....
>>> > You
>>> > > > will not know what to do. Do you involve people who maintain the
>>> helm
>>> > > chart
>>> > > > and wait for them? I think not. You should be able to reproduce the
>>> > > problem
>>> > > > locally and fix it yourself (maybe with the help of others - but
>>> you
>>> > > should
>>> > > > be able to fix your own commit). We would have to teach people how
>>> to
>>> > > bring
>>> > > > the docker image and helm chart code from the latest version and
>>> run
>>> > the
>>> > > > tests. We could do it automatically with Breeze (similarly as we do
>>> > with
>>> > > > other integrations - where we bring in Kerberos, Mongo, and a
>>> multitude
>>> > > of
>>> > > > others) without them even knowing it, but this might be fairly
>>> complex
>>> > > and
>>> > > > prone to errors. In Monorepo - we already have a simple way of
>>> > > reproducing
>>> > > > and running the tests locally and everything is in one place.
>>> > > >
>>> > > > 3) There is a chance that someone makes a change in Helm in
>>> parallel
>>> > to a
>>> > > > change in Airflow that breaks it. This could easily happen in the
>>> > > "git-sync
>>> > > > case" or when we add "MySQL" for example in the future. And there
>>> is no
>>> > > way
>>> > > > to prevent it.
>>> > > >
>>> > > > 4) If we only test against "released" Helm and Airflow (that was
>>> one of
>>> > > > the suggestions), the problem is even bigger. How do you know that
>>> you
>>> > do
>>> > > > not break the currently "developed" helm chart? Or how do you know
>>> that
>>> > > the
>>> > > > currently "developed" helm chart works with latest Airflow
>>> release? If
>>> > > you
>>> > > > do not do those checks at the "commit" time, then you defer this to
>>> > > > "release time" and only then you might find out that decisions you
>>> made
>>> > > > during development have to be reverted. This is a very, very bad
>>> idea
>>> > > IMHO
>>> > > > again leading to the case that the release manager will have to fix
>>> > > > problems introduced by others.
>>> > > >
>>> > > > J,
>>> > > >
>>> > > >
>>> > > >
>>> > > > On Fri, Jul 3, 2020 at 10:28 PM Ash Berlin-Taylor <as...@apache.org>
>>> > > wrote:
>>> > > >
>>> > > >> Monorepo FTW.
>>> > > >>
>>> > > >> Yes, it gets a little bit messier around release, but the
>>> approach of
>>> > > >> automatically extracting out the commits (or parts of commits) to
>>> a
>>> > > >> separate repo for releasing may be the solution to that problem
>>> > > >>
>>> > > >>
>>> > > >> -ash
>>> > > >>
>>> > > >> On Jul 3 2020, at 7:51 pm, Kaxil Naik <ka...@gmail.com>
>>> wrote:
>>> > > >>
>>> > > >> > I will take a look at the Kubernetes approach and get back to
>>> this
>>> > > >> thread.
>>> > > >> >
>>> > > >> > We had a discussion with Daniel yesterday and we are both
>>> concerned
>>> > > >> about
>>> > > >> >> all the overhead for people like us who work on all three
>>> > "entities"
>>> > > >> >> at the
>>> > > >> >> same time. Even just explaining how to work with Pull Requests
>>> and
>>> > in
>>> > > >> what
>>> > > >> >> sequence those PRs would have to be opened and merged in case
>>> of
>>> > > >> changes
>>> > > >> >> that are spanning across several "entities" - was a challenge.
>>> I
>>> > was
>>> > > >> unable
>>> > > >> >> to clearly explain the sequence and way of reviewing/merging
>>> the
>>> > PRs
>>> > > >> that
>>> > > >> >> will have to be made if we have submodules. This is a bad sign
>>> as I
>>> > > was
>>> > > >> >> using submodules in the past and know how it works but I was
>>> unable
>>> > > to
>>> > > >> >> explain it clearly.
>>> > > >> >
>>> > > >> >
>>> > > >> > We don't even need submodules tbh. We can just use a Bash script
>>> that
>>> > > >> > pulls a
>>> > > >> > pinned Helm Chart version.
>>> > > >> > We only need Helm chart to run integration test for k8s
>>> (at least for
>>> > > >> now).
>>> > > >> > We already use tons of Bash scripts.
>>> > > >> >
>>> > > >> > One of the important benefits of separation is that changes in one
>>> > > >> component
>>> > > >> > should not need a change in another component, at least
>>> > > >> > not immediately.
>>> > > >> >
>>> > > >> > Changes in Helm chart and Docker file should never need changes
>>> in
>>> > > >> Airflow
>>> > > >> > Changes in Airflow should only ever need a change in Dockerfile
>>> and
>>> > > Helm
>>> > > >> > Chart after a new version is released.
>>> > > >> >
>>> > > >> > I just had a talk with Daniel too and still didn't find a good
>>> > enough
>>> > > >> > reason to have them in the same repo.
>>> > > >> >
>>> > > >> > I will definitely look at the Kubernetes approach (maybe it is
>>> > better)
>>> > > >> and
>>> > > >> > get back to this thread. But as of now I don't see any major
>>> PROs
>>> > > >> > for having them in the same repo.
>>> > > >> >
>>> > > >> > Regards,
>>> > > >> > Kaxil
>>> > > >> >
>>> > > >> >
>>> > > >> >
>>> > > >> > On Fri, Jul 3, 2020 at 5:00 PM Jarek Potiuk <
>>> > Jarek.Potiuk@polidea.com
>>> > > >
>>> > > >> > wrote:
>>> > > >> >
>>> > > >> >> I think Ry's point is an important one - I thought about
>>> writing a
>>> > > >> longer
>>> > > >> >> post but I looked at the Kubernetes structure and I really
>>> like it
>>> > so
>>> > > >> just
>>> > > >> >> wanted to comment on this last one.
>>> > > >> >>
>>> > > >> >> Seems that it is simply one "authoritative" (or source of
>>> truth)
>>> > repo
>>> > > >> where
>>> > > >> >> everything is developed in monorepo fashion but then there is
>>> a bot
>>> > > >> >> that moves every commit related to subdirectories to those
>>> > > "split-out"
>>> > > >> >> repos. There are never direct commits of people or PRs in the
>>> > > >> "split-out"
>>> > > >> >> repositories. This is very similar to my original proposal to
>>> have
>>> > > >> >> dedicated repos used for releases - but with an automated way
>>> of
>>> > > >> publishing
>>> > > >> >> the commits to the "separated" repos at the moment they are
>>> merged
>>> > > to
>>> > > >> >> master in the main repo. I love it.
>>> > > >> >>
>>> > > >> >> I think it's really good and "pragmatic" solution. The code is
>>> > > >> >> available in
>>> > > >> >> separate repos, including the history of commits related to
>>> each
>>> > > >> "entity"
>>> > > >> >> (so only chart-related commits in chart repo). Issues for
>>> > particular
>>> > > >> >> "entities" are in those separate repos as well (something that
>>> > Kaxil
>>> > > >> >> mentioned). Users (not developers!) who are interested only in
>>> > > >> Dockerfile
>>> > > >> >> or Helm Chart have separate repos they can look at - with only
>>> > > relevant
>>> > > >> >> changes and history of releases for that particular entity.
>>> They
>>> > can
>>> > > >> raise
>>> > > >> >> issues there (and in GitHub, we can easily refer to those
>>> issues
>>> > from
>>> > > >> the
>>> > > >> >> main "airflow" repo). All the discussion from "user issues" are
>>> > kept
>>> > > >> >> in the
>>> > > >> >> relevant repositories. Still - comments about development
>>> changes
>>> > > (and
>>> > > >> >> related issues) might still be kept in the main "airflow" repo
>>> -
>>> > next
>>> > > >> to
>>> > > >> >> other "development" changes.
>>> > > >> >>
>>> > > >> >> We can run separate releases from those linked repositories and
>>> > even
>>> > > >> >> publish sources directly from those repositories rather than
>>> from
>>> > the
>>> > > >> main
>>> > > >> >> one. At the same time - we avoid all the hassle of submodules.
>>> > > >> >>
>>> > > >> >> We had a discussion with Daniel yesterday and we are both
>>> concerned
>>> > > >> about
>>> > > >> >> all the overhead for people like us who work on all three
>>> > "entities"
>>> > > >> >> at the
>>> > > >> >> same time. Even just explaining how to work with Pull Requests
>>> and
>>> > in
>>> > > >> what
>>> > > >> >> sequence those PRs would have to be opened and merged in case
>>> of
>>> > > >> changes
>>> > > >> >> that are spanning across several "entities" - was a challenge.
>>> I
>>> > was
>>> > > >> unable
>>> > > >> >> to clearly explain the sequence and way of reviewing/merging
>>> the
>>> > PRs
>>> > > >> that
>>> > > >> >> will have to be made if we have submodules. This is a bad sign
>>> as I
>>> > > was
>>> > > >> >> using submodules in the past and know how it works but I was
>>> unable
>>> > > to
>>> > > >> >> explain it clearly.
>>> > > >> >>
>>> > > >> >> I really, really like Kubernetes approach - seems that it's
>>> one of
>>> > > the
>>> > > >> >> cases where we can "eat cake and have it too".
>>> > > >> >>
>>> > > >> >> J.
>>> > > >> >>
>>> > > >> >>
>>> > > >> >> On Thu, Jul 2, 2020 at 5:59 PM Ry Walker <ry...@rywalker.com>
>>> wrote:
>>> > > >> >>
>>> > > >> >> > One reason to have a monorepo is for project branding, and
>>> end
>>> > user
>>> > > >> >> > experience. But for component development experience, it's
>>> nice
>>> > to
>>> > > >> >> have a
>>> > > >> >> > small, dedicated repo.
>>> > > >> >> >
>>> > > >> >> > I think the git submodule approach is technically sound, but
>>> is
>>> > at
>>> > > >> odds
>>> > > >> >> > with making the project easy to consume/understand from the
>>> end
>>> > > user
>>> > > >> >> > perspective, especially if we expand the use of subprojects.
>>> And
>>> > > >> >> the main
>>> > > >> >> > Airflow commit graph would appear to be slowing down which
>>> is bad
>>> > > for
>>> > > >> >> > Airflow brand perception.
>>> > > >> >> >
>>> > > >> >> > Kubernetes has many sub-repos that are integrated into the
>>> main
>>> > > >> >> repo -
>>> > > >> >> > which I think could be the best of both worlds:
>>> > > >> >> > Example:
>>> > > >> https://github.com/kubernetes/kubernetes/tree/master/staging
>>> > > >> >> >
>>> > > >> >> > I haven't dug in very deeply, and I won't pretend to
>>> understand
>>> > how
>>> > > >> >> > challenging it may be to maintain this structure, but I'd
>>> support
>>> > > >> >> breaking
>>> > > >> >> > more components out of the main Airflow repo for dev purposes
>>> > (for
>>> > > >> >> example,
>>> > > >> >> > in the future, it'd be nice to have airflow-cli, airflow-api,
>>> > > >> >> > airflow-scheduler, individual provider repos that are cleanly
>>> > > >> separated)
>>> > > >> >> as
>>> > > >> >> > long as we bring the commits/contributions back into the
>>> monorepo
>>> > > >> with
>>> > > >> >> > automation.
>>> > > >> >> >
>>> > > >> >> > Maybe we could dive a little deeper into how K8s is
>>> operating,
>>> > > before
>>> > > >> >> going
>>> > > >> >> > with submodules?
>>> > > >> >> >
>>> > > >> >> > -Ry
>>> > > >> >> >
>>> > > >> >> >
>>> > > >> >> >
>>> > > >> >> >
>>> > > >> >> > On Thu, Jul 2, 2020 at 11:24 AM Kaxil Naik <
>>> kaxilnaik@gmail.com>
>>> > > >> wrote:
>>> > > >> >> >
>>> > > >> >> > > Let's come to a consensus first before we do anything :-)
>>> > > >> >> > >
>>> > > >> >> > > Is everyone happy with separate repo approach? Let's wait
>>> for
>>> > 72
>>> > > >> hours
>>> > > >> >> to
>>> > > >> >> > > hear from all and then have a plan on how we do it? WDYT?
>>> > > >> >> > >
>>> > > >> >> > > But indeed git submodules approach sounds good. We do it
>>> for
>>> > for
>>> > > >> >> *Airflow
>>> > > >> >> > > Site *(
>>> > > >> >> > >
>>> > > >> >> > >
>>> > > >> >> >
>>> > > >> >>
>>> > > >>
>>> > >
>>> >
>>> https://github.com/apache/airflow-site/tree/master/landing-pages/site/themes
>>> > > >> >> > > )
>>> > > >> >> > > too.
>>> > > >> >> > >
>>> > > >> >> > > Regards,
>>> > > >> >> > > Kaxil
>>> > > >> >> > >
>>> > > >> >> > > On Thu, Jul 2, 2020 at 4:15 PM Jarek Potiuk <
>>> > > >> Jarek.Potiuk@polidea.com>
>>> > > >> >> > > wrote:
>>> > > >> >> > >
>>> > > >> >> > > > Absolutely - I am happy to add "best practices" and short
>>> > > >> >> "howto do
>>> > > >> >> > stuff
>>> > > >> >> > > > with git submodules" - and this knowledge will only be
>>> > needed
>>> > > >> for
>>> > > >> >> > > > interacting with prod image/helmchart/running kubernetes
>>> > tests.
>>> > > >> For
>>> > > >> >> all
>>> > > >> >> > > the
>>> > > >> >> > > > other purposes it should be "business as usual".
>>> > > >> >> > > >
>>> > > >> >> > > > On Thu, Jul 2, 2020 at 4:53 PM Daniel Imberman <
>>> > > >> >> > > daniel.imberman@gmail.com>
>>> > > >> >> > > > wrote:
>>> > > >> >> > > >
>>> > > >> >> > > > > I think git submodules sounds like a great idea. We
>>> would
>>> > > >> >> need to
>>> > > >> >> > write
>>> > > >> >> > > > > this into the CONTRIBUTING.md to let people know how
>>> to do
>>> > it
>>> > > >> but
>>> > > >> >> > It’s
>>> > > >> >> > > a
>>> > > >> >> > > > > “teach once” situation.
>>> > > >> >> > > > >
>>> > > >> >> > > > > On Thu, Jul 2, 2020 at 2:44 AM, Tomasz Urbaszek <
>>> > > >> >> > turbaszek@apache.org>
>>> > > >> >> > > > > wrote:
>>> > > >> >> > > > > I support the idea of separate repos. The git
>>> submodules
>>> > > >> mentioned
>>> > > >> >> by
>>> > > >> >> > > > > Jarek sounds like an interesting solution. It may add
>>> some
>>> > > >> >> complexity
>>> > > >> >> > > > > for new contributors but it's not rocket science. If we
>>> > agree
>>> > > >> on
>>> > > >> >> > using
>>> > > >> >> > > > > this we should add a small how-to in contributing.rst I
>>> think
>>> > > >> (i.e.
>>> > > >> >> do
>>> > > >> >> > I
>>> > > >> >> > > > > have to have fork of each repo?).
>>> > > >> >> > > > >
>>> > > >> >> > > > > As stressed previously if we go this route we should
>>> make
>>> > > >> >> sure we
>>> > > >> >> > have
>>> > > >> >> > > > > nice testing of all those three components. Regarding
>>> the
>>> > > >> >> versioning,
>>> > > >> >> > > > > I have no strong opinion but I fully support using
>>> separate
>>> > > >> issues
>>> > > >> >> > for
>>> > > >> >> > > > > airflow, docker, and helm.
>>> > > >> >> > > > >
>>> > > >> >> > > > > Tomek
>>> > > >> >> > > > >
>>> > > >> >> > > > >
>>> > > >> >> > > > > On Thu, Jul 2, 2020 at 9:26 AM Jarek Potiuk <
>>> > > >> >> > Jarek.Potiuk@polidea.com>
>>> > > >> >> > > > > wrote:
>>> > > >> >> > > > > >
>>> > > >> >> > > > > > On Thu, Jul 2, 2020 at 3:16 AM Daniel Imberman <
>>> > > >> >> > > > > daniel.imberman@gmail.com>
>>> > > >> >> > > > > > wrote:
>>> > > >> >> > > > > >
>>> > > >> >> > > > > > I’m fine with keeping it as three separate repos but
>>> > > merging
>>> > > >> >> > testing
>>> > > >> >> > > > > > > somehow (e.g. the source code chart would pull the
>>> > > >> helm/docker
>>> > > >> >> > > chart
>>> > > >> >> > > > > into
>>> > > >> >> > > > > > > .build) but we need to do it in a way that doesn’t
>>> make
>>> > > >> testing
>>> > > >> >> > too
>>> > > >> >> > > > > > > difficult.
>>> > > >> >> > > > > > >
>>> > > >> >> > > > > > > So for example: How do I test/integration test a
>>> change
>>> > > >> that
>>> > > >> >> > > > involves a
>>> > > >> >> > > > > > > change to all three and has to be done at the same
>>> > time?
>>> > > >> >> Perhaps
>>> > > >> >> > a
>>> > > >> >> > > > > user can
>>> > > >> >> > > > > > > “register” a branch of helm and docker when they
>>> start
>>> > up
>>> > > >> >> breeze?
>>> > > >> >> > > Or
>>> > > >> >> > > > > > > perhaps we create a “parent” integration test that
>>> uses
>>> > > the
>>> > > >> >> three
>>> > > >> >> > > > > together?
>>> > > >> >> > > > > > >
>>> > > >> >> > > > > >
>>> > > >> >> > > > > > Yes, those are exactly my concerns when splitting the
>>> > > repos.
>>> > > >> >> > > > > >
>>> > > >> >> > > > > > I think testing for development should remain in the
>>> > > >> "airflow"
>>> > > >> >> > repo.
>>> > > >> >> > > It
>>> > > >> >> > > > > is
>>> > > >> >> > > > > > the "central one" in fact. I slept on it and I
>>> think
>>> > > using
>>> > > >> >> > > "released"
>>> > > >> >> > > > > > versions for development testing will suffer from
>>> this
>>> > "we
>>> > > >> >> need a
>>> > > >> >> > > > change
>>> > > >> >> > > > > in
>>> > > >> >> > > > > > all three of those".
>>> > > >> >> > > > > >
>>> > > >> >> > > > > > But we have an easy solution I think.
>>> > > >> >> > > > > >
>>> > > >> >> > > > > > I think that simply setting submodules properly
>>> should do
>>> > > >> >> the
>>> > > >> >> > job:
>>> > > >> >> > > > > > https://git-scm.com/book/en/v2/Git-Tools-Submodules.
>>> > They
>>> > > >> seem
>>> > > >> >> to
>>> > > >> >> > be
>>> > > >> >> > > > > > perfect for our case.
>>> > > >> >> > > > > >
>>> > > >> >> > > > > > For those who have not used it - in short -
>>> submodules
>>> > work
>>> > > >> in
>>> > > >> >> the
>>> > > >> >> > > way
>>> > > >> >> > > > > that
>>> > > >> >> > > > > > they register the "linked repos" and store related
>>> "hash"
>>> > > >> >> of the
>>> > > >> >> > > commit
>>> > > >> >> > > > > > from that linked repo. For example, the "chart"
>>> folder
>>> > will
>>> > > >> >> be a
>>> > > >> >> > link
>>> > > >> >> > > > to
>>> > > >> >> > > > > > "apache/airflow-helm-chart". We can also move the
>>> prod
>>> > > >> Dockerfile
>>> > > >> >> > to
>>> > > >> >> > > a
>>> > > >> >> > > > > > subfolder and link it to the separate repo. Git
>>> submodule
>>> > > >> >> has a
>>> > > >> >> > > > > > built-in mechanism to a) update to the latest
>>> version of
>>> > > the
>>> > > >> >> repo,
>>> > > >> >> > b)
>>> > > >> >> > > > > > commit your changes to the linked repo from there
>>> which
>>> > is
>>> > > >> >> all we
>>> > > >> >> > > > need. I
>>> > > >> >> > > > > > used those a few times - I never liked submodules for
>>> > sharing
>>> > > >> >> > "library"
>>> > > >> >> > > > > code,
>>> > > >> >> > > > > > but for sharing helm/Docker it seems perfect.
>>> > > >> >> > > > > >
>>> > > >> >> > > > > > From the "regular" developer point of view - you do
>>> not
>>> > > >> >> need to
>>> > > >> >> > > > > get/update
>>> > > >> >> > > > > > submodules if you do not need to use them - so for
>>> all
>>> > the
>>> > > >> >> > > development
>>> > > >> >> > > > > > purposes if you only change the "airflow" code, you
>>> would
>>> > > not
>>> > > >> >> even
>>> > > >> >> > > need
>>> > > >> >> > > > > to
>>> > > >> >> > > > > > sync chart or Dockerfile. You do "git checkout" as
>>> usual
>>> > > >> >> and it
>>> > > >> >> > > should
>>> > > >> >> > > > > > work. So basically - no change for "regular" airflow
>>> > > >> development.
>>> > > >> >> > > > > >
>>> > > >> >> > > > > > However, if you do need to work on helm + Docker +
>>> code,
>>> > > >> >> then you
>>> > > >> >> > > > simply
>>> > > >> >> > > > > do
>>> > > >> >> > > > > > "git submodule update", go to the linked "helm" or
>>> > "docker"
>>> > > >> >> folder,
>>> > > >> >> > > > > > checkout the "master" version and you start making
>>> > changes.
>>> > > >> The
>>> > > >> >> > only
>>> > > >> >> > > > > thing
>>> > > >> >> > > > > > to remember when you want to push your changes is to
>>> do
>>> > > >> >> `git push
>>> > > >> >> > > > > > --recurse-submodules="check"` and it will make sure
>>> that
>>> > > >> >> all the
>>> > > >> >> > > repos
>>> > > >> >> > > > > are
>>> > > >> >> > > > > > updated. It is a bit involved, but the latest git versions
>>> > have
>>> > > >> >> a very
>>> > > >> >> > > good
>>> > > >> >> > > > > > support and it must only be used by people who work
>>> on
>>> > > >> >> airflow +
>>> > > >> >> > > > docker +
>>> > > >> >> > > > > > helm - all the others are unaffected.
>>> > > >> >> > > > > >
>>> > > >> >> > > > > > From the CI perspective also nothing changes - when
>>> we
>>> > > >> checkout
>>> > > >> >> the
>>> > > >> >> > > > code
>>> > > >> >> > > > > we
>>> > > >> >> > > > > > will include submodules and our test harness will be
>>> > > largely
>>> > > >> >> > > unchanged.
>>> > > >> >> > > > > > Submodule provides us with the right mechanism for
>>> cross
>>> > > >> >> dependency
>>> > > >> >> > > > even
>>> > > >> >> > > > > if
>>> > > >> >> > > > > > we use branches.
>>> > > >> >> > > > > >
>>> > > >> >> > > > > > If everyone will be ok with that - I am happy to set
>>> it
>>> > up,
>>> > > >> With
>>> > > >> >> > > > > submodules
>>> > > >> >> > > > > > - we can switch to separate repos even without
>>> releasing
>>> > > >> >> helm and
>>> > > >> >> > > Prod
>>> > > >> >> > > > > > chart "officially".
>>> > > >> >> > > > > >
>>> > > >> >> > > > > > J.
>>> > > >> >> > > > > >
>>> > > >> >> > > > > >
>>> > > >> >> > > > > >
>>> > > >> >> > > > > > >
>>> > > >> >> > > > > > > On Wed, Jul 1, 2020 at 3:20 PM, Jarek Potiuk <
>>> > > >> >> > > > Jarek.Potiuk@polidea.com
>>> > > >> >> > > > > >
>>> > > >> >> > > > > > > wrote:
>>> > > >> >> > > > > > > Sure. We can work with such an approach. There
>>> will be
>>> > > some
>>> > > >> >> > > > > dependencies
>>> > > >> >> > > > > > > that we might find are problematic, but if we all
>>> see
>>> > > >> >> that it's
>>> > > >> >> > > > > > > worth trying, there is a clear benefit that it
>>> makes
>>> > for
>>> > > a
>>> > > >> >> > "clean"
>>> > > >> >> > > > > > > split between those different "entities". And
>>> possibly
>>> > > >> >> once we
>>> > > >> >> > > > release
>>> > > >> >> > > > > > > first versions of both image and chart, such
>>> problems
>>> > > >> >> will be
>>> > > >> >> > rare
>>> > > >> >> > > > and
>>> > > >> >> > > > > easy
>>> > > >> >> > > > > > > to fix.
>>> > > >> >> > > > > > >
>>> > > >> >> > > > > > > I personally think such split is inevitable
>>> eventually,
>>> > > >> it's
>>> > > >> >> > just a
>>> > > >> >> > > > > matter
>>> > > >> >> > > > > > > when to do it. If we decide to make this happen
>>> soon -
>>> > I
>>> > > am
>>> > > >> >> more
>>> > > >> >> > > than
>>> > > >> >> > > > > happy
>>> > > >> >> > > > > > > to work on making the split reality.
>>> > > >> >> > > > > > >
>>> > > >> >> > > > > > > One prerequisite to that is that all those - Helm
>>> > Chart,
>>> > > >> Prod
>>> > > >> >> > Image
>>> > > >> >> > > > and
>>> > > >> >> > > > > > > Airflow are released in stable versions separately
>>> > > >> >> "officially" -
>>> > > >> >> > > > from
>>> > > >> >> > > > > the
>>> > > >> >> > > > > > > current sources (otherwise there will be no way to
>>> test
>>> > > >> >> > > cross-repo).
>>> > > >> >> > > > > > >
>>> > > >> >> > > > > > > I think for that we will need to agree on the
>>> > versioning
>>> > > >> scheme
>>> > > >> >> > and
>>> > > >> >> > > > > cadence
>>> > > >> >> > > > > > > for the Image and Helm Chart, then copy sources
>>> from
>>> > > >> airflow
>>> > > >> >> and
>>> > > >> >> > > > > release
>>> > > >> >> > > > > > > them as "baseline" including setup the tests for
>>> all of
>>> > > >> >> those -
>>> > > >> >> > > then
>>> > > >> >> > > > we
>>> > > >> >> > > > > > > can remove both Helm and Dockerfile from the
>>> airflow
>>> > > repo.
>>> > > >> >> Happy
>>> > > >> >> > to
>>> > > >> >> > > > > help
>>> > > >> >> > > > > > > with that if that's the direction we choose as a
>>> > > >> >> community. It
>>> > > >> >> is
>>> > > >> >> > > > > important
>>> > > >> >> > > > > > > though that we keep the cross-repo testing
>>> working. We
>>> > > >> >> have it
>>> > > >> >> > > > working
>>> > > >> >> > > > > as
>>> > > >> >> > > > > > > of yesterday, so now the matter is - whatever we
>>> do we
>>> > > >> >> keep it
>>> > > >> >> > > > running
>>> > > >> >> > > > > and
>>> > > >> >> > > > > > > have development environment support easy
>>> development
>>> > and
>>> > > >> >> testing
>>> > > >> >> > > of
>>> > > >> >> > > > > > > either of the three (including CI testing
>>> cross-repos)
>>> > ,
>>> > > >> That's
>>> > > >> >> > the
>>> > > >> >> > > > > only
>>> > > >> >> > > > > > > really important thing to me - the rest is more of
>>> > > >> technicality
>>> > > >> >> > how
>>> > > >> >> > > > we
>>> > > >> >> > > > > link
>>> > > >> >> > > > > > > the repos, but principle remains.
>>> > > >> >> > > > > > >
>>> > > >> >> > > > > > > Do we have an idea for the versioning scheme that
>>> we
>>> > > >> >> would like
>>> > > >> >> > to
>>> > > >> >> > > > use
>>> > > >> >> > > > > for
>>> > > >> >> > > > > > > the Helm Chart and prod image ?
>>> > > >> >> > > > > > >
>>> > > >> >> > > > > > > Should we make it CalVer
>>> > > >> >> <https://calver.org/overview.html> or
>>> > > >> >> > > > SemVer
>>> > > >> >> > > > > > > <https://semver.org/> (or some other scheme)? And
>>> how
>>> > > >> should
>>> > > >> >> we
>>> > > >> >> > > > treat
>>> > > >> >> > > > > the
>>> > > >> >> > > > > > > combinations with Airflow?
>>> > > >> >> > > > > > >
>>> > > >> >> > > > > > > My thoughts (but I have no strong opinions as long
>>> as
>>> > > >> someone
>>> > > >> >> > > > proposes
>>> > > >> >> > > > > more
>>> > > >> >> > > > > > > sensible versioning schemes):
>>> > > >> >> > > > > > >
>>> > > >> >> > > > > > > 1) Airflow code - we continue the release scheme we
>>> > have
>>> > > >> (with
>>> > > >> >> > > > > deciding on
>>> > > >> >> > > > > > > 2.* scheme for the release). I expect in the
>>> future we
>>> > > >> might
>>> > > >> >> > decide
>>> > > >> >> > > > on
>>> > > >> >> > > > > > > doing branches or patches so for 2.* I'd opt for
>>> going
>>> > > full
>>> > > >> >> > SemVer
>>> > > >> >> > > > > approach
>>> > > >> >> > > > > > > and patches released from branches.
>>> > > >> >> > > > > > >
>>> > > >> >> > > > > > > 2) I believe that Helm Chart can be versioned with
>>> its
>>> > > own
>>> > > >> >> > version
>>> > > >> >> > > > > (then
>>> > > >> >> > > > > > > you specify the image version as helm parameter).
>>> For
>>> > the
>>> > > >> Helm
>>> > > >> >> > > Chart
>>> > > >> >> > > > I
>>> > > >> >> > > > > > > think CalVer might be OK as I do not expect any
>>> > > >> >> branching/patches
>>> > > >> >> > > in
>>> > > >> >> > > > > the
>>> > > >> >> > > > > > > future - I'd expect that there will be a single
>>> stream
>>> > of
>>> > > >> >> > releases.
>>> > > >> >> > > > > > >
>>> > > >> >> > > > > > > 3) Dockerfile (+ related files such as
>>> .dockerignore,
>>> > > empty
>>> > > >> >> dir,
>>> > > >> >> > > > > > > entrypoints etc). I do not imagine a lot of
>>> branching
>>> > for
>>> > > >> >> those -
>>> > > >> >> > > we
>>> > > >> >> > > > > > > should be able to release a new version of a
>>> Dockerfile
>>> > > (+
>>> > > >> >> > related
>>> > > >> >> > > > > files)
>>> > > >> >> > > > > > > working with nearly any earlier Airflow release, so
>>> > > CalVer
>>> > > >> >> seems
>>> > > >> >> > > > like a
>>> > > >> >> > > > > > > good choice.
>>> > > >> >> > > > > > >
>>> > > >> >> > > > > > > 4) Image versioning becomes a bit more complex
>>> because
>>> > > the
>>> > > >> >> image
>>> > > >> >> > > tag
>>> > > >> >> > > > is
>>> > > >> >> > > > > > > always combination of:
>>> > > >> >> > > > > > > * Dockerfile (+ related files) version
>>> > > >> >> > > > > > > * Airflow Version
>>> > > >> >> > > > > > > * Python Version
>>> > > >> >> > > > > > >
>>> > > >> >> > > > > > > An example versioning I can imagine:
>>> > > >> >> > > > > > >
>>> > > >> >> > > > > > > *Airflow*: 1.10.11, 1.10.12, 2.0.0, 2.1.0, 2.1.1 -
>>> > patch
>>> > > >> level
>>> > > >> >> > (if
>>> > > >> >> > > we
>>> > > >> >> > > > > > > decide to have patches).
>>> > > >> >> > > > > > > *Dockerfile: *2020.07.12, 2020.08.20...... ->
>>> depending
>>> > > >> >> when we
>>> > > >> >> > > > release
>>> > > >> >> > > > > > > them
>>> > > >> >> > > > > > > *Helm Chart*: 2020.07.10, 2020.08.09 ...... Each
>>> Helm
>>> > > Chart
>>> > > >> >> has a
>>> > > >> >> > > > > minimum
>>> > > >> >> > > > > > > version of both Dockerfile and Airflow versions it
>>> > works
>>> > > >> with.
>>> > > >> >> > > > > > >
>>> > > >> >> > > > > > > *Example Docker Image tags:*
>>> > > >> >> > > > > > >
>>> > > >> apache/airflow:dockerfile2020.07.10-airflow1.10.10-python3.6
>>> > > >> >> > > > > > >
>>> > > >> >> > > > > > > WDYT?
>>> > > >> >> > > > > > >
>>> > > >> >> > > > > > > J,
>>> > > >> >> > > > > > >
>>> > > >> >> > > > > > >
>>> > > >> >> > > > > > > On Wed, Jul 1, 2020 at 11:12 PM Kaxil Naik <
>>> > > >> >> kaxilnaik@gmail.com>
>>> > > >> >> > > > > wrote:
>>> > > >> >> > > > > > >
>>> > > >> >> > > > > > > > I think we should have "separate repos for
>>> > development"
>>> > > >> too.
>>> > > >> >> > > > > > > >
>>> > > >> >> > > > > > > > 3 Repos in total:
>>> > > >> >> > > > > > > >
>>> > > >> >> > > > > > > > 1) apache/airflow
>>> > > >> >> > > > > > > > 2) apache/airflow-docker-image
>>> > > >> >> > > > > > > > 3) apache/airflow-helm-chart
>>> > > >> >> > > > > > > >
>>> > > >> >> > > > > > > >
>>> > > >> >> > > > > > > > (1) *apache/airflow* should use a pinned stable
>>> > version
>>> > > >> of
>>> > > >> >> > > Airflow
>>> > > >> >> > > > > Helm
>>> > > >> >> > > > > > > > chart to run Kubernetes tests
>>> > > >> >> > > > > > > > (2) *apache/airflow* already has *Dockerfile.ci*
>>> file
>>> > > >> which
>>> > > >> >> it
>>> > > >> >> > > can
>>> > > >> >> > > > > use to
>>> > > >> >> > > > > > > > run airflow tests on docker images.
>>> > > >> >> > > > > > > > (3) *apache/airflow-docker-image *should use the
>>> > latest
>>> > > >> >> > available
>>> > > >> >> > > > > stable
>>> > > >> >> > > > > > > > version of airflow
>>> > > >> >> > > > > > > > (4) *apache/airflow-helm-chart *should use the
>>> latest
>>> > > >> >> available
>>> > > >> >> > > > > stable
>>> > > >> >> > > > > > > > version of airflow
>>> > > >> >> > > > > > > >
>>> > > >> >> > > > > > > > Having such split also makes some updates more
>>> > > >> >> difficult -
>>> > > >> >> for
>>> > > >> >> > > > > example if
>>> > > >> >> > > > > > > > > we add new "extra" to Airflow that will
>>> require to
>>> > > >> install
>>> > > >> >> > > "apt"
>>> > > >> >> > > > > > > > dependency
>>> > > >> >> > > > > > > > > in Dockerfile, we will have to split it into
>>> first
>>> > > >> adding
>>> > > >> >> the
>>> > > >> >> > > > > > > dependency
>>> > > >> >> > > > > > > > to
>>> > > >> >> > > > > > > > > Dockerfile, and once it is merged, we can add
>>> the
>>> > > >> >> extra to
>>> > > >> >> > > > airflow
>>> > > >> >> > > > > with
>>> > > >> >> > > > > > > > > setup.py.
>>> > > >> >> > > > > > > >
>>> > > >> >> > > > > > > >
>>> > > >> >> > > > > > > > Adding a new extra to setup.py would not (and
>>> should
>>> > > not)
>>> > > >> >> > impact
>>> > > >> >> > > > the
>>> > > >> >> > > > > > > > development of *apache/airflow-docker-image*
>>> > > >> >> > > > > > > > Once an RC is cut for apache/airflow or after a
>>> new
>>> > > >> version
>>> > > >> >> is
>>> > > >> >> > > > > released
>>> > > >> >> > > > > > > for
>>> > > >> >> > > > > > > > apache/airflow, we can work on supporting the new
>>> > > airflow
>>> > > >> >> > version
>>> > > >> >> > > > in
>>> > > >> >> > > > > the
>>> > > >> >> > > > > > > > Production Docker Image.
>>> > > >> >> > > > > > > > While doing that we can add all the libraries
>>> that
>>> > are
>>> > > >> needed
>>> > > >> >> > by
>>> > > >> >> > > > the
>>> > > >> >> > > > > new
>>> > > >> >> > > > > > > > Airflow Version and we will have a clean commit
>>> > history
>>> > > >> and
>>> > > >> >> > > > > changelog for
>>> > > >> >> > > > > > > > Docker image.
>>> > > >> >> > > > > > > >
>>> > > >> >> > > > > > > > We definitely do not need to work in parallel on
>>> both
>>> > > the
>>> > > >> >> repos.
>>> > > >> >> > > By
>>> > > >> >> > > > > doing
>>> > > >> >> > > > > > > > development in a separate repo we keep consistent
>>> > > >> "source"
>>> > > >> >> > files
>>> > > >> >> > > > and
>>> > > >> >> > > > > we
>>> > > >> >> > > > > > > can
>>> > > >> >> > > > > > > > release each artifact with a
>>> > > >> >> > > > > > > > separate cadence. If someone discovers a bug in
>>> newly
>>> > > >> released
>>> > > >> >> > > > > Docker image,
>>> > > >> >> > > > > > > > we should be easily able to cut out a new release
>>> > with
>>> > > >> the
>>> > > >> >> > patch
>>> > > >> >> > > > > without
>>> > > >> >> > > > > > > > worrying about how development is
>>> > > >> >> > > > > > > > going in the apache/airflow repo.
>>> > > >> >> > > > > > > >
>>> > > >> >> > > > > > > >
>>> > > >> >> > > > > > > > *Apache Flink & Apache CouchDB* do it in a
>>> > similar
>>> > > >> >> manner:
>>> > > >> >> > > > > > > >
>>> > > >> >> > > > > > > > https://github.com/apache/flink &
>>> > > >> >> > > > > https://github.com/apache/flink-docker
>>> > > >> >> > > > > > > > https://github.com/apache/couchdb &
>>> > > >> >> > > > > > > > https://github.com/apache/couchdb-docker
>>> > > >> >> > > > > > > >
>>> > > >> >> > > > > > > > Regards,
>>> > > >> >> > > > > > > > Kaxil
>>> > > >> >> > > > > > > >
>>> > > >> >> > > > > > > >
>>> > > >> >> > > > > > > >
>>> > > >> >> > > > > > > >
>>> > > >> >> > > > > > > >
>>> > > >> >> > > > > > > >
>>> > > >> >> > > > > > > > On Wed, Jul 1, 2020 at 9:50 PM Jarek Potiuk <
>>> > > >> >> > > > > Jarek.Potiuk@polidea.com>
>>> > > >> >> > > > > > > > wrote:
>>> > > >> >> > > > > > > >
>>> > > >> >> > > > > > > > > I do not think it's only the question of
>>> Mono/Multi
>>> > > >> repos.
>>> > > >> >> > > While
>>> > > >> >> > > > I
>>> > > >> >> > > > > > > > clearly
>>> > > >> >> > > > > > > > > see the benefit of separate repos I also see
>>> some
>>> > > >> >> drawbacks.
>>> > > >> >> > > > > > > > >
>>> > > >> >> > > > > > > > > And if it bothers others, I am happy to follow
>>> the
>>> > > >> >> majority.
>>> > > >> >> > If
>>> > > >> >> > > > we
>>> > > >> >> > > > > > > think
>>> > > >> >> > > > > > > > > that a bit more complexity in testing justifies
>>> > > >> separating
>>> > > >> >> > > those
>>> > > >> >> > > > > three
>>> > > >> >> > > > > > > > > completely and having more "clean"- it's also
>>> > > >> >> workable but
>>> > > >> >> > IMHO
>>> > > >> >> > > > > > > > introduces
>>> > > >> >> > > > > > > > > certain complexity in development.
>>> > > >> >> > > > > > > > >
>>> > > >> >> > > > > > > > > However, I think this is not 0/1 - a kind of
>>> Hybrid
>>> > > >> approach
>>> > > >> >> in
>>> > > >> >> > my
>>> > > >> >> > > > > opinion
>>> > > >> >> > > > > > > > > might be best of both worlds - development and
>>> > > >> >> releases .
>>> > > >> >> > > > > > > > >
>>> > > >> >> > > > > > > > > Let me explain what I mean by "Hybrid":
>>> > > >> >> > > > > > > > >
>>> > > >> >> > > > > > > > > I think we definitely should have separate
>>> > > >> >> repositories to
>>> > > >> >> > > > release
>>> > > >> >> > > > > > > those
>>> > > >> >> > > > > > > > > artifacts and I think there is no doubt about
>>> it:
>>> > > >> >> > > > > > > > >
>>> > > >> >> > > > > > > > > * airflow (apache/airflow)
>>> > > >> >> > > > > > > > > * prod docker image (apache/airflow-docker)
>>> > > >> >> > > > > > > > > * helm chart (apache/airflow-helm)
>>> > > >> >> > > > > > > > > * api clients (we already have separate repos
>>> for
>>> > > >> those)
>>> > > >> >> > > > > > > > > (apache/airflow-client-*)
>>> > > >> >> > > > > > > > >
>>> > > >> >> > > > > > > > > I think the only question is where we develop all those (develop
>>> > > >> >> > > > > > > > > != release). There are certain benefits of having a single
>>> > > >> >> > > > > > > > > "master" (let's call it "development" further) for all those
>>> > > >> >> > > > > > > > > artifacts. Currently the "development" version for all of those
>>> > > >> >> > > > > > > > > is in one repo - and while developing one depends on the other,
>>> > > >> >> > > > > > > > > we also test all of those together and this means that the
>>> > > >> >> > > > > > > > > "current best" set of airflow sources (including dependencies in
>>> > > >> >> > > > > > > > > setup.py), Dockerfile and Helm chart work. This means for example
>>> > > >> >> > > > > > > > > that you will not be able to break the Helm Chart by changing
>>> > > >> >> > > > > > > > > anything that the helm chart depends on in airflow. For example
>>> > > >> >> > > > > > > > > if you change "airflow webserver" into "airflow server" the
>>> > > >> >> > > > > > > > > current helm chart will break. Similarly if you change
>>> > > >> >> > > > > > > > > entrypoint.sh in the Docker image in a way that is not compatible
>>> > > >> >> > > > > > > > > with the Helm chart, we will not let that happen - the CI tests
>>> > > >> >> > > > > > > > > will break if either of those changes in an incompatible way. And
>>> > > >> >> > > > > > > > > we can have dependencies in any direction between those three.
>>> > > >> >> > > > > > > > > When we see a commit break either of the three - we can make a
>>> > > >> >> > > > > > > > > decision about what to do - either accept and document the
>>> > > >> >> > > > > > > > > incompatibility or fix it.
>>> > > >> >> > > > > > > > >
>>> > > >> >> > > > > > > > > Of course keeping that property (testing it all together) is also
>>> > > >> >> > > > > > > > > possible if they are in completely separate repos. There are
>>> > > >> >> > > > > > > > > several cross-dependencies - Docker image building depends on
>>> > > >> >> > > > > > > > > dependencies in setup.py for example, you cannot build Docker
>>> > > >> >> > > > > > > > > image from only Dockerfile without the sources of airflow nor
>>> > > >> >> > > > > > > > > build and test helm charts without the image (and sources -
>>> > > >> >> > > > > > > > > because that's where the current kubernetes tests are). If we
>>> > > >> >> > > > > > > > > want to continue doing it for both Helm and Dockerfile, we would
>>> > > >> >> > > > > > > > > have to basically check out the latest sources of Airflow and run
>>> > > >> >> > > > > > > > > the CI tests before merging any Docker or Helm Chart changes and
>>> > > >> >> > > > > > > > > the opposite - we will have to download Dockerfile/Helm chart and
>>> > > >> >> > > > > > > > > build image/install Helm chart when we are running CI tests for
>>> > > >> >> > > > > > > > > Airflow. This is possible and we could do it, but it adds
>>> > > >> >> > > > > > > > > complexity to the build/CI process.
>>> > > >> >> > > > > > > > >
>>> > > >> >> > > > > > > > > Having such a split also makes some updates more difficult - for
>>> > > >> >> > > > > > > > > example if we add a new "extra" to Airflow that requires
>>> > > >> >> > > > > > > > > installing an "apt" dependency in the Dockerfile, we will have to
>>> > > >> >> > > > > > > > > split it into first adding the dependency to the Dockerfile, and
>>> > > >> >> > > > > > > > > once it is merged, we can add the extra to airflow with setup.py.
>>> > > >> >> > > > > > > > > This makes it quite difficult to test it together though (the
>>> > > >> >> > > > > > > > > Dockerfile change can only be tested fully after merging it to
>>> > > >> >> > > > > > > > > master). Not to mention the complexity of managing different
>>> > > >> >> > > > > > > > > versions - your local development Dockerfile version vs sources
>>> > > >> >> > > > > > > > > of Airflow for example. Imagine switching between branches where
>>> > > >> >> > > > > > > > > you add two different apt dependencies to the Dockerfile. There
>>> > > >> >> > > > > > > > > are more similar scenarios I can imagine - especially for
>>> > > >> >> > > > > > > > > parallel changes in those repos.
>>> > > >> >> > > > > > > > >
>>> > > >> >> > > > > > > > > It is of course doable to keep them separate, but it is quite a
>>> > > >> >> > > > > > > > > bit more complex to set up (especially for a consistent
>>> > > >> >> > > > > > > > > development environment) when you have separate repos, and
>>> > > >> >> > > > > > > > > preventing cross-breaking changes might be more difficult.
>>> > > >> >> > > > > > > > >
>>> > > >> >> > > > > > > > > I believe that the best way is to continue developing airflow +
>>> > > >> >> > > > > > > > > image + chart in one repo - airflow, but release them from those
>>> > > >> >> > > > > > > > > separate repos.
>>> > > >> >> > > > > > > > >
>>> > > >> >> > > > > > > > > Airflow source release does not have to contain either the chart
>>> > > >> >> > > > > > > > > or the image. And even if it contains sources for those, they are
>>> > > >> >> > > > > > > > > not the final "artifacts" (installable image and installable helm
>>> > > >> >> > > > > > > > > chart).
>>> > > >> >> > > > > > > > > Whenever we decide to release either of them - we test it in
>>> > > >> >> > > > > > > > > "development". Then only when it is tested, we copy the sources
>>> > > >> >> > > > > > > > > to those separate repos and release them.
>>> > > >> >> > > > > > > > >
>>> > > >> >> > > > > > > > > With git - we can even do it very easily while preserving the
>>> > > >> >> > > > > > > > > history of commits (been there, done that). And then we could
>>> > > >> >> > > > > > > > > release Helm and Docker image separately based on the commits and
>>> > > >> >> > > > > > > > > tags in those separate repositories.
>>> > > >> >> > > > > > > > >
>>> > > >> >> > > > > > > > > I agree that separate repos is a more "clean" approach. But I
>>> > > >> >> > > > > > > > > think it is less convenient for development consistency.
>>> > > >> >> > > > > > > > >
>>> > > >> >> > > > > > > > > J,
>>> > > >> >> > > > > > > > >
>>> > > >> >> > > > > > > > >
>>> > > >> >> > > > > > > > >
>>> > > >> >> > > > > > > > > On Wed, Jul 1, 2020 at 9:35 PM Kaxil Naik <kaxilnaik@gmail.com>
>>> > > >> >> > > > > > > > > wrote:
>>> > > >> >> > > > > > > > >
>>> > > >> >> > > > > > > > > > Forgot to mention, having them in separate repos also helps in
>>> > > >> >> > > > > > > > > > better managing each individual artifact.
>>> > > >> >> > > > > > > > > >
>>> > > >> >> > > > > > > > > > Each repo would have separate Github Issues where we can track
>>> > > >> >> > > > > > > > > > the issues specific to Helm chart or Dockerfile.
>>> > > >> >> > > > > > > > > >
>>> > > >> >> > > > > > > > > > Regards,
>>> > > >> >> > > > > > > > > > Kaxil
>>> > > >> >> > > > > > > > > >
>>> > > >> >> > > > > > > > > > On Wed, Jul 1, 2020 at 8:30 PM Kaxil Naik <kaxilnaik@gmail.com>
>>> > > >> >> > > > > > > > > > wrote:
>>> > > >> >> > > > > > > > > >
>>> > > >> >> > > > > > > > > > > The PMC also needs to agree if we want separate VOTING for
>>> > > >> >> > > > > > > > > > > Docker Image and Helm chart, I think we do.
>>> > > >> >> > > > > > > > > > >
>>> > > >> >> > > > > > > > > > > Regards,
>>> > > >> >> > > > > > > > > > > Kaxil
>>> > > >> >> > > > > > > > > > >
>>> > > >> >> > > > > > > > > > > On Wed, Jul 1, 2020 at 8:06 PM Kaxil Naik <kaxilnaik@gmail.com>
>>> > > >> >> > > > > > > > > > > wrote:
>>> > > >> >> > > > > > > > > > >
>>> > > >> >> > > > > > > > > > >> Hi all,
>>> > > >> >> > > > > > > > > > >>
>>> > > >> >> > > > > > > > > > >> What do you all think about having Dockerfile and Helm chart
>>> > > >> >> > > > > > > > > > >> in the same "Airflow" Repo vs separate?
>>> > > >> >> > > > > > > > > > >>
>>> > > >> >> > > > > > > > > > >> I feel having a separate repo for Airflow Dockerfile and Helm
>>> > > >> >> > > > > > > > > > >> chart has more benefits: changes are easy to track (via
>>> > > >> >> > > > > > > > > > >> Changelog), it is easier for new contributors, and each gets
>>> > > >> >> > > > > > > > > > >> a separate release cadence.
>>> > > >> >> > > > > > > > > > >>
>>> > > >> >> > > > > > > > > > >> Currently, Dockerfile and Helm Chart are inside the same repo
>>> > > >> >> > > > > > > > > > >> and when we release the changelog for a new Airflow version,
>>> > > >> >> > > > > > > > > > >> it would include all changes (Airflow + Dockerfile + Helm
>>> > > >> >> > > > > > > > > > >> chart) which I think is not that great.
>>> > > >> >> > > > > > > > > > >>
>>> > > >> >> > > > > > > > > > >> Also having them all inside a single repo means changes in
>>> > > >> >> > > > > > > > > > >> Helm Chart and Dockerfile can block Airflow release. We could
>>> > > >> >> > > > > > > > > > >> use stable Helm Chart version and Dockerfile version to test
>>> > > >> >> > > > > > > > > > >> Airflow so that they are blockers to release too.
>>> > > >> >> > > > > > > > > > >>
>>> > > >> >> > > > > > > > > > >> Happy to hear the thoughts from the community.
>>> > > >> >> > > > > > > > > > >>
>>> > > >> >> > > > > > > > > > >> Regards,
>>> > > >> >> > > > > > > > > > >> Kaxil
>>> > > >> >> > > > > > > > > > >>
>>> > > >> >> > > > > > > > > > >
>>> > > >> >> > > > > > > > > >
>>> > > >> >> > > > > > > > >
>>> > > >> >> > > > > > > > >
>>> > > >> >> > > > > > > > > --
>>> > > >> >> > > > > > > > >
>>> > > >> >> > > > > > > > > Jarek Potiuk
>>> > > >> >> > > > > > > > > Polidea <https://www.polidea.com/> | Principal
>>> > > >> Software
>>> > > >> >> > > Engineer
>>> > > >> >> > > > > > > > >
>>> > > >> >> > > > > > > > > M: +48 660 796 129 <+48660796129>
>>> > > >> >> > > > > > > > > [image: Polidea] <https://www.polidea.com/>
>>> > > >> >> > > > > > > > >

-- 

Jarek Potiuk
Polidea <https://www.polidea.com/> | Principal Software Engineer

M: +48 660 796 129 <+48660796129>
[image: Polidea] <https://www.polidea.com/>

Re: Separate Repo vs MonoRepo for Dockerfile & Helm Chart

Posted by Kamil Breguła <ka...@polidea.com>.
I took a quick look and I like the overall concept, but I'm just wondering
if it will be clear enough for users. Currently, these scripts copy
different files from different directories and the mapping of the source to
the destination is written in the scripts. This will make it difficult to
contribute to this "sub-project". In my opinion, if we want to create new
repositories from some files, we should only do it for one directory. If
this directory has dependencies, we should try to break them down. The
end-user should not get the impression at first glance that they are
dealing with a copied repository. Otherwise, we will not achieve our
primary goal - to make things easier for end-users.
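
To illustrate the difference between the two layouts, here is a minimal
sketch only. The mapping table, the file paths, and both helper names are
made up for illustration and are not taken from the actual synchronization
scripts; only the "prod-docker-image" directory name comes from the proposal
in this message. With a hand-written table every exported file needs its own
entry, while with a single directory the rule is just "strip the prefix":

```python
from pathlib import PurePosixPath
from typing import Optional

# Hypothetical table of the kind a copy script has to maintain by hand.
# None of these paths are taken from the real synchronization scripts.
EXPLICIT_MAPPING = {
    "Dockerfile": "Dockerfile",
    "scripts/prod/entrypoint.sh": "entrypoint.sh",
    "docs/production-image.rst": "README.rst",
}


def map_with_table(source: str) -> Optional[str]:
    """Return the destination path, or None if the file is not exported."""
    return EXPLICIT_MAPPING.get(source)


def map_single_directory(source: str, root: str = "prod-docker-image") -> Optional[str]:
    """Export everything under `root`, keeping the layout below it unchanged."""
    try:
        return str(PurePosixPath(source).relative_to(root))
    except ValueError:
        # The file lives outside the exported directory.
        return None
```

In the second variant a contributor can predict where a file ends up without
reading the script, which is the property argued for above.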

In this case, it means that we should create a new directory in
apache/airflow named "prod-docker-image" or similar and move the
necessary Dockerfiles, documentation, scripts, and all other assets into
it. In particular, this directory should contain a README.md which
actually describes the contents of that directory.

A good example is the /chart directory. It has only one dependency that is
not in the "/chart" directory - the "Contributing" section in README.md
refers to a file in the root directory of the repository. This link will
stop working if we create a new repository from the entire directory, but
it will be trivial to fix.
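
Links like that one could be found automatically before a directory is
split out. The following is only a rough sketch under stated assumptions:
the function name is hypothetical, it assumes Markdown-style
"[text](target)" links, and it has not been run against the real chart
README. It reports every relative link that resolves outside the directory:

```python
import posixpath
import re
from typing import List

# Matches Markdown inline links and captures the link target.
LINK_PATTERN = re.compile(r"\[[^\]]*\]\(([^)\s]+)\)")


def links_escaping_directory(readme_text: str, directory: str = "chart") -> List[str]:
    """Return link targets that point outside `directory`."""
    escaping = []
    for target in LINK_PATTERN.findall(readme_text):
        path = target.split("#", 1)[0]  # drop any in-page anchor
        if not path or "://" in path or path.startswith("mailto:"):
            continue  # anchors and external URLs survive a repo split
        resolved = posixpath.normpath(posixpath.join(directory, path))
        if resolved != directory and not resolved.startswith(directory + "/"):
            escaping.append(target)
    return escaping
```

Any target it returns would need to be rewritten (for example into an
absolute URL to the main repository) in the extracted copy.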

On Sun, Oct 25, 2020 at 9:18 PM Jarek Potiuk <Ja...@polidea.com>
wrote:

> Hello Everyone,
>
> I would like to come back to the discussion as I have *JUST* implemented
> the solution (very simple but 100% working) to this monorepo vs. separate
> repos.
>
> You can take a look at this repo of mine:
> https://github.com/potiuk/airflow-docker. It is very simple and works
> like a charm. I implemented it to solve the issue
> https://github.com/apache/airflow/issues/11740
>
> This is a separate repo that people can use to have a separate "read-only"
> repository that **only** keeps our Dockerfile-related stuff - including the
> full history of changes related (and only those), full traceability, and
> incremental, automated synchronization from our "airflow" repo.
>
> I can - any time - set it up as "apache/airflow-docker" and get it to
> synchronize every day or every hour.
>
> Here is how it works:
>
> * The "master" and "v1-10-stable" branches are filtered to only contain
> files that are needed to build Prod Docker image
> * We keep history of all relevant commits in those branches
> * In the "main" branch we only keep the "scheduled" Github Actions
> workflow that does the synchronization and README.md which explains what
> needs to be done to build the docker image
> * I am using the excellent "git-filter-repo" tool which does the job
> really well and fast. Git-filter-repo is recommended by Git maintainers
> over the old, slow and much worse built-in git-filter-branch:
> https://git-scm.com/docs/git-filter-branch#_warning
> * the job to synchronize the repo takes 1m30s to run - it is rather fast
> despite analyzing 13500 commits :)
> * it runs incrementally - just adding new commits when they appear
> * it is very simple: a script of a few lines + a few steps in a Github
> Action to checkout/push the right branches
> * we keep all the commit mapping in the repo as well, so we have 1-1
> relationship between the commits in the "docker repo" and the original ones
> in Airflow repo
> * synchronization is 1-way - airflow -> airflow-docker
> * we can use a very similar approach for synchronizing:
>     * Helm chart
>     * Open API clients
>     * other stuff
>
> It also follows our source release strategy - it has the same "properties"
> as our main repo - so it is merely a "convenience" way of accessing the
> Docker customization options, but the same functionality is available in
> our officially released sources.
>
> Do you think we should turn it into the "apache/airflow-docker" repo?
>
> J.
>
>
>
> On Sun, Jul 5, 2020 at 8:12 PM Daniel Imberman <da...@gmail.com>
> wrote:
>
>> Worth noting that git has the ability to cherry-pick only specific
>> directories. If we keep all of helm + tests in one directory, docker +
>> tests in another, and core + tests in a third directory it would be pretty
>> simple to automate splitting them.
>>
>>
>> https://stackoverflow.com/questions/19821749/git-cherry-pick-or-merge-specific-directory-from-another-branch
>>
>> On Sun, Jul 5, 2020 at 9:57 AM, Daniel Imberman <
>> daniel.imberman@gmail.com> wrote:
>> I can’t agree with this enough :). I think writing a few bots to separate
>> out sections will be MUCH easier in the long run than maintaining multiple
>> repos. Will also prevent the difficulty of setting up a proper dev
>> environment for new contributors.
>> On Sun, Jul 5, 2020 at 9:53 AM, Jarek Potiuk <Ja...@polidea.com>
>> wrote:
>> Yeah. I think that the "monorepo" is the only way for now - until (or if)
>> we reach the size (and maturity) that different teams take care of the
>> different projects. Which might even not happen.
>>
>> But I would love to try the separate repos to publish/release still (maybe
>> not immediately, but it is a nice concept). I think it should be rather
>> easy (I will try it on my own repo first). Also, I think it has another
>> advantage - those separate repos might actually run other kinds of tests -
>> for example, to test if there is "everything" in that repo to release it
>> (for example build helm chart) and whether there is no accidental use of
>> stuff from outside of those dirs.
>>
>> I already thought about how to do it - it should be rather easy. Of course
>> - like most of the time - there is a ready-to-use git command doing it for
>> us. We simply need a bot running for that repo executing a variant of this
>> command:
>>
>> https://docs.github.com/en/github/using-git/splitting-a-subfolder-out-into-a-new-repository
>> (it should only take commits from the commit merged last time). So the
>> level of automation needed here is rather minimal.
>>
>> And if we have those repos and at some point in time we decide to split
>> after all - we will already have repos with all the history as a starting
>> point.
>>
>> J.
>>
>>
>> On Sun, Jul 5, 2020 at 4:42 PM Kaxil Naik <ka...@gmail.com> wrote:
>>
>> > Hmm.. I agree the git-sync would have been a difficult one to solve if
>> > we had separate repositories.
>> >
>> > Well, in that case, the mono repo approach (like we have now) indeed
>> > makes more sense.
>> >
>> > Regarding the Kubernetes approach, I feel the ones in staging (
>> > https://github.com/kubernetes/kubernetes/tree/master/staging) are part
>> > of the actual product itself but in our case we were discussing between
>> > Helm chart and Dockerfile which are not actually part of the product.
>> > And we will need a good deal of automation if we go down that route.
>> > I think the plain mono-repo approach is better than that one.
>> >
>> > Regards,
>> > Kaxil
>> >
>> >
>> > On Sun, Jul 5, 2020 at 9:19 AM Jarek Potiuk <Ja...@polidea.com>
>> > wrote:
>> >
>> > > And one more perfect illustration of what I am talking about.
>> > >
>> > > A very good thing just happened. I was running the PR while writing
>> > > the email (long time as you might imagine) and the new K8S tests with
>> > > 1.10.11 just failed. https://github.com/apache/airflow/pull/9663
>> > >
>> > > If we had released the helm chart before, we would have a clear
>> > > (small) incompatibility here. And by seeing the test failing we can
>> > > make a decision about what to do:
>> > >
>> > > 1) fix it differently
>> > > 2) document it as a breaking Helm change, "1.10.12+ image" and make
>> > > the test work in both cases
>> > > 3) revert ...
>> > >
>> > > But at least we have an early warning that something is wrong. This
>> > > is the clear value of running the tests at every commit.
>> > >
>> > > J.
>> > >
>> > > On Sun, Jul 5, 2020 at 10:08 AM Jarek Potiuk <
>> Jarek.Potiuk@polidea.com>
>> > > wrote:
>> > >
>> > > > I just have another example of a case where splitting the repos and
>> > > > using only "released versions" across repositories might be a
>> > > > complete overkill when it comes to development complexity.
>> > > >
>> > > > We have this change from Aneesh:
>> > > > https://github.com/apache/airflow/pull/9371 about adding a git-sync
>> > > > option to the helm chart.
>> > > >
>> > > > That's a new feature, but we would like to test both 1.10 and the
>> > > > master version of KubernetesExecutor with that. It should work for
>> > > > both of them - there is no coupling/dependency in the "airflow" code
>> > > > for it.
>> > > >
>> > > > However, there is a strong coupling in the tests. We have the
>> > > > "kubernetes_tests" running tests using all three: chart, production
>> > > > docker, and Airflow. Those tests will likely have to be adapted to
>> > > > work with the new git-sync option. They were disabled previously as
>> > > > we had problems with them before the helm chart was used for tests
>> > > > but we can turn them back on now when git-sync is added to the helm
>> > > > chart. Those tests are part of the airflow test suite and we
>> > > > discussed with Daniel that they should stay there - those tests are
>> > > > importing airflow code, they are using the latest example dags which
>> > > > are also in the airflow code.
>> > > >
>> > > > So we have two ways how we can develop this -
>> > > > A) monorepo (current)
>> > > > B) separate repos.
>> > > >
>> > > > Just to remind - the goal is that our change is tested against:
>> > > >
>> > > > 1) Released Airflow version (say 1.10.11).
>> > > > 2) Development airflow version (master - soon possibly development)
>> > > > 3) Development docker image built with either "development" or
>> > > > "1.10.11" (we can release the Docker image for 1.10.11 independently
>> > > > from the current development HEAD). The docker image is supposed to
>> > > > work with any version of airflow
>> > > >
>> > > > In the case of A) Monorepo we have all that as a given.
>> > > >
>> > > > I just sent this really small PR that should do the job:
>> > > > https://github.com/apache/airflow/pull/9663. What it does: it takes
>> > > > the latest "development" docker image and "development" chart, and
>> > > > bakes in the latest "example dags" from the "development" branch.
>> > > > The image uses either the "development" or the released (from PyPI)
>> > > > "1.10.11" Airflow version - and runs the "development" tests against
>> > > > it. This is exactly what we want. If we add new features to the helm
>> > > > chart, the Kubernetes tests will have to be updated to include that
>> > > > - and this will happen in the airflow "development" branch. The
>> > > > REALLY good thing in it - since we are running those tests in the CI
>> > > > build of the airflow development branch - we prevent anyone from
>> > > > making breaking changes. It is a given that both - the "development"
>> > > > version of airflow and the "1.10.11" version of airflow will
>> > > > continue to work with the image and chart.
>> > > >
>> > > >
>> > > > In the case of B) where we split the repos:
>> > > >
>> > > > We have to decide where to keep the "kubernetes_tests" - should they
>> > > > be in "Airflow" or in "Helm". They are testing BOTH so we can choose
>> > > > either way. Together with Daniel we plan to expand those tests to
>> > > > cover all the different options we have in the Chart - testing all
>> > > > of it - Kubernetes Executor, Celery Executor running on Kubernetes,
>> > > > MySQL (once we add it), etc. etc. So we want to make sure we have a
>> > > > matrix of tests covering a number of deployment options. Those tests
>> > > > do not exist yet, and they will have to be written. In principle -
>> > > > they can be moved to the "Helm" repository. That's where they
>> > > > conceptually belong. However - there is a Huge value in running the
>> > > > tests in airflow "development" - the value is that no-one will be
>> > > > able to break the "development" airflow, because those tests are run
>> > > > with every PR. I think we have no choice but to run those tests
>> > > > always in development. Otherwise, people maintaining the helm chart
>> > > > will have to fix the problems introduced by people changing Airflow
>> > > > code. I think this is a pretty bad idea to allow that. So if we move
>> > > > those tests to Helm Chart repo we have to figure out how to run
>> > > > those "kubernetes" tests in CI for every build. This is quite
>> > > > possible - by getting the latest master from helm chart and running
>> > > > the build, but it has several problems:
>> > > >
>> > > > 1) The test code for CI will have to continue to stay in Airflow (to
>> > > > run CI builds) - this means that we already have coupling and some
>> > > > code related to the execution of the helm tests has to be in Airflow
>> > > > anyway.
>> > > >
>> > > > 2) Bigger problem. What happens if as "Airflow developer" you DO
>> > > > introduce a change that breaks the helm chart? You will see a CI
>> > > > error and..... You will not know what to do. Do you involve people
>> > > > who maintain the helm chart and wait for them? I think not. You
>> > > > should be able to reproduce the problem locally and fix it yourself
>> > > > (maybe with the help of others - but you should be able to fix your
>> > > > own commit). We would have to teach people how to bring the docker
>> > > > image and helm chart code from the latest version and run the tests.
>> > > > We could do it automatically with Breeze (similarly as we do with
>> > > > other integrations - where we bring in Kerberos, Mongo, and a
>> > > > multitude of others) without them even knowing it, but this might be
>> > > > fairly complex and prone to errors. In Monorepo - we already have a
>> > > > simple way of reproducing and running the tests locally and
>> > > > everything is in one place.
>> > > >
>> > > > 3) There is a chance that someone makes a change in Helm in parallel
>> > > > to a change in Airflow that breaks it. This could easily happen in
>> > > > the "git-sync case" or when we add "MySQL" for example in the
>> > > > future. And there is no way to prevent it.
>> > > >
>> > > > 4) If we only test against "released" Helm and Airflow (that was one
>> > > > of the suggestions), the problem is even bigger. How do you know
>> > > > that you do not break the currently "developed" helm chart? Or how
>> > > > do you know that the currently "developed" helm chart works with
>> > > > latest Airflow release? If you do not do those checks at the
>> > > > "commit" time, then you defer this to "release time" and only then
>> > > > you might find out that decisions you made during development have
>> > > > to be reverted. This is a very, very bad idea IMHO again leading to
>> > > > the case that the release manager will have to fix problems
>> > > > introduced by others.
>> > > >
>> > > > J,
>> > > >
>> > > >
>> > > >
>> > > > On Fri, Jul 3, 2020 at 10:28 PM Ash Berlin-Taylor <as...@apache.org>
>> > > wrote:
>> > > >
>> > > >> Monorepo FTW.
>> > > >>
>> > > >> Yes, it gets a little bit messier around release, but the approach
>> > > >> of automatically extracting out the commits (or parts of commits)
>> > > >> to a separate repo for releasing may be the solution to that problem
>> > > >>
>> > > >>
>> > > >> -ash
>> > > >>
>> > > >> On Jul 3 2020, at 7:51 pm, Kaxil Naik <ka...@gmail.com> wrote:
>> > > >>
>> > > >> > I will take a look at the Kubernetes approach and get back to
>> this
>> > > >> thread.
>> > > >> >
>> > > >> > We had a discussion with Daniel yesterday and we are both
>> concerned
>> > > >> about
>> > > >> >> all the overhead for people like us who work on all three
>> > "entities"
>> > > >> >> at the
>> > > >> >> same time. Even just explaining how to work with Pull Requests
>> and
>> > in
>> > > >> what
>> > > >> >> sequence those PRs would have to be opened and merged in case of
>> > > >> changes
>> > > >> >> that are spanning across several "entities" - was a challenge. I
>> > was
>> > > >> unable
>> > > >> >> to clearly explain the sequence and way of reviewing/merging the
>> > PRs
>> > > >> that
>> > > >> >> will have to be made if we have submodules. This is a bad sign
>> as I
>> > > was
>> > > >> >> using submodules in the past and know how it works but I was
>> unable
>> > > to
>> > > >> >> explain it clearly.
>> > > >> >
>> > > >> >
>> > > >> > We don't even need submodules tbh. We can just use Bash Script
>> that
>> > > >> > pulls a
>> > > >> > pinned Helm Chart version.
>> > > >> > We only need Helm chart to run integration test for k8s (at least
>> for
>> > > >> now).
>> > > >> > We already use tons of Bash scripts.
>> > > >> >
>> > > >> > One of the important benefits of separation is that changes in one
>> > > >> component
>> > > >> > should not need a change in another component, at least
>> > > >> > not immediately.
>> > > >> >
>> > > >> > Changes in Helm chart and Docker file should never need changes
>> in
>> > > >> Airflow
>> > > >> > Changes in Airflow should only ever need a change in Dockerfile
>> and
>> > > Helm
>> > > >> > Chart after a new version is released.
>> > > >> >
>> > > >> > I just had a talk with Daniel too and still didn't find a good
>> > enough
>> > > >> > reason to have them in the same repo.
>> > > >> >
>> > > >> > I will definitely look at the Kubernetes approach (maybe it is
>> > better)
>> > > >> and
>> > > >> > get back to this thread. But as of now I don't see any major PROs
>> > > >> > for having them in the same repo.
>> > > >> >
>> > > >> > Regards,
>> > > >> > Kaxil
>> > > >> >
>> > > >> >
>> > > >> >
>> > > >> > On Fri, Jul 3, 2020 at 5:00 PM Jarek Potiuk <
>> > Jarek.Potiuk@polidea.com
>> > > >
>> > > >> > wrote:
>> > > >> >
>> > > >> >> I think Ry's point is an important one - I thought about
>> writing a
>> > > >> longer
>> > > >> >> post but I looked at the Kubernetes structure and I really like
>> it
>> > so
>> > > >> just
>> > > >> >> wanted to comment on this last one.
>> > > >> >>
>> > > >> >> Seems that it is simply one "authoritative" (or source of truth)
>> > repo
>> > > >> where
>> > > >> >> everything is developed in monorepo fashion but then there is a
>> bot
>> > > >> >> that moves every commit related to subdirectories to those
>> > > "split-out"
>> > > >> >> repos. There are never direct commits of people or PRs in the
>> > > >> "split-out"
>> > > >> >> repositories. This is very similar to my original proposal to
>> have
>> > > >> >> dedicated repos used for releases - but with an automated way of
>> > > >> publishing
>> > > >> >> the commits to the "separated" repos at the moment they are
>> merged
>> > > to
>> > > >> >> master in the main repo. I love it.
>> > > >> >>
>> > > >> >> I think it's really good and "pragmatic" solution. The code is
>> > > >> >> available in
>> > > >> >> separate repos, including the history of commits related to each
>> > > >> "entity"
>> > > >> >> (so only chart-related commits in chart repo). Issues for
>> > particular
>> > > >> >> "entities" are in those separate repos as well (something that
>> > Kaxil
>> > > >> >> mentioned). Users (not developers!) who are interested only in
>> > > >> Dockerfile
>> > > >> >> or Helm Chart have separate repos they can look at - with only
>> > > relevant
>> > > >> >> changes and history of releases for that particular entity. They
>> > can
>> > > >> raise
>> > > >> >> issues there (and in GitHub, we can easily refer to those issues
>> > from
>> > > >> the
>> > > >> >> main "airflow" repo). All the discussion from "user issues" are
>> > kept
>> > > >> >> in the
>> > > >> >> relevant repositories. Still - comments about development
>> changes
>> > > (and
>> > > >> >> related issues) might still be kept in the main "airflow" repo -
>> > next
>> > > >> to
>> > > >> >> other "development" changes.
>> > > >> >>
>> > > >> >> We can run separate releases from those linked repositories and
>> > even
>> > > >> >> publish sources directly from those repositories rather than
>> from
>> > the
>> > > >> main
>> > > >> >> one. At the same time - we avoid all the hassle of submodules.
>> > > >> >>
>> > > >> >> We had a discussion with Daniel yesterday and we are both
>> concerned
>> > > >> about
>> > > >> >> all the overhead for people like us who work on all three
>> > "entities"
>> > > >> >> at the
>> > > >> >> same time. Even just explaining how to work with Pull Requests
>> and
>> > in
>> > > >> what
>> > > >> >> sequence those PRs would have to be opened and merged in case of
>> > > >> changes
>> > > >> >> that are spanning across several "entities" - was a challenge. I
>> > was
>> > > >> unable
>> > > >> >> to clearly explain the sequence and way of reviewing/merging the
>> > PRs
>> > > >> that
>> > > >> >> will have to be made if we have submodules. This is a bad sign
>> as I
>> > > was
>> > > >> >> using submodules in the past and know how it works but I was
>> unable
>> > > to
>> > > >> >> explain it clearly.
>> > > >> >>
>> > > >> >> I really, really like Kubernetes approach - seems that it's one
>> of
>> > > the
>> > > >> >> cases where we can "eat cake and have it too".
>> > > >> >>
>> > > >> >> J.
>> > > >> >>
>> > > >> >>
>> > > >> >> On Thu, Jul 2, 2020 at 5:59 PM Ry Walker <ry...@rywalker.com>
>> wrote:
>> > > >> >>
>> > > >> >> > One reason to have a monorepo is for project branding, and end
>> > user
>> > > >> >> > experience. But for component development experience, it's
>> nice
>> > to
>> > > >> >> have a
>> > > >> >> > small, dedicated repo.
>> > > >> >> >
>> > > >> >> > I think the git submodule approach is technically sound, but
>> is
>> > at
>> > > >> odds
>> > > >> >> > with making the project easy to consume/understand from the
>> end
>> > > user
>> > > >> >> > perspective, especially if we expand the use of subprojects.
>> And
>> > > >> >> the main
>> > > >> >> > Airflow commit graph would appear to be slowing down which is
>> bad
>> > > for
>> > > >> >> > Airflow brand perception.
>> > > >> >> >
>> > > >> >> > Kubernetes has many sub-repos that are integrated into the
>> main
>> > > >> >> repo -
>> > > >> >> > which I think could be the best of both worlds:
>> > > >> >> > Example:
>> > > >> https://github.com/kubernetes/kubernetes/tree/master/staging
>> > > >> >> >
>> > > >> >> > I haven't dug in very deeply, and I won't pretend to
>> understand
>> > how
>> > > >> >> > challenging it may be to maintain this structure, but I'd
>> support
>> > > >> >> breaking
>> > > >> >> > more components out of the main Airflow repo for dev purposes
>> > (for
>> > > >> >> example,
>> > > >> >> > in the future, it'd be nice to have airflow-cli, airflow-api,
>> > > >> >> > airflow-scheduler, individual provider repos that are cleanly
>> > > >> separated)
>> > > >> >> as
>> > > >> >> > long as we bring the commits/contributions back into the
>> monorepo
>> > > >> with
>> > > >> >> > automation.
>> > > >> >> >
>> > > >> >> > Maybe we could dive a little deeper into how K8s is operating,
>> > > before
>> > > >> >> going
>> > > >> >> > with submodules?
>> > > >> >> >
>> > > >> >> > -Ry
>> > > >> >> >
>> > > >> >> >
>> > > >> >> >
>> > > >> >> >
>> > > >> >> > On Thu, Jul 2, 2020 at 11:24 AM Kaxil Naik <
>> kaxilnaik@gmail.com>
>> > > >> wrote:
>> > > >> >> >
>> > > >> >> > > Let's come to a consensus first before we do anything :-)
>> > > >> >> > >
>> > > >> >> > > Is everyone happy with separate repo approach? Let's wait
>> for
>> > 72
>> > > >> hours
>> > > >> >> to
>> > > >> >> > > hear from all and then have a plan on how we do it? WDYT?
>> > > >> >> > >
>> > > >> >> > > But indeed git submodules approach sounds good. We do it for
>> > > >> >> *Airflow
>> > > >> >> > > Site *(
>> > > >> >> > >
>> > > >> >> > >
>> > > >> >> >
>> > > >> >>
>> > > >>
>> > >
>> >
>> https://github.com/apache/airflow-site/tree/master/landing-pages/site/themes
>> > > >> >> > > )
>> > > >> >> > > too.
>> > > >> >> > >
>> > > >> >> > > Regards,
>> > > >> >> > > Kaxil
>> > > >> >> > >
>> > > >> >> > > On Thu, Jul 2, 2020 at 4:15 PM Jarek Potiuk <
>> > > >> Jarek.Potiuk@polidea.com>
>> > > >> >> > > wrote:
>> > > >> >> > >
>> > > >> >> > > > Absolutely - I am happy to add "best practices" and short
>> > > >> >> "howto do
>> > > >> >> > stuff
>> > > >> >> > > > with git submodules" - and this knowledge will only be
>> > needed
>> > > >> for
>> > > >> >> > > > interacting with prod image/helmchart/running kubernetes
>> > tests.
>> > > >> For
>> > > >> >> all
>> > > >> >> > > the
>> > > >> >> > > > other purposes it should be "business as usual".
>> > > >> >> > > >
>> > > >> >> > > > On Thu, Jul 2, 2020 at 4:53 PM Daniel Imberman <
>> > > >> >> > > daniel.imberman@gmail.com>
>> > > >> >> > > > wrote:
>> > > >> >> > > >
>> > > >> >> > > > > I think git submodules sounds like a great idea. We
>> would
>> > > >> >> need to
>> > > >> >> > write
>> > > >> >> > > > > this into the CONTRIBUTING.md to let people know how to
>> do
>> > it
>> > > >> but
>> > it’s
>> > > >> >> > > a
>> > > >> >> > > > > “teach once” situation.
>> > > >> >> > > > >
>> > > >> >> > > > > On Thu, Jul 2, 2020 at 2:44 AM, Tomasz Urbaszek <
>> > > >> >> > turbaszek@apache.org>
>> > > >> >> > > > > wrote:
>> > > >> >> > > > > I support the idea of separate repos. The git submodules
>> > > >> mentioned
>> > > >> >> by
>> > > >> >> > > > > Jarek sounds like an interesting solution. It may add
>> some
>> > > >> >> complexity
>> > > >> >> > > > > for new contributors but it's not rocket science. If we
>> > agree
>> > > >> on
>> > > >> >> > using
>> > > >> >> > > > > this we should add small how-to in contributing.rst I
>> think
>> > > >> (i.e.
>> > > >> >> do
>> > > >> >> > I
>> > > >> >> > > > > have to have fork of each repo?).
>> > > >> >> > > > >
>> > > >> >> > > > > As stressed previously if we go this route we should
>> make
>> > > >> >> sure we
>> > > >> >> > have
>> > > >> >> > > > > nice testing of all those three components. Regarding
>> the
>> > > >> >> versioning,
>> > > >> >> > > > > I have no strong opinion but I fully support using
>> separate
>> > > >> issues
>> > > >> >> > for
>> > > >> >> > > > > airflow, docker, and helm.
>> > > >> >> > > > >
>> > > >> >> > > > > Tomek
>> > > >> >> > > > >
>> > > >> >> > > > >
>> > > >> >> > > > > On Thu, Jul 2, 2020 at 9:26 AM Jarek Potiuk <
>> > > >> >> > Jarek.Potiuk@polidea.com>
>> > > >> >> > > > > wrote:
>> > > >> >> > > > > >
>> > > >> >> > > > > > On Thu, Jul 2, 2020 at 3:16 AM Daniel Imberman <
>> > > >> >> > > > > daniel.imberman@gmail.com>
>> > > >> >> > > > > > wrote:
>> > > >> >> > > > > >
>> > > >> >> > > > > > I’m fine with keeping it as three separate repos but
>> > > merging
>> > > >> >> > testing
>> > > >> >> > > > > > > somehow (e.g. the source code chart would pull the
>> > > >> helm/docker
>> > > >> >> > > chart
>> > > >> >> > > > > into
>> > > >> >> > > > > > > .build) but we need to do it in a way that doesn’t
>> make
>> > > >> testing
>> > > >> >> > too
>> > > >> >> > > > > > > difficult.
>> > > >> >> > > > > > >
>> > > >> >> > > > > > > So for example: How do I test/integration test a
>> change
>> > > >> that
>> > > >> >> > > > involves a
>> > > >> >> > > > > > > change to all three and has to be done at the same
>> > time?
>> > > >> >> Perhaps
>> > > >> >> > a
>> > > >> >> > > > > user can
>> > > >> >> > > > > > > “register” a branch of helm and docker when they
>> start
>> > up
>> > > >> >> breeze?
>> > > >> >> > > Or
>> > > >> >> > > > > > > perhaps we create a “parent” integration test that
>> uses
>> > > the
>> > > >> >> three
>> > > >> >> > > > > together?
>> > > >> >> > > > > > >
>> > > >> >> > > > > >
>> > > >> >> > > > > > Yes, those are exactly my concerns when splitting the
>> > > repos.
>> > > >> >> > > > > >
>> > > >> >> > > > > > I think testing for development should remain in the
>> > > >> "airflow"
>> > > >> >> > repo.
>> > > >> >> > > It
>> > > >> >> > > > > is
>> > > >> >> > > > > > the "central one" in fact. I slept it over and I think
>> > > using
>> > > >> >> > > "released"
>> > > >> >> > > > > > versions for development testing will suffer from this
>> > "we
>> > > >> >> need a
>> > > >> >> > > > change
>> > > >> >> > > > > in
>> > > >> >> > > > > > all three of those".
>> > > >> >> > > > > >
>> > > >> >> > > > > > But we have an easy solution I think.
>> > > >> >> > > > > >
>> > > >> >> > > > > > I think that simply setting submodules properly
>> should do
>> > > >> >> the
>> > > >> >> > job:
>> > > >> >> > > > > > https://git-scm.com/book/en/v2/Git-Tools-Submodules.
>> > They
>> > > >> seem
>> > > >> >> to
>> > > >> >> > be
>> > > >> >> > > > > > perfect for our case.
>> > > >> >> > > > > >
>> > > >> >> > > > > > For those who have not used it - in short - submodules
>> > work
>> > > >> in
>> > > >> >> the
>> > > >> >> > > way
>> > > >> >> > > > > that
>> > > >> >> > > > > > they register the "linked repos" and store related
>> "hash"
>> > > >> >> of the
>> > > >> >> > > commit
>> > > >> >> > > > > > from that linked repo. For example, the "chart" folder
>> > will
>> > > >> >> be a
>> > > >> >> > link
>> > > >> >> > > > to
>> > > >> >> > > > > > "apache/airflow-helm-chart". We can also move the prod
>> > > >> Dockerfile
>> > > >> >> > to
>> > > >> >> > > a
>> > > >> >> > > > > > subfolder and link it to the separate repo. Git
>> submodule
>> > > >> >> has a
>> > > >> >> > > > > > built-in mechanism to a) update to the latest version
>> of
>> > > the
>> > > >> >> repo,
>> > > >> >> > b)
>> > > >> >> > > > > > commit your changes to the linked repo from there
>> which
>> > is
>> > > >> >> all we
>> > > >> >> > > > need. I
>> > > >> >> > > > > > have used those a few times - I never liked submodules for
>> > sharing
>> > > >> >> > "library"
>> > > >> >> > > > > code,
>> > > >> >> > > > > > but for sharing helm/Docker it seems perfect.
>> > > >> >> > > > > >
>> > > >> >> > > > > > From the "regular" developer point of view - you do
>> not
>> > > >> >> need to
>> > > >> >> > > > > get/update
>> > > >> >> > > > > > submodules if you do not need to use them - so for all
>> > the
>> > > >> >> > > development
>> > > >> >> > > > > > purposes if you only change the "airflow" code, you
>> would
>> > > not
>> > > >> >> even
>> > > >> >> > > need
>> > > >> >> > > > > to
>> > > >> >> > > > > > sync chart or Dockerfile. You do "git checkout" as
>> usual
>> > > >> >> and it
>> > > >> >> > > should
>> > > >> >> > > > > > work. So basically - no change for "regular" airflow
>> > > >> development.
>> > > >> >> > > > > >
>> > > >> >> > > > > > However, if you do need to work on helm + Docker +
>> code,
>> > > >> >> then you
>> > > >> >> > > > simply
>> > > >> >> > > > > do
>> > > >> >> > > > > > "git submodule update", go to the linked "helm" or
>> > "docker"
>> > > >> >> folder,
>> > > >> >> > > > > > checkout the "master" version and you start making
>> > changes.
>> > > >> The
>> > > >> >> > only
>> > > >> >> > > > > thing
>> > > >> >> > > > > > to remember when you want to push your changes is to
>> do
>> > > >> >> `git push
>> > > >> >> > > > > > --recurse-submodules=check` and it will make sure
>> that
>> > > >> >> all the
>> > > >> >> > > repos
>> > > >> >> > > > > are
>> > > >> >> > > > > > updated. It is a bit involved, but recent git versions
>> > have
>> > > >> >> very
>> > > >> >> > > good
>> > > >> >> > > > > > support and it must only be used by people who work on
>> > > >> >> airflow +
>> > > >> >> > > > docker +
>> > > >> >> > > > > > helm - all the others are unaffected.
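
A minimal sketch of the submodule workflow described above, written as a small Python helper that only assembles the git commands. The "chart" path and the "master" branch follow the example in the message; the helper names (submodule_workflow, push_command, run_all) are hypothetical and are not part of Airflow or Breeze. Note that `--recurse-submodules=check` makes git refuse the push when a commit referenced by a submodule has not yet been pushed to its own remote.

```python
import subprocess
from typing import List


def submodule_workflow(submodule_path: str, branch: str = "master") -> List[List[str]]:
    """Commands to fetch one linked repo and start working on a branch in it.

    Assumption: the submodule is already registered in .gitmodules.
    """
    return [
        # Clone/update only the requested submodule (others stay untouched).
        ["git", "submodule", "update", "--init", "--", submodule_path],
        # Leave the detached HEAD and work on a real branch of the linked repo.
        ["git", "-C", submodule_path, "checkout", branch],
    ]


def push_command() -> List[str]:
    """Push the parent repo, refusing if submodule commits are not pushed yet."""
    return ["git", "push", "--recurse-submodules=check"]


def run_all(commands: List[List[str]], cwd: str = ".") -> None:
    """Run each command in order and stop at the first failure."""
    for argv in commands:
        subprocess.run(argv, cwd=cwd, check=True)
```

Developers who only touch the "airflow" code never call any of this; only someone changing the chart or the Dockerfile together with Airflow would run, for example, run_all(submodule_workflow("chart")) and later run_all([push_command()]).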
>> > > >> >> > > > > >
>> > > >> >> > > > > > From the CI perspective also nothing changes - when we
>> > > >> checkout
>> > > >> >> the
>> > > >> >> > > > code
>> > > >> >> > > > > we
>> > > >> >> > > > > > will include submodules and our test harness will be
>> > > largely
>> > > >> >> > > unchanged.
>> > > >> >> > > > > > Submodule provides us with the right mechanism for
>> cross
>> > > >> >> dependency
>> > > >> >> > > > even
>> > > >> >> > > > > if
>> > > >> >> > > > > > we use branches.
>> > > >> >> > > > > >
>> > > >> >> > > > > > If everyone will be ok with that - I am happy to set
>> it
>> > up.
>> > > >> With
>> > > >> >> > > > > submodules
>> > > >> >> > > > > > - we can switch to separate repos even without
>> releasing
>> > > >> >> helm and
>> > > >> >> > > Prod
>> > > >> >> > > > > > chart "officially".
>> > > >> >> > > > > >
>> > > >> >> > > > > > J.
>> > > >> >> > > > > >
>> > > >> >> > > > > >
>> > > >> >> > > > > >
>> > > >> >> > > > > > >
>> > > >> >> > > > > > > On Wed, Jul 1, 2020 at 3:20 PM, Jarek Potiuk <
>> > > >> >> > > > Jarek.Potiuk@polidea.com
>> > > >> >> > > > > >
>> > > >> >> > > > > > > wrote:
>> > > >> >> > > > > > > Sure. We can work with such an approach. There will
>> be
>> > > some
>> > > >> >> > > > > dependencies
>> > > >> >> > > > > > > that we might find are problematic, but if we all
>> see
>> > > >> >> that it's
>> > > >> >> > > > > > > worth trying, there is a clear benefit that it makes
>> > for
>> > > a
>> > > >> >> > "clean"
>> > > >> >> > > > > > > split between those different "entities". And
>> possibly
>> > > >> >> once we
>> > > >> >> > > > release
>> > > >> >> > > > > > > first versions of both image and chart, such
>> problems
>> > > >> >> will be
>> > > >> >> > rare
>> > > >> >> > > > and
>> > > >> >> > > > > easy
>> > > >> >> > > > > > > to fix.
>> > > >> >> > > > > > >
>> > > >> >> > > > > > > I personally think such split is inevitable
>> eventually,
>> > > >> it's
>> > > >> >> > just a
>> > > >> >> > > > > matter
>> > > >> >> > > > > > > when to do it. If we decide to make this happen
>> soon -
>> > I
>> > > am
>> > > >> >> more
>> > > >> >> > > than
>> > > >> >> > > > > happy
>> > > >> >> > > > > > > to work on making the split reality.
>> > > >> >> > > > > > >
>> > > >> >> > > > > > > One prerequisite to that is that all those - Helm
>> > Chart,
>> > > >> Prod
>> > > >> >> > Image
>> > > >> >> > > > and
>> > > >> >> > > > > > > Airflow are released in stable versions separately
>> > > >> >> "officially" -
>> > > >> >> > > > from
>> > > >> >> > > > > the
>> > > >> >> > > > > > > current sources (otherwise there will be no way to
>> test
>> > > >> >> > > cross-repo).
>> > > >> >> > > > > > >
>> > > >> >> > > > > > > I think for that we will need to agree on the
>> > versioning
>> > > >> scheme
>> > > >> >> > and
>> > > >> >> > > > > cadence
>> > > >> >> > > > > > > for the Image and Helm Chart, then copy sources from
>> > > >> airflow
>> > > >> >> and
>> > > >> >> > > > > release
>> > > >> >> > > > > > > them as "baseline" including setup the tests for
>> all of
>> > > >> >> those -
>> > > >> >> > > then
>> > > >> >> > > > we
>> > > >> >> > > > > > > can remove both Helm and Dockerfile from the airflow
>> > > repo.
>> > > >> >> Happy
>> > > >> >> > to
>> > > >> >> > > > > help
>> > > >> >> > > > > > > with that if that's the direction we choose as a
>> > > >> >> community. It
>> > > >> >> is
>> > > >> >> > > > > important
>> > > >> >> > > > > > > though that we keep the cross-repo testing working.
>> We
>> > > >> >> have it
>> > > >> >> > > > working
>> > > >> >> > > > > as
>> > > >> >> > > > > > > of yesterday, so now the matter is - whatever we do
>> we
>> > > >> >> keep it
>> > > >> >> > > > running
>> > > >> >> > > > > and
>> > > >> >> > > > > > > have development environment support easy
>> development
>> > and
>> > > >> >> testing
>> > > >> >> > > of
>> > > >> >> > > > > > > either of the three (including CI testing
>> cross-repos).
>> > > >> That's
>> > > >> >> > the
>> > > >> >> > > > > only
>> > > >> >> > > > > > > really important thing to me - the rest is more of
>> > > >> technicality
>> > > >> >> > how
>> > > >> >> > > > we
>> > > >> >> > > > > link
>> > > >> >> > > > > > > the repos, but principle remains.
>> > > >> >> > > > > > >
>> > > >> >> > > > > > > Do we have an idea for the versioning scheme that we
>> > > >> >> would like
>> > > >> >> > to
>> > > >> >> > > > use
>> > > >> >> > > > > for
>> > > >> >> > > > > > > the Helm Chart and prod image?
>> > > >> >> > > > > > >
>> > > >> >> > > > > > > Should we make it CalVer
>> > > >> >> <https://calver.org/overview.html> or
>> > > >> >> > > > SemVer
>> > > >> >> > > > > > > <https://semver.org/> (or some other scheme)? And
>> how
>> > > >> should
>> > > >> >> we
>> > > >> >> > > > treat
>> > > >> >> > > > > the
>> > > >> >> > > > > > > combinations with Airflow?
>> > > >> >> > > > > > >
>> > > >> >> > > > > > > My thoughts (but I have no strong opinions as long
>> as
>> > > >> someone
>> > > >> >> > > > proposes
>> > > >> >> > > > > more
>> > > >> >> > > > > > > sensible versioning schemes):
>> > > >> >> > > > > > >
>> > > >> >> > > > > > > 1) Airflow code - we continue the release scheme we
>> > have
>> > > >> (with
>> > > >> >> > > > > deciding on
>> > > >> >> > > > > > > 2.* scheme for the release). I expect in the future
>> we
>> > > >> might
>> > > >> >> > decide
>> > > >> >> > > > on
>> > > >> >> > > > > > > doing branches or patches so for 2.* I'd opt for
>> going
>> > > full
>> > > >> >> > SemVer
>> > > >> >> > > > > approach
>> > > >> >> > > > > > > and patches released from branches.
>> > > >> >> > > > > > >
>> > > >> >> > > > > > > 2) I believe that Helm Chart can be versioned with
>> its
>> > > own
>> > > >> >> > version
>> > > >> >> > > > > (then
>> > > >> >> > > > > > > you specify the image version as helm parameter).
>> For
>> > the
>> > > >> Helm
>> > > >> >> > > Chart
>> > > >> >> > > > I
>> > > >> >> > > > > > > think CalVer might be OK as I do not expect any
>> > > >> >> branching/patches
>> > > >> >> > > in
>> > > >> >> > > > > the
>> > > >> >> > > > > > > future - I'd expect that there will be a single
>> stream
>> > of
>> > > >> >> > releases.
>> > > >> >> > > > > > >
>> > > >> >> > > > > > > 3) Dockerfile (+ related files such as
>> .dockerignore,
>> > > empty
>> > > >> >> dir,
>> > > >> >> > > > > > > entrypoints etc). I do not imagine a lot of
>> branching
>> > for
>> > > >> >> those -
>> > > >> >> > > we
>> > > >> >> > > > > > > should be able to release a new version of a
>> Dockerfile
>> > > (+
>> > > >> >> > related
>> > > >> >> > > > > files)
>> > > >> >> > > > > > > working with nearly any earlier Airflow release, so
>> > > CalVer
>> > > >> >> seems
>> > > >> >> > > > like a
>> > > >> >> > > > > > > good choice.
>> > > >> >> > > > > > >
>> > > >> >> > > > > > > 4) Image versioning becomes a bit more complex
>> because
>> > > the
>> > > >> >> image
>> > > >> >> > > tag
>> > > >> >> > > > is
>> > > >> >> > > > > > > always combination of:
>> > > >> >> > > > > > > * Dockerfile (+ related files) version
>> > > >> >> > > > > > > * Airflow Version
>> > > >> >> > > > > > > * Python Version
>> > > >> >> > > > > > >
>> > > >> >> > > > > > > An example versioning I can imagine:
>> > > >> >> > > > > > >
>> > > >> >> > > > > > > *Airflow*: 1.10.11, 1.10.12, 2.0.0, 2.1.0, 2.1.1 -
>> > patch
>> > > >> level
>> > > >> >> > (if
>> > > >> >> > > we
>> > > >> >> > > > > > > decide to have patches).
>> > > >> >> > > > > > > *Dockerfile: *2020.07.12, 2020.08.20...... ->
>> depending
>> > > >> >> when we
>> > > >> >> > > > release
>> > > >> >> > > > > > > them
>> > > >> >> > > > > > > *Helm Chart*: 2020.07.10, 2020.08.09 ...... Each
>> Helm
>> > > Chart
>> > > >> >> has a
>> > > >> >> > > > > minimum
>> > > >> >> > > > > > > version of both Dockerfile and Airflow versions it
>> > works
>> > > >> with.
>> > > >> >> > > > > > >
>> > > >> >> > > > > > > *Example Docker Image tags:*
>> > > >> >> > > > > > >
>> > > >> apache/airflow:dockerfile2020.07.10-airflow1.10.10-python3.6
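
The tag scheme above can be sketched as follows, assuming the dockerfile<CalVer>-airflow<version>-python<version> layout from the example tag. The ImageTag class and the parse_image and chart_supports functions are hypothetical names used only for illustration; the minimum-version check assumes plain numeric comparison of dot-separated components.

```python
import re
from dataclasses import dataclass
from typing import Tuple


@dataclass(frozen=True)
class ImageTag:
    dockerfile_version: str  # CalVer of the Dockerfile, e.g. "2020.07.10"
    airflow_version: str     # Airflow release, e.g. "1.10.10"
    python_version: str      # Python major.minor, e.g. "3.6"

    def render(self, repository: str = "apache/airflow") -> str:
        """Build the full image reference from the three versions."""
        return (
            f"{repository}:dockerfile{self.dockerfile_version}"
            f"-airflow{self.airflow_version}-python{self.python_version}"
        )


_TAG_RE = re.compile(
    r"^dockerfile(?P<dockerfile>\d{4}\.\d{2}\.\d{2})"
    r"-airflow(?P<airflow>\d+\.\d+\.\d+)"
    r"-python(?P<python>\d+\.\d+)$"
)


def parse_image(image: str) -> ImageTag:
    """Split 'repo:tag' and recover the three versions from the tag."""
    _, _, tag = image.rpartition(":")
    match = _TAG_RE.match(tag)
    if match is None:
        raise ValueError(f"unrecognised image tag: {image!r}")
    return ImageTag(match["dockerfile"], match["airflow"], match["python"])


def _as_tuple(version: str) -> Tuple[int, ...]:
    """Turn '1.10.10' into (1, 10, 10) so versions compare numerically."""
    return tuple(int(part) for part in version.split("."))


def chart_supports(tag: ImageTag, min_dockerfile: str, min_airflow: str) -> bool:
    """Check an image against the minimum versions a chart declares."""
    return (
        _as_tuple(tag.dockerfile_version) >= _as_tuple(min_dockerfile)
        and _as_tuple(tag.airflow_version) >= _as_tuple(min_airflow)
    )
```

Comparing tuples of integers rather than raw strings matters here: as strings, "1.10.9" would sort after "1.10.10".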
>> > > >> >> > > > > > >
>> > > >> >> > > > > > > WDYT?
>> > > >> >> > > > > > >
>> > > >> >> > > > > > > J,
>> > > >> >> > > > > > >
>> > > >> >> > > > > > >
>> > > >> >> > > > > > > On Wed, Jul 1, 2020 at 11:12 PM Kaxil Naik <
>> > > >> >> kaxilnaik@gmail.com>
>> > > >> >> > > > > wrote:
>> > > >> >> > > > > > >
>> > > >> >> > > > > > > > I think we should have "separate repos for
>> > development"
>> > > >> too.
>> > > >> >> > > > > > > >
>> > > >> >> > > > > > > > 3 Repos in total:
>> > > >> >> > > > > > > >
>> > > >> >> > > > > > > > 1) apache/airflow
>> > > >> >> > > > > > > > 2) apache/airflow-docker-image
>> > > >> >> > > > > > > > 3) apache/airflow-helm-chart
>> > > >> >> > > > > > > >
>> > > >> >> > > > > > > >
>> > > >> >> > > > > > > > (1) *apache/airflow* should use a pinned stable
>> > version
>> > > >> of
>> > > >> >> > > Airflow
>> > > >> >> > > > > Helm
>> > > >> >> > > > > > > > chart to run Kubernetes tests
>> > > >> >> > > > > > > > (2) *apache/airflow* already has *Dockerfile.ci*
>> file
>> > > >> which
>> > > >> >> it
>> > > >> >> > > can
>> > > >> >> > > > > use to
>> > > >> >> > > > > > > > run airflow tests on docker images.
>> > > >> >> > > > > > > > (3) *apache/airflow-docker-image *should use the
>> > latest
>> > > >> >> > available
>> > > >> >> > > > > stable
>> > > >> >> > > > > > > > version of airflow
>> > > >> >> > > > > > > > (4) *apache/airflow-helm-chart *should use the
>> latest
>> > > >> >> available
>> > > >> >> > > > > stable
>> > > >> >> > > > > > > > version of airflow
>> > > >> >> > > > > > > >
>> > > >> >> > > > > > > > Having such split also makes some updates more
>> > > >> >> difficult -
>> > > >> >> for
>> > > >> >> > > > > example if
>> > > >> >> > > > > > > > > we add new "extra" to Airflow that will require
>> to
>> > > >> install
>> > > >> >> > > "apt"
>> > > >> >> > > > > > > > dependency
>> > > >> >> > > > > > > > > in Dockerfile, we will have to split it into
>> first
>> > > >> adding
>> > > >> >> the
>> > > >> >> > > > > > > dependency
>> > > >> >> > > > > > > > to
>> > > >> >> > > > > > > > > Dockerfile, and once it is merged, we can add
>> the
>> > > >> >> extra to
>> > > >> >> > > > airflow
>> > > >> >> > > > > with
>> > > >> >> > > > > > > > > setup.py.
>> > > >> >> > > > > > > >
>> > > >> >> > > > > > > >
>> > > >> >> > > > > > > > Adding a new extra to setup.py would not (and
>> should
>> > > not)
>> > > >> >> > impact
>> > > >> >> > > > the
>> > > >> >> > > > > > > > development of *apache/airflow-docker-image*
>> > > >> >> > > > > > > > Once an RC is cut for apache/airflow or after a
>> new
>> > > >> version
>> > > >> >> is
>> > > >> >> > > > > released
>> > > >> >> > > > > > > for
>> > > >> >> > > > > > > > apache/airflow, we can work on supporting the new
>> > > airflow
>> > > >> >> > version
>> > > >> >> > > > in
>> > > >> >> > > > > the
>> > > >> >> > > > > > > > Production Docker Image.
>> > > >> >> > > > > > > > While doing that we can add all the libraries that
>> > are
>> > > >> needed
>> > > >> >> > by
>> > > >> >> > > > the
>> > > >> >> > > > > new
>> > > >> >> > > > > > > > Airflow Version and we will have a clean commit
>> > history
>> > > >> and
>> > > >> >> > > > > changelog for
>> > > >> >> > > > > > > > Docker image.
>> > > >> >> > > > > > > >
>> > > >> >> > > > > > > > We definitely do not need to work in parallel on
>> both
>> > > the
>> > > >> >> repos.
>> > > >> >> > > By
>> > > >> >> > > > > doing
>> > > >> >> > > > > > > > development in a separate repo we keep consistent
>> > > >> "source"
>> > > >> >> > files
>> > > >> >> > > > and
>> > > >> >> > > > > we
>> > > >> >> > > > > > > can
>> > > >> >> > > > > > > > release each artifact with a
>> > > >> >> > > > > > > > separate cadence. If someone discovers bug in
>> newly
>> > > >> released
>> > > >> >> > > > > Dockerimage,
>> > > >> >> > > > > > > > we should be easily able to cut out a new release
>> > with
>> > > >> the
>> > > >> >> > patch
>> > > >> >> > > > > without
>> > > >> >> > > > > > > > worrying about how development is
>> > > >> >> > > > > > > > going in the apache/airflow repo.
>> > > >> >> > > > > > > >
>> > > >> >> > > > > > > >
>> > > >> >> > > > > > > > *Apache Flink & Apache CouchDB* do it in a
>> > similar
>> > > >> >> manner:
>> > > >> >> > > > > > > >
>> > > >> >> > > > > > > > https://github.com/apache/flink &
>> > > >> >> > > > > https://github.com/apache/flink-docker
>> > > >> >> > > > > > > > https://github.com/apache/couchdb &
>> > > >> >> > > > > > > > https://github.com/apache/couchdb-docker
>> > > >> >> > > > > > > >
>> > > >> >> > > > > > > > Regards,
>> > > >> >> > > > > > > > Kaxil
>> > > >> >> > > > > > > >
>> > > >> >> > > > > > > >
>> > > >> >> > > > > > > >
>> > > >> >> > > > > > > >
>> > > >> >> > > > > > > >
>> > > >> >> > > > > > > >
>> > > >> >> > > > > > > > On Wed, Jul 1, 2020 at 9:50 PM Jarek Potiuk <
>> > > >> >> > > > > Jarek.Potiuk@polidea.com>
>> > > >> >> > > > > > > > wrote:
>> > > >> >> > > > > > > >
>> > > >> >> > > > > > > > > I do not think it's only the question of
>> Mono/Multi
>> > > >> repos.
>> > > >> >> > > While
>> > > >> >> > > > I
>> > > >> >> > > > > > > > clearly
>> > > >> >> > > > > > > > > see the benefit of separate repos I also see
>> some
>> > > >> >> drawbacks.
>> > > >> >> > > > > > > > >
>> > > >> >> > > > > > > > > And if it bothers others, I am happy to follow
>> the
>> > > >> >> majority.
>> > > >> >> > If
>> > > >> >> > > > we
>> > > >> >> > > > > > > think
>> > > >> >> > > > > > > > > that a bit more complexity in testing justifies
>> > > >> separating
>> > > >> >> > > those
>> > > >> >> > > > > three
>> > > >> >> > > > > > > > > completely and having more "clean" - it's also
>> > > >> >> workable but
>> > > >> >> > IMHO
>> > > >> >> > > > > > > > introduces
>> > > >> >> > > > > > > > > certain complexity in development.
>> > > >> >> > > > > > > > >
>> > > >> >> > > > > > > > > However, I think this is not 0/1 - a kind of Hybrid
>> > > >> approach
>> > > >> >> in
>> > > >> >> > my
>> > > >> >> > > > > opinion
>> > > >> >> > > > > > > > > might be best of both worlds - development and
>> > > >> >> releases.
>> > > >> >> > > > > > > > >
>> > > >> >> > > > > > > > > Let me explain what I mean by "Hybrid":
>> > > >> >> > > > > > > > >
>> > > >> >> > > > > > > > > I think we definitely should have separate
>> > > >> >> repositories to
>> > > >> >> > > > release
>> > > >> >> > > > > > > those
>> > > >> >> > > > > > > > > artifacts and I think there is no doubt about
>> it:
>> > > >> >> > > > > > > > >
>> > > >> >> > > > > > > > > * airflow (apache/airflow)
>> > > >> >> > > > > > > > > * prod docker image (apache/airflow-docker)
>> > > >> >> > > > > > > > > * helm chart (apache/airflow-helm)
>> > > >> >> > > > > > > > > * api clients (we already have separate repos
>> for
>> > > >> those)
>> > > >> >> > > > > > > > > (apache/airflow-client-*)
>> > > >> >> > > > > > > > >
>> > > >> >> > > > > > > > > I think the only question is where we develop
>> > > >> >> > > > > > > > > all those (develop != release). There are
>> > > >> >> > > > > > > > > certain benefits of having a single "master"
>> > > >> >> > > > > > > > > (let's call it "development" further) for all
>> > > >> >> > > > > > > > > those artifacts. Currently the "development"
>> > > >> >> > > > > > > > > version for all of those is in one repo - and
>> > > >> >> > > > > > > > > while developing one depends on the other, we
>> > > >> >> > > > > > > > > also test all of those together, and this
>> > > >> >> > > > > > > > > means that the "current best" set of airflow
>> > > >> >> > > > > > > > > sources (including dependencies in setup.py),
>> > > >> >> > > > > > > > > Dockerfile and Helm chart work together. This
>> > > >> >> > > > > > > > > means, for example, that you will not be able
>> > > >> >> > > > > > > > > to break the Helm Chart by changing anything
>> > > >> >> > > > > > > > > that the helm chart depends on in airflow. For
>> > > >> >> > > > > > > > > example, if you change "airflow webserver"
>> > > >> >> > > > > > > > > into "airflow server", the current helm chart
>> > > >> >> > > > > > > > > will break. Similarly, if you change
>> > > >> >> > > > > > > > > entrypoint.sh in the Docker image in a way
>> > > >> >> > > > > > > > > that is not compatible with the Helm chart, we
>> > > >> >> > > > > > > > > will not let that happen - the CI tests will
>> > > >> >> > > > > > > > > break if either of those changes in an
>> > > >> >> > > > > > > > > incompatible way. And we can have dependencies
>> > > >> >> > > > > > > > > in any direction between those three. When we
>> > > >> >> > > > > > > > > see a commit break any of the three - we can
>> > > >> >> > > > > > > > > make a decision about what to do - either
>> > > >> >> > > > > > > > > accept and document the incompatibility or fix
>> > > >> >> > > > > > > > > it.
>> > > >> >> > > > > > > > >
>> > > >> >> > > > > > > > > Of course, keeping that property (testing it
>> > > >> >> > > > > > > > > all together) is also possible if they are in
>> > > >> >> > > > > > > > > completely separate repos. There are several
>> > > >> >> > > > > > > > > cross-dependencies - Docker image building
>> > > >> >> > > > > > > > > depends on dependencies in setup.py, for
>> > > >> >> > > > > > > > > example; you cannot build the Docker image
>> > > >> >> > > > > > > > > from only the Dockerfile without the sources
>> > > >> >> > > > > > > > > of airflow, nor build and test helm charts
>> > > >> >> > > > > > > > > without the image (and sources - because
>> > > >> >> > > > > > > > > that's where the current kubernetes tests
>> > > >> >> > > > > > > > > are). If we want to continue doing it for both
>> > > >> >> > > > > > > > > Helm and Dockerfile, we would have to
>> > > >> >> > > > > > > > > basically check out the latest sources of
>> > > >> >> > > > > > > > > Airflow and run the CI tests before merging
>> > > >> >> > > > > > > > > any Docker or Helm Chart changes, and the
>> > > >> >> > > > > > > > > opposite - we will have to download the
>> > > >> >> > > > > > > > > Dockerfile/Helm chart and build the
>> > > >> >> > > > > > > > > image/install the Helm chart when we are
>> > > >> >> > > > > > > > > running CI tests for Airflow. This is possible
>> > > >> >> > > > > > > > > and we could do it, but it adds complexity to
>> > > >> >> > > > > > > > > the build/CI process.
>> > > >> >> > > > > > > > >
>> > > >> >> > > > > > > > > Having such a split also makes some updates
>> > > >> >> > > > > > > > > more difficult - for example, if we add a new
>> > > >> >> > > > > > > > > "extra" to Airflow that requires installing an
>> > > >> >> > > > > > > > > "apt" dependency in the Dockerfile, we will
>> > > >> >> > > > > > > > > have to split it into first adding the
>> > > >> >> > > > > > > > > dependency to the Dockerfile, and once it is
>> > > >> >> > > > > > > > > merged, we can add the extra to airflow with
>> > > >> >> > > > > > > > > setup.py. This makes it quite difficult to
>> > > >> >> > > > > > > > > test it together though (the Dockerfile change
>> > > >> >> > > > > > > > > can only be tested fully after merging it to
>> > > >> >> > > > > > > > > master). Not to mention the complexity of
>> > > >> >> > > > > > > > > managing different versions - your local
>> > > >> >> > > > > > > > > development Dockerfile version vs sources of
>> > > >> >> > > > > > > > > Airflow, for example. Imagine switching
>> > > >> >> > > > > > > > > between branches where you add two different
>> > > >> >> > > > > > > > > apt dependencies to the Dockerfile. There are
>> > > >> >> > > > > > > > > more similar scenarios I can imagine -
>> > > >> >> > > > > > > > > especially for parallel changes in those
>> > > >> >> > > > > > > > > repos.
>> > > >> >> > > > > > > > >
>> > > >> >> > > > > > > > > It is of course doable to keep them separate,
>> > > >> >> > > > > > > > > but it is quite a bit more complex to set up
>> > > >> >> > > > > > > > > (especially for a consistent development
>> > > >> >> > > > > > > > > environment) when you have separate repos, and
>> > > >> >> > > > > > > > > preventing cross-breaking changes might be
>> > > >> >> > > > > > > > > more difficult.
>> > > >> >> > > > > > > > >
>> > > >> >> > > > > > > > > I believe that the best way is to continue
>> > > >> >> > > > > > > > > developing airflow + image + chart in one
>> > > >> >> > > > > > > > > repo - airflow, but release them from those
>> > > >> >> > > > > > > > > separate repos.
>> > > >> >> > > > > > > > >
>> > > >> >> > > > > > > > > The Airflow source release does not have to
>> > > >> >> > > > > > > > > contain either the chart or the image. And
>> > > >> >> > > > > > > > > even if it contains sources for those, they
>> > > >> >> > > > > > > > > are not the final "artifacts" (installable
>> > > >> >> > > > > > > > > image and installable helm chart). Whenever we
>> > > >> >> > > > > > > > > decide to release either of them - we test it
>> > > >> >> > > > > > > > > in "development". Then, only when it is
>> > > >> >> > > > > > > > > tested, we copy the sources to those separate
>> > > >> >> > > > > > > > > repos and release them.
>> > > >> >> > > > > > > > >
>> > > >> >> > > > > > > > > With git - we can even do it very easily while
>> > > >> >> > > > > > > > > preserving the history of commits (been there,
>> > > >> >> > > > > > > > > done that). And then we could release Helm and
>> > > >> >> > > > > > > > > Docker image separately based on the commits
>> > > >> >> > > > > > > > > and tags in those separate repositories.
>> > > >> >> > > > > > > > >
>> > > >> >> > > > > > > > > I agree that separate repos is a more "clean"
>> > > >> >> > > > > > > > > approach. But I think it is less convenient
>> > > >> >> > > > > > > > > for development consistency.
>> > > >> >> > > > > > > > >
>> > > >> >> > > > > > > > > J,
>> > > >> >> > > > > > > > >
>> > > >> >> > > > > > > > >
>> > > >> >> > > > > > > > >
>> > > >> >> > > > > > > > > On Wed, Jul 1, 2020 at 9:35 PM Kaxil Naik <kaxilnaik@gmail.com> wrote:
>> > > >> >> > > > > > > > >
>> > > >> >> > > > > > > > > > Forgot to mention, having them in separate
>> > > >> >> > > > > > > > > > repos also helps in better managing each
>> > > >> >> > > > > > > > > > individual artifact.
>> > > >> >> > > > > > > > > >
>> > > >> >> > > > > > > > > > Each repo would have separate GitHub Issues
>> > > >> >> > > > > > > > > > where we can track the issues specific to
>> > > >> >> > > > > > > > > > the Helm chart or Dockerfile.
>> > > >> >> > > > > > > > > >
>> > > >> >> > > > > > > > > > Regards,
>> > > >> >> > > > > > > > > > Kaxil
>> > > >> >> > > > > > > > > >
>> > > >> >> > > > > > > > > > On Wed, Jul 1, 2020 at 8:30 PM Kaxil Naik <kaxilnaik@gmail.com> wrote:
>> > > >> >> > > > > > > > > >
>> > > >> >> > > > > > > > > > > The PMC also needs to agree if we want
>> > > >> >> > > > > > > > > > > separate VOTING for Docker Image and Helm
>> > > >> >> > > > > > > > > > > chart, I think we do.
>> > > >> >> > > > > > > > > > >
>> > > >> >> > > > > > > > > > > Regards,
>> > > >> >> > > > > > > > > > > Kaxil
>> > > >> >> > > > > > > > > > >
>> > > >> >> > > > > > > > > > > On Wed, Jul 1, 2020 at 8:06 PM Kaxil Naik <kaxilnaik@gmail.com> wrote:
>> > > >> >> > > > > > > > > > >
>> > > >> >> > > > > > > > > > >> Hi all,
>> > > >> >> > > > > > > > > > >>
>> > > >> >> > > > > > > > > > >> What do you all think about having
>> > > >> >> > > > > > > > > > >> Dockerfile and Helm chart in the same
>> > > >> >> > > > > > > > > > >> "Airflow" Repo vs separate?
>> > > >> >> > > > > > > > > > >>
>> > > >> >> > > > > > > > > > >> I feel having a separate repo for the
>> > > >> >> > > > > > > > > > >> Airflow Dockerfile and Helm chart has more
>> > > >> >> > > > > > > > > > >> benefits, like easy-to-track changes (via
>> > > >> >> > > > > > > > > > >> Changelog), easier onboarding for new
>> > > >> >> > > > > > > > > > >> contributors, and a separate release
>> > > >> >> > > > > > > > > > >> cadence.
>> > > >> >> > > > > > > > > > >>
>> > > >> >> > > > > > > > > > >> Currently, the Dockerfile and Helm Chart
>> > > >> >> > > > > > > > > > >> are inside the same repo, and when we
>> > > >> >> > > > > > > > > > >> release the changelog for a new Airflow
>> > > >> >> > > > > > > > > > >> version, it would include all changes
>> > > >> >> > > > > > > > > > >> (Airflow + Dockerfile + Helm chart), which
>> > > >> >> > > > > > > > > > >> I think is not that great.
>> > > >> >> > > > > > > > > > >>
>> > > >> >> > > > > > > > > > >> Also having them all inside a single repo
>> > > >> >> > > > > > > > > > >> means changes in Helm Chart and Dockerfile
>> > > >> >> > > > > > > > > > >> can block Airflow release. We could use
>> > > >> >> > > > > > > > > > >> stable Helm Chart version and Dockerfile
>> > > >> >> > > > > > > > > > >> version to test Airflow so that they are
>> > > >> >> > > > > > > > > > >> blockers to release too.
>> > > >> >> > > > > > > > > > >>
>> > > >> >> > > > > > > > > > >> Happy to hear the thoughts from the
>> > > >> >> > > > > > > > > > >> community.
>> > > >> >> > > > > > > > > > >>
>> > > >> >> > > > > > > > > > >> Regards,
>> > > >> >> > > > > > > > > > >> Kaxil
>> > > >> >> > > > > > > > > > >>
>> > > >> >> > > > > > > > > > >
>> > > >> >> > > > > > > > > >
>> > > >> >> > > > > > > > >
>> > > >> >> > > > > > > > >
>> > > >> >> > > > > > > > > --
>> > > >> >> > > > > > > > >
>> > > >> >> > > > > > > > > Jarek Potiuk
>> > > >> >> > > > > > > > > Polidea <https://www.polidea.com/> | Principal Software Engineer
>> > > >> >> > > > > > > > >
>> > > >> >> > > > > > > > > M: +48 660 796 129 <+48660796129>
>> > > >> >> > > > > > > > > [image: Polidea] <https://www.polidea.com/>
>> > > >> >> > > > > > > > >