Posted to dev@airflow.apache.org by Jarek Potiuk <Ja...@polidea.com> on 2019/11/04 14:00:24 UTC

Re: [PROPOSE] Ease future migration path to 2.0 by provider's operators/hook backporting to 1.10.*

Hey Ash,

Thanks for the offer. I must admit pkgutil and package namespaces are not
the best-documented part of Python.

I dug a bit deeper and found a similar problem:
https://github.com/pypa/setuptools/issues/895. Even though it is not
explicitly explained in the pkgutil documentation, this comment (assuming
it is right) explains everything:

*"That's right. All parents of a namespace package must also be namespace
packages, as they will necessarily share that parent name space (farm and
farm.deps in this example)."*
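
A minimal sketch of that rule, using throwaway directories in place of
installed distributions. The names demo_core and demo_ext are hypothetical
stand-ins (not real packages): demo_core is a regular package with its own
__init__.py, like airflow today, while demo_ext has no __init__.py at the
shared levels and so is a native namespace package on Python 3.

```python
import importlib
import sys
import tempfile
from pathlib import Path


def write(path: Path, text: str = "") -> None:
    """Create a file (and its parent directories) with the given content."""
    path.parent.mkdir(parents=True, exist_ok=True)
    path.write_text(text)


def can_import(name: str) -> bool:
    """Return True if the dotted module name can be imported."""
    importlib.invalidate_caches()
    try:
        importlib.import_module(name)
        return True
    except ImportError:
        return False


root = Path(tempfile.mkdtemp())
core, backport, ext_a, ext_b = (root / d for d in ("core", "backport", "ext_a", "ext_b"))

# "core" distribution: a regular package with logic in its __init__.py.
write(core / "demo_core" / "__init__.py", "VERSION = '1.10'\n")
# "backport" distribution tries to contribute demo_core.providers.google
# from a different directory.
write(backport / "demo_core" / "providers" / "google" / "__init__.py", "NAME = 'google'\n")

# Two distributions sharing a separate top-level name, with no __init__.py
# at the shared levels (demo_ext and demo_ext.providers).
write(ext_a / "demo_ext" / "providers" / "google" / "__init__.py", "NAME = 'google'\n")
write(ext_b / "demo_ext" / "providers" / "amazon" / "__init__.py", "NAME = 'amazon'\n")

sys.path[:0] = [str(core), str(backport), str(ext_a), str(ext_b)]

# The regular demo_core package wins, so the backport portion is never searched.
regular_parent_ok = can_import("demo_core.providers.google")
# Every parent of the demo_ext sub-packages is a namespace package.
namespace_ok = can_import("demo_ext.providers.google") and can_import("demo_ext.providers.amazon")
print("regular parent:", regular_parent_ok, "| namespace parents:", namespace_ok)
```

On Python 3 this should report that demo_core.providers.google cannot be
imported while both demo_ext.providers.* packages can, which is the
behaviour the quoted comment describes.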

The issue mentions a few possible workarounds, but they are far from
perfect. They would require patching the __init__.py of an already
installed airflow so that it manipulates the search path. Even then, from
my tests I do not know whether this would be possible at all, because of
the non-trivial __init__.py we have (and use) in the *airflow* package.
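
For illustration only, here is a toy version of what such a search-path
patch could look like. It is a sketch under assumptions: demo_patched is a
hypothetical package, and the patch is simply the pkgutil-style line added
to an otherwise ordinary __init__.py. It says nothing about whether
Airflow's real __init__.py (plugins, imported sub-packages) would survive
the same treatment.

```python
import importlib
import sys
import tempfile
from pathlib import Path


def write(path: Path, text: str = "") -> None:
    """Create a file (and its parent directories) with the given content."""
    path.parent.mkdir(parents=True, exist_ok=True)
    path.write_text(text)


root = Path(tempfile.mkdtemp())
installed, backport = root / "installed", root / "backport"

# Hypothetical "already installed" package: its __init__.py keeps its own
# logic but is patched with the pkgutil-style line that extends __path__.
write(
    installed / "demo_patched" / "__init__.py",
    "__path__ = __import__('pkgutil').extend_path(__path__, __name__)\n"
    "VERSION = '1.10'\n",
)
# Hypothetical backport distribution living in a different directory.
write(
    backport / "demo_patched" / "providers" / "google" / "__init__.py",
    "NAME = 'google'\n",
)

sys.path[:0] = [str(installed), str(backport)]
importlib.invalidate_caches()

core_pkg = importlib.import_module("demo_patched")
google_pkg = importlib.import_module("demo_patched.providers.google")
print(core_pkg.VERSION, google_pkg.NAME)
```

In this toy case the patched package still exposes its own attributes and
the sub-package from the second directory becomes importable, but every
installed copy of the package would need the same patch.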

I think we have a few PRs waiting for a decision on this one, so maybe we
can simply agree to use another package (I really like *"airflow_ext"* :D)
and use it from now on? What do you (and others) think?

I'd love to start voting on it soon.

J.



On Thu, Oct 31, 2019 at 5:37 PM Ash Berlin-Taylor <as...@apache.org> wrote:

> Let me run some tests too - I've used them a bit in the past. I thought
> since we only want to make airflow.providers a namespace package it might
> work for us.
>
> Will report back next week.
>
> -ash
>
> On 31 October 2019 15:58:22 GMT, Jarek Potiuk <Ja...@polidea.com>
> wrote:
> >The same repo (so mono-repo approach). All packages would be in
> >"airflow_integrations" directory. It's mainly about moving the
> >operators/hooks/sensor files to different directory structure.
> >
> >It might be done pretty much without changing the current
> >installation/development model:
> >
> >1) We can add a setup.py command to the main setup.py that installs all
> >the packages in -e mode (to make it easier to install all deps in one go).
> >2) We can add dependencies in setup.py extras to install the appropriate
> >packages. For example, the [google] extra will require the
> >'apache-airflow-integrations-providers-google' package - or
> >'apache-airflow-providers-google' if we decide to drop "-integrations"
> >from the package name to make it shorter.
> >
> >The only potential drawback I see is a bit more involved setup of the
> >IDE.
> >
> >This way installation method for both dev and prod remains simple.
> >
> >In the future we can have a separate release schedule for the packages
> >(AIP-8), but for now we can stick to the same version for the
> >'apache-airflow' and 'apache-airflow-integrations-*' packages (plus a
> >separate release schedule for backporting needs).
> >Here again is the structure of the repo (we will likely be able to use
> >native namespaces, so I removed some needless __init__.py files).
> >
> >|-- airflow
> >|   |- __init__.py
> >|   |- operators -> fundamental operators are here
> >|-- tests -> tests for core airflow are here (optionally we can move them under "airflow")
> >|-- setup.py -> setup.py for the "apache-airflow" package
> >|-- airflow_integrations
> >|   |-providers
> >|   | |-google
> >|   |   |-setup.py -> setup.py for the "apache-airflow-integrations-providers-google" package
> >|   |   |-airflow_integrations
> >|   |     |-providers
> >|   |       |-google
> >|   |         |-__init__.py
> >|   |         |-tests -> tests for the "apache-airflow-integrations-providers-google" package
> >|   | |-__init__.py
> >|   |-protocols
> >|     |-setup.py -> setup.py for the "apache-airflow-integrations-protocols" package
> >|     |-airflow_integrations
> >|        |-protocols
> >|          |-__init__.py
> >|          |-tests -> tests for the "apache-airflow-integrations-protocols" package
> >
> >
> >J.
> >
> >On Thu, Oct 31, 2019 at 3:38 PM Kaxil Naik <ka...@gmail.com> wrote:
> >
> >> So create another package in a different repo? Or the same repo with a
> >> separate setup.py file that has airflow as a dependency?
> >>
> >>
> >>
> >>
> >> On Thu, Oct 31, 2019 at 2:32 PM Jarek Potiuk
> ><Ja...@polidea.com>
> >> wrote:
> >>
> >> > TL;DR; I did some more testing on how namespaces work. I still
> >believe
> >> the
> >> > only way to use namespaces is to have separate (for example
> >> > "airflow_integrations") package for all backportable packages.
> >> >
> >> > I am not sure if anyone has used namespaces before, but after reading
> >> > about them and trying them out, the main blocker seems to be that we
> >> > have non-trivial code in airflow's "__init__.py" (including class
> >> > definitions, imported sub-packages and plugin initialisation).
> >> >
> >> > Details are in
> >> > https://packaging.python.org/guides/packaging-namespace-packages/
> >but
> >> it's
> >> > a long one so let me summarize my findings:
> >> >
> >> >    - In order to use the "airflow.providers" package we would have to
> >> >    declare "airflow" as a namespace
> >> >    - It can be done in three different ways:
> >> >       - omitting __init__.py in this package (native/implicit namespace)
> >> >       - making the __init__.py of the "airflow" package in main airflow
> >> >       (and in all other packages) contain only
> >> >       "*__path__ = __import__('pkgutil').extend_path(__path__, __name__)*"
> >> >       (pkgutil style)
> >> >       - or making it contain only
> >> >       "*__import__('pkg_resources').declare_namespace(__name__)*"
> >> >       (pkg_resources style)
> >> >
> >> > The first is not possible (we already have __init__.py in "airflow").
> >> > The other two are not possible because we already have quite a lot in
> >> > airflow's "__init__.py", and the guide says of both the pkgutil and
> >> > pkg_resources styles:
> >> >
> >> > "*Every* distribution that uses the namespace package must include
> >an
> >> > identical *__init__.py*. If any distribution does not, it will
> >cause the
> >> > namespace logic to fail and the other sub-packages will not be
> >> importable.
> >> > *Any
> >> > additional code in __init__.py will be inaccessible."*
> >> >
> >> > I even tried to add those pkgutil/pkg_resources to airflow and do
> >some
> >> > experimenting with it - but it does not work. Pip install fails at
> >the
> >> > plugins_manager as "airflow.plugins" is not accessible (kind of
> >> expected),
> >> > but I am sure there will be other problems as well. :(
> >> >
> >> > Basically - we cannot turn "airflow" into namespace because it has
> >some
> >> > "__init__.py" logic :(.
> >> >
> >> > So I think it still holds that if we want to use namespaces, we should
> >> > use another package. *"airflow_integrations"* is the current candidate,
> >> > but we can think of a nicer/shorter one: "airflow_ext", "airflow_int",
> >> > "airflow_x", "airflow_mod", "airflow_next", "airflow_xt", "airflow_",
> >> > "ext_airflow", .... Interestingly, "airflow_" follows the convention
> >> > suggested by PEP8 for avoiding conflicts with Python names (which is a
> >> > different case, but kind of close).
> >> >
> >> > What do you think?
> >> >
> >> > J.
> >> >
> >> > On Tue, Oct 29, 2019 at 4:51 PM Kaxil Naik <ka...@gmail.com>
> >wrote:
> >> >
> >> > > The namespace feature looks promising and from your tests, it
> >looks
> >> like
> >> > it
> >> > > would work well from Airflow 2.0 and onwards.
> >> > >
> >> > > I will look at it in-depth and see if I have more suggestions or
> >> opinion
> >> > on
> >> > > it
> >> > >
> >> > > On Tue, Oct 29, 2019 at 3:32 PM Jarek Potiuk
> ><Jarek.Potiuk@polidea.com
> >> >
> >> > > wrote:
> >> > >
> >> > > > TL;DR; We did some testing about namespaces and packaging (and
> >> > potential
> >> > > > backporting options for 1.10.* python3 Airflows) and we think
> >it's
> >> best
> >> > > to
> >> > > > use namespaces quickly and use different package name
> >> > > > "airflow-integrations" for all non-fundamental integrations.
> >> > > >
> >> > > > Unless we missed some tricks, we cannot use airflow.*
> >sub-packages
> >> for
> >> > > the
> >> > > > 1.10.* backportable packages. Example:
> >> > > >
> >> > > >    - "*apache-airflow"* package provides: "airflow.*" (this is
> >what
> >> we
> >> > > have
> >> > > >    today)
> >> > > >    - "*apache-airflow-providers-google*": provides
> >> > > >    "airflow.providers.google.*" packages
> >> > > >
> >> > > > If we install both packages (old apache-airflow 1.10.6 and new
> >> > > > apache-airflow-providers-google from 2.0), it seems that the
> >> > > > "airflow.providers.google.*" package cannot be imported. This is a
> >> > > > bit of a problem if we would like to backport the operators from
> >> > > > Airflow 2.0 to Airflow 1.10 in a way that is forward-compatible. We
> >> > > > really want users who start using backported operators in 1.10.* not
> >> > > > to have to change imports in their DAGs to run them in Airflow 2.0.
> >> > > >
> >> > > > We discussed it internally in our team and considered several
> >> options,
> >> > > but
> >> > > > we think the best way will be to go straight to "namespaces" in
> >> Airflow
> >> > > 2.0
> >> > > > and to have the integrations (as discussed in AIP-21
> >discussion) to
> >> be
> >> > > in a
> >> > > > separate "*airflow_integrations*" package.  It might be even
> >more
> >> > towards
> >> > > > the AIP-8 implementation and plays together very well in terms
> >of
> >> > > > "stewardship" discussed in AIP-21 now. But we will still keep
> >(for
> >> now)
> >> > > > single release process for all packages for 2.0 (except for the
> >> > > backporting
> >> > > > which can be done per-provider before 2.0 release) and provide
> >a
> >> > > foundation
> >> > > > for future more complex release cycles in future versions.
> >> > > >
> >> > > > Here is how the new Airflow 2.0 repository could look (I only show
> >> > > > a subset of dirs, but they are representative). For those whose
> >> > > > email client corrupts the fixed-width/coloured font, here is an
> >> > > > image of this structure: https://pasteboard.co/IEesTih.png
> >> > > >
> >> > > > |-- airflow
> >> > > > |   |- __init__.py
> >> > > > |   |- operators -> fundamental operators are here
> >> > > > |-- tests -> tests for core airflow are here (optionally we can move them under "airflow")
> >> > > > |-- setup.py -> setup.py for the "apache-airflow" package
> >> > > > |-- airflow_integrations
> >> > > > |   |-providers
> >> > > > |   | |-google
> >> > > > |   |   |-setup.py -> setup.py for the "apache-airflow-integrations-providers-google" package
> >> > > > |   |   |-airflow_integrations
> >> > > > |   |     |-__init__.py
> >> > > > |   |     |-providers
> >> > > > |   |       |-__init__.py
> >> > > > |   |       |-google
> >> > > > |   |         |-__init__.py
> >> > > > |   |         |-tests -> tests for the "apache-airflow-integrations-providers-google" package
> >> > > > |   | |-__init__.py
> >> > > > |   |-protocols
> >> > > > |     |-setup.py -> setup.py for the "apache-airflow-integrations-protocols" package
> >> > > > |     |-airflow_integrations
> >> > > > |        |-protocols
> >> > > > |          |-__init__.py
> >> > > > |          |-tests -> tests for the "apache-airflow-integrations-protocols" package
> >> > > >
> >> > > > There are a number of pros for this solution:
> >> > > >
> >> > > >    - We could use the standard namespaces feature of python to
> >build
> >> > > >    multiple packages:
> >> > > >
> >https://packaging.python.org/guides/packaging-namespace-packages/
> >> > > >    - Installation for users will be the same as previously. We
> >could
> >> > > >    install the needed packages automatically when particular
> >extras
> >> are
> >> > > > used
> >> > > >    (pip install apache-airflow[google] could install both
> >> > > "apache-airflow"
> >> > > > and
> >> > > >    "apache-airflow-integrations-providers-google")
> >> > > >    - We could have custom setup.py installation process for
> >> developers
> >> > > that
> >> > > >    could install all the packages in development ("-e ." mode)
> >in a
> >> > > single
> >> > > >    operation.
> >> > > >    - In the case of transfer packages we could have nice error
> >> > > >    messages informing the user that the other package needs to be
> >> > > >    installed (for example, the S3->GCS operator would import
> >> > > >    "airflow_integrations.providers.amazon.*" and, if that fails, it
> >> > > >    could raise "Please install [amazon] extra to use me.")
> >> > > >    - We could implement numerous optimisations in the way how
> >we run
> >> > > tests
> >> > > >    in CI (for example run all the "providers" tests only with
> >sqlite,
> >> > run
> >> > > >    tests in parallel etc.)
> >> > > >    - We could implement it gradually - we do not have to have a
> >"big
> >> > > bang"
> >> > > >    approach - we can implement it in "provider-by-provider" way
> >and
> >> > test
> >> > > it
> >> > > >    with one provider (Google) first to make sure that all the
> >> > mechanisms
> >> > > > are
> >> > > >    working
> >> > > >    - For now we could have the monorepo approach where all the
> >> packages
> >> > > >    will be developed in concert - for now avoiding the
> >dependency
> >> > > problems
> >> > > >    (but allowing for back-portability to 1.10).
> >> > > >    - We will have clear boundaries between packages and ability
> >to
> >> test
> >> > > for
> >> > > >    some unwanted/hidden dependencies between packages.
> >> > > >    - We could switch to (much better) sphinx-apidoc package to
> >> continue
> >> > > >    building single documentation for all of those (sphinx
> >apidoc has
> >> > > > support
> >> > > >    for namespaces).
> >> > > >
> >> > > > As we are working on GCP move from contrib to core, we could
> >make all
> >> > the
> >> > > > effort to test it and try it before we merge it to master so
> >that it
> >> > will
> >> > > > be ready for others (and we could help with most of the moves
> >> > > afterwards).
> >> > > > It seems complex, but in fact in most cases it will be very
> >simple
> >> move
> >> > > > between the packages and can be done incrementally so there is
> >little
> >> > > risk
> >> > > > in doing this I think.
> >> > > >
> >> > > > J.
> >> > > >
> >> > > >
> >> > > > On Mon, Oct 28, 2019 at 11:45 PM Kevin Yang <yr...@gmail.com>
> >> wrote:
> >> > > >
> >> > > > > Tomasz and Ash have good points about the overhead of having
> >> > > > > separate repos. But as we grow bigger and more mature, I would
> >> > > > > prefer to have what was described in AIP-8. It shouldn't be
> >> > > > > extremely hard for us to come up with good strategies to handle the
> >> > > > > overhead. AIP-8 already talked about how it can benefit us. IMO, on
> >> > > > > a high level, having a clear separation of core vs. hooks/operators
> >> > > > > would make the project much more scalable, and the gains would
> >> > > > > outweigh the cost we pay.
> >> > > > >
> >> > > > > That being said, I'm supportive of this "move towards AIP-8 while
> >> > > > > learning" approach; it is quite a good practice for tackling a big
> >> > > > > project. Looking forward to reading the AIP.
> >> > > > >
> >> > > > >
> >> > > > > Cheers,
> >> > > > > Kevin Y
> >> > > > >
> >> > > > > On Mon, Oct 28, 2019 at 6:21 AM Jarek Potiuk <
> >> > Jarek.Potiuk@polidea.com
> >> > > >
> >> > > > > wrote:
> >> > > > >
> >> > > > > > We are checking how we can use namespaces in a back-portable way
> >> > > > > > and we will have a POC soon so that we will all be able to see
> >> > > > > > what it will look like.
> >> > > > > >
> >> > > > > > J.
> >> > > > > >
> >> > > > > > On Mon, Oct 28, 2019 at 1:24 PM Ash Berlin-Taylor <
> >> ash@apache.org>
> >> > > > > wrote:
> >> > > > > >
> >> > > > > > > I'll have to read your proposal in detail (sorry, no time
> >right
> >> > > > now!),
> >> > > > > > but
> >> > > > > > > I'm broadly in favour of this approach, and I think
> >keeping
> >> them
> >> > > _in_
> >> > > > > the
> >> > > > > > > same repo is the best plan -- that makes writing and
> >testing
> >> > > > > > cross-cutting
> >> > > > > > > changes  easier.
> >> > > > > > >
> >> > > > > > > -a
> >> > > > > > >
> >> > > > > > > > On 28 Oct 2019, at 12:14, Tomasz Urbaszek <
> >> > > > > tomasz.urbaszek@polidea.com
> >> > > > > > >
> >> > > > > > > wrote:
> >> > > > > > > >
> >> > > > > > > > I think utilizing namespaces should reduce a lot of
> >problems
> >> > > raised
> >> > > > > by
> >> > > > > > > > using separate repos (who will manage it? how to
> >release?
> >> where
> >> > > > > should
> >> > > > > > be
> >> > > > > > > > the repo?).
> >> > > > > > > >
> >> > > > > > > > Bests,
> >> > > > > > > > Tomek
> >> > > > > > > >
> >> > > > > > > > On Sun, Oct 27, 2019 at 11:54 AM Jarek Potiuk <
> >> > > > > > Jarek.Potiuk@polidea.com>
> >> > > > > > > > wrote:
> >> > > > > > > >
> >> > > > > > > >> Thanks Bas for comments! Let me share my thoughts
> >below.
> >> > > > > > > >>
> >> > > > > > > >> On Sun, Oct 27, 2019 at 9:23 AM Bas Harenslak <
> >> > > > > > > >> basharenslak@godatadriven.com>
> >> > > > > > > >> wrote:
> >> > > > > > > >>
> >> > > > > > > >>> Hi Jarek, I definitely see a future in creating
> >separate
> >> > > > > installable
> >> > > > > > > >>> packages for various operators/hooks/etc (as in
> >AIP-8).
> >> This
> >> > > > would
> >> > > > > > IMO
> >> > > > > > > >>> strip the “core” Airflow to only what’s needed and
> >result
> >> in
> >> > a
> >> > > > > small
> >> > > > > > > >>> package without a ton of dependencies (and make it
> >more
> >> > > > > maintainable,
> >> > > > > > > >>> shorter tests, etc etc etc). Not exactly sure though
> >what
> >> > > you’re
> >> > > > > > > >> proposing
> >> > > > > > > >>> in your e-mail, is it a new AIP for an intermediate
> >step
> >> > > towards
> >> > > > > > AIP-8?
> >> > > > > > > >>>
> >> > > > > > > >>
> >> > > > > > > >> It's a new AIP I am proposing.  For now it's only for
> >> > > backporting
> >> > > > > the
> >> > > > > > > new
> >> > > > > > > >> 2.0 import paths to 1.10.* series.
> >> > > > > > > >>
> >> > > > > > > >> It's more of "incremental going in direction of AIP-8
> >and
> >> > > learning
> >> > > > > > some
> >> > > > > > > >> difficulties involved" than implementing AIP-8 fully.
> >We are
> >> > > > taking
> >> > > > > > > >> advantage of changes in import paths from AIP-21 which
> >make
> >> it
> >> > > > > > possible
> >> > > > > > > to
> >> > > > > > > >> have both old and new (optional) operators available
> >in
> >> 1.10.*
> >> > > > > series
> >> > > > > > of
> >> > > > > > > >> Airflow. I think there is a lot more to do for full
> >> > > implementation
> >> > > > > of
> >> > > > > > > >> AIP-8: decisions how to maintain, install those
> >operator
> >> > groups
> >> > > > > > > separately,
> >> > > > > > > >> stewardship model/organisation for the separate
> >groups, how
> >> to
> >> > > > > manage
> >> > > > > > > >> cross-dependencies, procedures for releasing the
> >packages
> >> etc.
> >> > > > > > > >>
> >> > > > > > > >> I think about this new AIP also as a learning effort -
> >we
> >> > would
> >> > > > > learn
> >> > > > > > > more
> >> > > > > > > >> how separate packaging works and then we can follow up
> >with
> >> > > AIP-8
> >> > > > > full
> >> > > > > > > >> implementation for "modular" Airflow. Then AIP-8 could
> >be
> >> > > > > implemented
> >> > > > > > in
> >> > > > > > > >> Airflow 2.1 for example - or 3.0 if we start following
> >> > semantic
> >> > > > > > > versioning
> >> > > > > > > >> - based on those learnings. It's a bit of good example
> >of
> >> > having
> >> > > > > cake
> >> > > > > > > and
> >> > > > > > > >> eating it too. We can try out modularity in 1.10.*
> >while
> >> > cutting
> >> > > > the
> >> > > > > > > scope
> >> > > > > > > >> of 2.0 and not implementing full management/release
> >> procedure
> >> > > for
> >> > > > > > AIP-8
> >> > > > > > > >> yet.
> >> > > > > > > >>
> >> > > > > > > >>
> >> > > > > > > >>> Thinking about this, I think there are still a few
> >grey
> >> areas
> >> > > > > (which
> >> > > > > > > >> would
> >> > > > > > > >>> be good to discuss in a new AIP, or continue on
> >AIP-8):
> >> > > > > > > >>>
> >> > > > > > > >>>  *   In your email you speak only about the 3 big cloud
> >> > > > > > > >>> providers (btw I made a PR for migrating all AWS components ->
> >> > > > > > > >>> https://github.com/apache/airflow/pull/6439). Is there a plan
> >> > > > > > > >>> for splitting components other than Google/AWS/Azure?
> >> > > > > > > >>>
> >> > > > > > > >>
> >> > > > > > > >> We could add more groups as part of this new AIP
> >indeed (as
> >> an
> >> > > > > > > extension to
> >> > > > > > > >> AIP-21 and pre-requisite to AIP-8). We already see how
> >> > > > > > > moving/deprecation
> >> > > > > > > >> works for the providers package - it works for
> >GCP/Google
> >> > rather
> >> > > > > > nicely.
> >> > > > > > > >> But there is nothing to prevent us from extending it
> >to
> >> cover
> >> > > > other
> >> > > > > > > groups
> >> > > > > > > >> of operators/hooks. If you look at the current
> >structure of
> >> > > > > > > documentation
> >> > > > > > > >> done by Kamil, we can follow the structure there and
> >move
> >> the
> >> > > > > > > >> operators/hooks accordingly (
> >> > > > > > > >>
> >> > > > >
> >> >
> >https://airflow.readthedocs.io/en/latest/operators-and-hooks-ref.html
> >> > > > > > ):
> >> > > > > > > >>
> >> > > > > > > >>      Fundamentals, ASF: Apache Software Foundation,
> >Azure:
> >> > > > Microsoft
> >> > > > > > > >> Azure, AWS: Amazon Web Services, GCP: Google Cloud
> >Platform,
> >> > > > Service
> >> > > > > > > >> integrations, Software integrations, Protocol
> >integrations.
> >> > > > > > > >>
> >> > > > > > > >> I am happy to include that in the AIP - if others
> >agree
> >> it's a
> >> > > > good
> >> > > > > > > idea.
> >> > > > > > > >> Out of those groups -  I think only Fundamentals
> >should not
> >> be
> >> > > > > > > back-ported.
> >> > > > > > > >> Others should be rather easy to port (if we decide
> >to). We
> >> > > already
> >> > > > > > have
> >> > > > > > > >> quite a lot of those in the new GCP operators for 2.0.
> >So
> >> > > starting
> >> > > > > > with
> >> > > > > > > >> GCP/Google group is a good idea. Also following with
> >Cloud
> >> > > > Providers
> >> > > > > > > first
> >> > > > > > > >> is a good thing. For example we have now support from
> >Google
> >> > > > > Composer
> >> > > > > > > team
> >> > > > > > > >> to do this separation for GCP (and we learn from it)
> >and
> >> then
> >> > we
> >> > > > can
> >> > > > > > > claim
> >> > > > > > > >> the stewardship in our team for releasing the python
> >3/
> >> > Airflow
> >> > > > > > > >> 1.10-compatible "airflow-google" packages. Possibly
> >other
> >> > Cloud
> >> > > > > > > >> Providers/teams might follow this (if they see the
> >value in
> >> > it)
> >> > > > and
> >> > > > > > > there
> >> > > > > > > >> could be different stewards for those. And then we can
> >do
> >> > other
> >> > > > > groups
> >> > > > > > > if
> >> > > > > > > >> we decide to. I think this way we can learn whether
> >AIP-8 is
> >> > > > > > manageable
> >> > > > > > > and
> >> > > > > > > >> what real problems we are going to face.
> >> > > > > > > >>
> >> > > > > > > >>  *   Each “plugin” e.g. GCP would be a separate repo,
> >should
> >> > we
> >> > > > > create
> >> > > > > > > >>> some sort of blueprint for such packages?
> >> > > > > > > >>>
> >> > > > > > > >>
> >> > > > > > > >> I think we do not need separate repos (at all), but in this new
> >> > > > > > > >> AIP we can test it before we decide to go for AIP-8. IMHO, the
> >> > > > > > > >> monorepo approach will work here rather nicely. We could use
> >> > > > > > > >> python-3 native namespaces
> >> > > > > > > >> <https://packaging.python.org/guides/packaging-namespace-packages/>
> >> > > > > > > >> for the sub-packages when we go full AIP-8. For now we could
> >> > > > > > > >> simply package the new operators in a separate pip package for
> >> > > > > > > >> the Python 3 version of the 1.10.* series only. We only need to
> >> > > > > > > >> test whether it works well with another package providing
> >> > > > > > > >> 'airflow.providers.*' after apache-airflow (providing the
> >> > > > > > > >> 'airflow' package) is installed. But I think we can make it work.
> >> > > > > > > >> I don't think we really need to split the repos: namespaces will
> >> > > > > > > >> work just fine and have easier management of cross-repository
> >> > > > > > > >> dependencies (but we can learn otherwise). For sure we will not
> >> > > > > > > >> need it for the newly proposed AIP of backporting groups to 1.10,
> >> > > > > > > >> and we can defer that decision to AIP-8 implementation time.
> >> > > > > > > >>
> >> > > > > > > >>
> >> > > > > > > >>>  *   In which Airflow version do we start raising
> >> deprecation
> >> > > > > > warnings
> >> > > > > > > >>> and in which version would we remove the original?
> >> > > > > > > >>>
> >> > > > > > > >>
> >> > > > > > > >> I think we should do what we did in the GCP case already. Those
> >> > > > > > > >> old "imports" for operators can be marked as deprecated in
> >> > > > > > > >> Airflow 2.0 (and removed in 2.1, or 3.0 if we start following
> >> > > > > > > >> semantic versioning). We can however do it earlier, in 1.10.7 or
> >> > > > > > > >> 1.10.8 if we release those (without removing the old operators
> >> > > > > > > >> yet - just raise deprecation warnings and inform users that for
> >> > > > > > > >> python3 the new "airflow-google", "airflow-aws" etc. packages can
> >> > > > > > > >> be installed and that they can switch to them).
> >> > > > > > > >>
> >> > > > > > > >> J.
> >> > > > > > > >>
> >> > > > > > > >>
> >> > > > > > > >>>
> >> > > > > > > >>> Cheers,
> >> > > > > > > >>> Bas
> >> > > > > > > >>>
> >> > > > > > > >>> On 27 Oct 2019, at 08:33, Jarek Potiuk <
> >> > > Jarek.Potiuk@polidea.com
> >> > > > > > > <mailto:
> >> > > > > > > >>> Jarek.Potiuk@polidea.com>> wrote:
> >> > > > > > > >>>
> >> > > > > > > >>> Hello - any comments on that? I am happy to make it
> >into an
> >> > AIP
> >> > > > :)?
> >> > > > > > > >>>
> >> > > > > > > >>> On Sun, Oct 13, 2019 at 5:53 PM Jarek Potiuk <
> >> > > > > > Jarek.Potiuk@polidea.com
> >> > > > > > > >>> <ma...@polidea.com>>
> >> > > > > > > >>> wrote:
> >> > > > > > > >>>
> >> > > > > > > >>> *Motivation*
> >> > > > > > > >>>
> >> > > > > > > >>> I think we really should start thinking about making it easier
> >> > > > > > > >>> for our users to migrate to 2.0. After implementing some recent
> >> > > > > > > >>> changes related to AIP-21: Changes in import paths
> >> > > > > > > >>> <https://cwiki.apache.org/confluence/display/AIRFLOW/AIP-21%3A+Changes+in+import+paths>
> >> > > > > > > >>> I think I have an idea that might help with it.
> >> > > > > > > >>>
> >> > > > > > > >>> *Proposal*
> >> > > > > > > >>>
> >> > > > > > > >>> We could package some of the new and improved 2.0
> >operators
> >> > > > (moved
> >> > > > > to
> >> > > > > > > >>> "providers" package) and let them be used in Python 3
> >> > > environment
> >> > > > > of
> >> > > > > > > >>> airflow 1.10.x.
> >> > > > > > > >>>
> >> > > > > > > >>> This can be done case-by-case per "cloud provider". It should
> >> > > > > > > >>> not be obligatory and should be largely driven by each provider.
> >> > > > > > > >>> It's not yet the full AIP-8: Split Hooks/Operators into separate
> >> > > > > > > >>> packages
> >> > > > > > > >>> <https://cwiki.apache.org/confluence/pages/viewpage.action?pageId=100827303>.
> >> > > > > > > >>> It's merely backporting some operators/hooks to get them to work
> >> > > > > > > >>> in 1.10. But by doing it we might try out the concept of
> >> > > > > > > >>> splitting, learn about maintenance problems and maybe implement
> >> > > > > > > >>> the full *AIP-8* approach in 2.1 consistently across the board.
> >> > > > > > > >>>
> >> > > > > > > >>> *Context*
> >> > > > > > > >>>
> >> > > > > > > >>> Part of the AIP-21 was to move import paths for Cloud
> >> > providers
> >> > > > to
> >> > > > > > > >>> separate providers/<PROVIDER> package. An example for
> >that
> >> > (the
> >> > > > > first
> >> > > > > > > >>> provider we already almost migrated) was
> >providers/google
> >> > > package
> >> > > > > > > >> (further
> >> > > > > > > >>> divided into gcp/gsuite etc).
> >> > > > > > > >>>
> >> > > > > > > >>> We've done a massive migration of all the Google-related
> >> > > > > > > >>> operators, created a few missing ones and retrofitted some old
> >> > > > > > > >>> operators to follow GCP best practices, fixing a number of
> >> > > > > > > >>> problems and also implementing Python3 and Pylint compatibility.
> >> > > > > > > >>> Some of these operators/hooks are not backwards compatible. Those
> >> > > > > > > >>> that are compatible are still available via the old imports with
> >> > > > > > > >>> a deprecation warning.
> >> > > > > > > >>>
> >> > > > > > > >>> We've added missing tests (including system tests)
> >and
> >> > missing
> >> > > > > > > features -
> >> > > > > > > >>> improving some of the Google operators - giving the
> >users
> >> > more
> >> > > > > > > >> capabilities
> >> > > > > > > >>> and fixing some issues. Those operators should pretty
> >much
> >> > > "just
> >> > > > > > work"
> >> > > > > > > in
> >> > > > > > > >>> Airflow 1.10.x (any recent version) for Python 3. We
> >should
> >> > be
> >> > > > able
> >> > > > > > to
> >> > > > > > > >>> release a separate pip-installable package for those
> >> > operators
> >> > > > that
> >> > > > > > > users
> >> > > > > > > >>> should be able to install in Airflow 1.10.x.
> >> > > > > > > >>>
> >> > > > > > > >>> Any user will be able to install this separate package in their
> >> > > > > > > >>> Airflow 1.10.x installation and start using those new "provider"
> >> > > > > > > >>> operators in parallel to the old 1.10.x operators. Other
> >> > > > > > > >>> providers ("microsoft", "amazon") might follow the same approach
> >> > > > > > > >>> if they want. We could even at some point decide to move some of
> >> > > > > > > >>> the core operators in similar fashion (for example following the
> >> > > > > > > >>> structure proposed in the latest documentation: fundamentals /
> >> > > > > > > >>> software / etc.
> >> > > > > > > >>> https://airflow.readthedocs.io/en/latest/operators-and-hooks-ref.html)
> >> > > > > > > >>>
> >> > > > > > > >>> *Pros and cons*
> >> > > > > > > >>>
> >> > > > > > > >>> There are a number of pros:
> >> > > > > > > >>>
> >> > > > > > > >>>  - Users will have an easier migration path if they are heavily
> >> > > > > > > >>>  invested in the 1.10.* version
> >> > > > > > > >>>  - It's possible to migrate in stages for people who are also
> >> > > > > > > >>>  invested in py2: *py2 (1.10) -> py3 (1.10) -> py3 + new
> >> > > > > > > >>>  operators (1.10) -> py3 + 2.0*
> >> > > > > > > >>>  - Moving to new operators in py3 + new operators can be done
> >> > > > > > > >>>  gradually. Old operators will continue to work while new ones
> >> > > > > > > >>>  can be used more and more
> >> > > > > > > >>>  - People will be incentivised to migrate to python 3 before 2.0
> >> > > > > > > >>>  is out (by using new operators)
> >> > > > > > > >>>  - Each provider "package" can have an independent release
> >> > > > > > > >>>  schedule - and add functionality in already released Airflow
> >> > > > > > > >>>  versions.
> >> > > > > > > >>>  - We do not take out any functionality from the users - we just
> >> > > > > > > >>>  add more options
> >> > > > > > > >>>  - The releases can be - similarly to main airflow releases -
> >> > > > > > > >>>  voted separately by the PMC after "stewards" of the package (per
> >> > > > > > > >>>  provider) perform a round of testing on 1.10.* versions.
> >> > > > > > > >>>  - Users will start migrating to new operators earlier and have
> >> > > > > > > >>>  a smoother switch to 2.0 later
> >> > > > > > > >>>  - The latest improved operators will start
> >> > > > > > > >>> There are three cons I could think of:
> >> > > > > > > >>>
> >> > > > > > > >>>  - There will be quite a lot of duplication between old and new
> >> > > > > > > >>>  operators (they will co-exist in 1.10). That might lead to
> >> > > > > > > >>>  confusion of users and problems with cooperation between
> >> > > > > > > >>>  different operators/hooks
> >> > > > > > > >>>  - Having new operators in 1.10 python 3 might keep people from
> >> > > > > > > >>>  migrating to 2.0
> >> > > > > > > >>>  - It will require some maintenance and separate release
> >> > > > > > > >>>  overhead.
> >> > > > > > > >>>
> >> > > > > > > >>> I already spoke to the Composer team @Google and they are very
> >> > > > > > > >>> positive about this. I also spoke to Ash and it seems it might
> >> > > > > > > >>> also be OK for the Astronomer team. We have Google's backing and
> >> > > > > > > >>> support, and we can provide maintenance and support for those
> >> > > > > > > >>> packages - being an example for other providers of how they can
> >> > > > > > > >>> do it.
> >> > > > > > > >>>
> >> > > > > > > >>> Let me know what you think - and whether I should make it into
> >> > > > > > > >>> an official AIP maybe?
> >> > > > > > > >>>
> >> > > > > > > >>> J.
> >> > > > > > > >>>
> >> > > > > > > >>>
> >> > > > > > > >>>
> >> > > > > > > >>> --
> >> > > > > > > >>>
> >> > > > > > > >>> Jarek Potiuk
> >> > > > > > > >>> Polidea <https://www.polidea.com/> | Principal
> >Software
> >> > > Engineer
> >> > > > > > > >>>
> >> > > > > > > >>> M: +48 660 796 129 <+48660796129>
> >> > > > > > > >>> [image: Polidea] <https://www.polidea.com/>
> >> > > > > > > >>>
> >> > > > > > > >>>
> >> > > > > > > >>>
> >> > > > > > > >>>
> >> > > > > > > >>>
> >> > > > > > > >>
> >> > > > > > > >>
> >> > > > > > > >
> >> > > > > > > >
> >> > > > > > > > --
> >> > > > > > > >
> >> > > > > > > > Tomasz Urbaszek
> >> > > > > > > > Polidea <https://www.polidea.com/> | Junior Software
> >> Engineer
> >> > > > > > > >
> >> > > > > > > > M: +48 505 628 493 <+48505628493>
> >> > > > > > > > E: tomasz.urbaszek@polidea.com
> ><tomasz.urbaszeki@polidea.com
> >> >
> >> > > > > > > >
> >> > > > > > > > Unique Tech
> >> > > > > > > > Check out our projects!
> ><https://www.polidea.com/our-work>
> >> > > > > > >
> >> > > > > > >
> >> > > > > >
> >> > > > > > --
> >> > > > > >
> >> > > > > > Jarek Potiuk
> >> > > > > > Polidea <https://www.polidea.com/> | Principal Software
> >Engineer
> >> > > > > >
> >> > > > > > M: +48 660 796 129 <+48660796129>
> >> > > > > > [image: Polidea] <https://www.polidea.com/>
> >> > > > > >
> >> > > > >
> >> > > >
> >> > > >
> >> > > >
> >> > >
> >> >
> >> >
> >> >
> >>
> >
> >
>


-- 

Jarek Potiuk
Polidea <https://www.polidea.com/> | Principal Software Engineer

M: +48 660 796 129 <+48660796129>
[image: Polidea] <https://www.polidea.com/>

Re: [PROPOSE] Ease future migration path to 2.0 by provider's operators/hook backporting to 1.10.*

Posted by Jarek Potiuk <Ja...@polidea.com>.
Thanks Ash! It seems it works really well and is super simple!

I have a POC working for Airflow:
https://github.com/apache/airflow/pull/6507

I managed to build and pip-install two packages:

1) apache_airflow 2.0 -> which is the same as today - containing everything
- including providers and gcp.

2) apache-airflow-providers-google package which has apache-airflow-1.10.*
as an installation prerequisite.

I managed to actually schedule the example_gcp_pubsub dag from
airflow.providers.google.example_dags - which uses
airflow.providers.google.cloud.operators.pubsub operators and the results
are attached (Hope you can see pictures).
It worked very nicely - when I just did 'pip install
apache-airflow-providers-google' it downloaded and installed from pip
apache-airflow-1.10.6 + all prerequisites from the [gcp] extra (which I
added as needed for the google package).
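
A minimal sketch of how such a provider distribution might declare its prerequisite, assuming setuptools-style metadata. The version number and the exact requirement string are assumptions made for illustration; they are not copied from the POC PR.

```python
# Hypothetical metadata for the backport distribution; values are illustrative.
PROVIDER_SETUP_KWARGS = {
    "name": "apache-airflow-providers-google",
    "version": "0.0.1",  # assumed placeholder version
    # Pull in Airflow 1.10.* together with its [gcp] extra, so that
    # `pip install apache-airflow-providers-google` resolves both.
    "install_requires": ["apache-airflow[gcp]>=1.10.0,<2.0.0"],
}


def describe(kwargs):
    """Return a one-line summary of the distribution and its prerequisites."""
    return "{} requires {}".format(
        kwargs["name"], ", ".join(kwargs["install_requires"])
    )


print(describe(PROVIDER_SETUP_KWARGS))
```

In a real setup.py these keyword arguments would be passed to `setuptools.setup(**PROVIDER_SETUP_KWARGS)` together with the package list.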

So we seem to have a working solution now. I will cast a final vote for
what I think is now the consensus, as an update to AIP-21 (there is no point
in creating a separate AIP).

J.


On Tue, Nov 5, 2019 at 11:34 AM Kaxil Naik <ka...@gmail.com> wrote:

> Yes let's just do (1) for now.
>
>
>
> On Tue, Nov 5, 2019, 08:48 Jarek Potiuk <Ja...@polidea.com> wrote:
>
> > Thanks Ash! It might indeed work. I will take it from there and try to
> > make a POC PR with airflow.
> >
> > It's a slightly different approach from the google-python libraries (they
> > keep all the libraries as separate sub-packages/mini projects inside the
> > main project). The approach you propose is far less invasive in terms of
> > changing the structure of the main repo. I like it this way much more. It
> > makes it much easier to import the project in an IDE even if it is less
> > modular in nature.
> >
> > From what I understand with this structure - if it works - we have two
> > options:
> >
> > (1) For Airflow 2.0 we will be able to install Airflow and all
> > "integrations" in single (apache-airflow == 2.0.0) package and build
> > separate backporting integration packages for 1.10.* only.
> > (2) We will split Airflow 2.0 into separate "core" and "integration"
> > packages as well while preparing packages.
> >
> > I think (1) is a bit more reasonable for now, until we work out the full
> > AIP-8 solution (including solving dependency hell). Let me know what you
> > think (and others as well).
> >
> > J.
> >
> > On Mon, Nov 4, 2019 at 9:24 PM Ash Berlin-Taylor <as...@apache.org> wrote:
> >
> > > > https://github.com/ashb/airflow-submodule-test
> > > >
> > > > That seems to work in any order things are installed, at least on
> > > > python 3.7. I've had a stressful few days so I may have missed
> > > > something. Please tell me if there's a case I've missed, or if this is
> > > > not a suitable proxy for our situation.
> > >
> > > -a
> > >
> > > > On 4 Nov 2019, at 20:08, Ash Berlin-Taylor <as...@apache.org> wrote:
> > > >
> > > > > Pretty hard pass from me on airflow_ext. If it's released by airflow
> > > > > I want it to live under airflow.* (Anyone else is free to release
> > > > > packages under any namespace they choose)
> > > >
> > > > That said I think I've got something that works:
> > > >
> > > >
> > >
> >
> /Users/ash/.virtualenvs/test-providers/lib/python3.7/site-packages/notairflow/__init__.py
> > > module level code running
> > > >
> > >
> >
> /Users/ash/.virtualenvs/test-providers/lib/python3.7/site-packages/notairflow/providers/gcp/__init__.py
> > > module level code running
> > > >
> > > > Let me test it again in a few different cases etc.
> > > >
> > > > -a
> > > >
> > > > On 4 November 2019 14:00:24 GMT, Jarek Potiuk <
> > Jarek.Potiuk@polidea.com>
> > > wrote:
> > > > Hey Ash,
> > > >
> > > > > Thanks for the offer. I must admit pkgutil and package namespaces
> > > > > are not the best documented part of Python.
> > > > >
> > > > > I dug a bit deeper and I found a similar problem:
> > > > > https://github.com/pypa/setuptools/issues/895
> > > > > It seems that even if it is not explicitly explained in the pkgutil
> > > > > documentation, this comment (assuming it is right) explains
> > > > > everything:
> > > > >
> > > > > *"That's right. All parents of a namespace package must also be
> > > > > namespace packages, as they will necessarily share that parent name
> > > > > space (farm and farm.deps in this example)."*
> > > >
> > > > > There are a few possibilities mentioned in the issue on how this
> > > > > can be worked around, but those are far from perfect solutions.
> > > > > They would require patching the already installed airflow's
> > > > > __init__.py to manipulate the search path. Still, from my tests I do
> > > > > not know if this would be possible at all because of the non-trivial
> > > > > __init__.py we have (and use) in the *airflow* package.
> > > >
> > > > > We have a few PRs now waiting for a decision on that one I think, so
> > > > > maybe we can simply agree that we should use another package (I
> > > > > really like *"airflow_ext"* :D) and use it from now on? What do you
> > > > > (and others) think?
> > > >
> > > > I'd love to start voting on it soon.
> > > >
> > > > J.
> > > >
> > > >
> > > >
> > > > On Thu, Oct 31, 2019 at 5:37 PM Ash Berlin-Taylor <as...@apache.org>
> > > wrote:
> > > >
> > > > > Let me run some tests too - I've used them a bit in the past. I
> > > > > thought since we only want to make airflow.providers a namespace
> > > > > package it might work for us.
> > > >
> > > > Will report back next week.
> > > >
> > > > -ash
> > > >
> > > > On 31 October 2019 15:58:22 GMT, Jarek Potiuk <
> > Jarek.Potiuk@polidea.com>
> > > > wrote:
> > > > > The same repo (so mono-repo approach). All packages would be in the
> > > > > "airflow_integrations" directory. It's mainly about moving the
> > > > > operators/hooks/sensor files to a different directory structure.
> > > > >
> > > > > It might be done pretty much without changing the current
> > > > > installation/development model:
> > > > >
> > > > > 1) We can add a setup.py command to install all the packages in -e
> > > > > mode in the main setup.py (to make it easier to install all deps in
> > > > > one go).
> > > > > 2) We can add dependencies in setup.py extras to install the
> > > > > appropriate packages. For example the [google] extra will require
> > > > > the 'apache-airflow-integrations-providers-google' package - or
> > > > > 'apache-airflow-providers-google' if we decide to skip -integrations
> > > > > from the package name to make it shorter.
> > > > >
> > > > > The only potential drawback I see is a bit more involved setup of
> > > > > the IDE.
> > > > >
> > > > > This way the installation method for both dev and prod remains
> > > > > simple.
> > > > >
> > > > > In the future we can have a separate release schedule for the
> > > > > packages (AIP-8) but for now we can stick to the same version for
> > > > > the 'apache-airflow' and 'apache-airflow-integrations-*' packages (+
> > > > > a separate release schedule for backporting needs).
> > > > >
> > > > > Here again is the structure of the repo (we will likely be able to
> > > > > use native namespaces so I removed some needless __init__.py files).
> > > >
> > > > > |-- airflow
> > > > > |   |- __init__.py
> > > > > |   |- operators -> fundamental operators are here
> > > > > |-- tests -> tests for core airflow are here (optionally we can move them under "airflow")
> > > > > |-- setup.py -> setup.py for the "apache-airflow" package
> > > > > |-- airflow_integrations
> > > > > |   |-providers
> > > > > |   | |-google
> > > > > |   |   |-setup.py -> setup.py for the "apache-airflow-integrations-providers-google" package
> > > > > |   |   |-airflow_integrations
> > > > > |   |     |-providers
> > > > > |   |       |-google
> > > > > |   |         |-__init__.py
> > > > > |   |         | tests -> tests for the "apache-airflow-integrations-providers-google" package
> > > > > |   | |-__init__.py
> > > > > |   |-protocols
> > > > > |     |-setup.py -> setup.py for the "apache-airflow-integrations-protocols" package
> > > > > |     |-airflow_integrations
> > > > > |        |-protocols
> > > > > |          |-__init__.py
> > > > > |          |-tests -> tests for the "apache-airflow-integrations-protocols" package
> > > >
> > > >
> > > > J.
> > > >
> > > > On Thu, Oct 31, 2019 at 3:38 PM Kaxil Naik <ka...@gmail.com>
> > wrote:
> > > >
> > > > > So create another package in a different repo? Or the same repo
> > > > > with a separate setup.py file that has airflow as a dependency?
> > > >
> > > >
> > > >
> > > >
> > > > On Thu, Oct 31, 2019 at 2:32 PM Jarek Potiuk
> > > > <Ja...@polidea.com>
> > > > wrote:
> > > >
> > > > > TL;DR; I did some more testing on how namespaces work. I still
> > > > > believe the only way to use namespaces is to have a separate (for
> > > > > example "airflow_integrations") package for all backportable
> > > > > packages.
> > > > >
> > > > > I am not sure if anyone has used namespaces before, but after
> > > > > reading and trying things out, the main blocker seems to be that we
> > > > > have non-trivial code in airflow's "__init__.py" (including class
> > > > > definitions, imported sub-packages and plugin initialisation).
> > > > >
> > > > > Details are in
> > > > > https://packaging.python.org/guides/packaging-namespace-packages/
> > > > > but it's a long one so let me summarize my findings:
> > > > >
> > > > >    - In order to use the "airflow.providers" package we would have
> > > > >    to declare "airflow" as a namespace
> > > > >    - It can be done in three different ways:
> > > > >       - omitting __init__.py in this package (native/implicit
> > > > >       namespace)
> > > > >       - making the __init__.py of the "airflow" package in main
> > > > >       airflow (and in the other packages) consist only of
> > > > >       "*__path__ = __import__('pkgutil').extend_path(__path__, __name__)*"
> > > > >       (pkgutil style) or
> > > > >       "*__import__('pkg_resources').declare_namespace(__name__)*"
> > > > >       (pkg_resources style)
> > > > >
> > > > > The first is not possible (we already have __init__.py in
> > > > > "airflow"). The second case is not possible because we already have
> > > > > quite a lot in airflow's "__init__.py" and the documentation for
> > > > > both the pkgutil and pkg_resources styles states:
> > > > >
> > > > > "*Every* distribution that uses the namespace package must include
> > > > > an identical *__init__.py*. If any distribution does not, it will
> > > > > cause the namespace logic to fail and the other sub-packages will
> > > > > not be importable. *Any additional code in __init__.py will be
> > > > > inaccessible."*
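
The pkgutil-style mechanism quoted above can be tried out with a small self-contained sketch. It assumes a made-up top-level package name, `demo_integrations`, and two temporary directories standing in for two separately installed distributions; none of these names come from Airflow itself.

```python
import importlib
import os
import sys
import tempfile

# The single line every distribution must ship as <namespace>/__init__.py.
NAMESPACE_INIT = "__path__ = __import__('pkgutil').extend_path(__path__, __name__)\n"


def make_distribution(root, namespace, subpackage):
    """Create <root>/<namespace>/<subpackage>/ as one stand-in distribution."""
    sub_dir = os.path.join(root, namespace, subpackage)
    os.makedirs(sub_dir)
    with open(os.path.join(root, namespace, "__init__.py"), "w") as handle:
        handle.write(NAMESPACE_INIT)
    with open(os.path.join(sub_dir, "__init__.py"), "w") as handle:
        handle.write("NAME = {!r}\n".format(subpackage))


def load_from_two_distributions(namespace="demo_integrations"):
    """Import two sub-packages that live in two separate directory roots."""
    root_a = tempfile.mkdtemp()
    root_b = tempfile.mkdtemp()
    make_distribution(root_a, namespace, "google")
    make_distribution(root_b, namespace, "amazon")
    sys.path[:0] = [root_a, root_b]
    importlib.invalidate_caches()
    google = importlib.import_module(namespace + ".google")
    amazon = importlib.import_module(namespace + ".amazon")
    return google.NAME, amazon.NAME


print(load_from_two_distributions())
```

Both sub-packages import because each root ships the identical one-line `__init__.py`. A package whose `__init__.py` carries additional code, as "airflow" does, cannot satisfy that requirement.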
> > > >
> > > > > I even tried to add those pkgutil/pkg_resources lines to airflow and
> > > > > do some experimenting with it - but it does not work. Pip install
> > > > > fails at the plugins_manager as "airflow.plugins" is not accessible
> > > > > (kind of expected), but I am sure there will be other problems as
> > > > > well. :(
> > > > >
> > > > > Basically - we cannot turn "airflow" into a namespace because it has
> > > > > some "__init__.py" logic :(.
> > > > >
> > > > > So I think it still holds that if we want to use namespaces, we
> > > > > should use another package. The *"airflow_integrations"* is the
> > > > > current candidate, but we can think of some nicer/shorter one:
> > > > > "airflow_ext", "airflow_int", "airflow_x", "airflow_mod",
> > > > > "airflow_next", "airflow_xt", "airflow_", "ext_airflow", ....
> > > > > Interestingly "airflow_" is the one suggested by PEP8 to avoid
> > > > > conflicts with Python names (which is a different case but kind of
> > > > > close).
> > > >
> > > > What do you think?
> > > >
> > > > J.
> > > >
> > > > On Tue, Oct 29, 2019 at 4:51 PM Kaxil Naik <ka...@gmail.com>
> > > > wrote:
> > > >
> > > > > The namespace feature looks promising and from your tests, it looks
> > > > > like it would work well from Airflow 2.0 onwards.
> > > > >
> > > > > I will look at it in-depth and see if I have more suggestions or
> > > > > opinions on it
> > > >
> > > > On Tue, Oct 29, 2019 at 3:32 PM Jarek Potiuk
> > > > <Jarek.Potiuk@polidea.com
> > > >
> > > > wrote:
> > > >
> > > > > TL;DR; We did some testing about namespaces and packaging (and
> > > > > potential backporting options for 1.10.* python3 Airflows) and we
> > > > > think it's best to adopt namespaces soon and use a different package
> > > > > name, "airflow-integrations", for all non-fundamental integrations.
> > > > >
> > > > > Unless we missed some tricks, we cannot use airflow.* sub-packages
> > > > > for the 1.10.* backportable packages. Example:
> > > > >
> > > > >    - "*apache-airflow"* package provides: "airflow.*" (this is what
> > > > >    we have today)
> > > > >    - "*apache-airflow-providers-google*": provides
> > > > >    "airflow.providers.google.*" packages
> > > > >
> > > > > If we install both packages (old apache-airflow 1.10.6 and new
> > > > > apache-airflow-providers-google from 2.0) - it seems that the
> > > > > "airflow.providers.google.*" package cannot be imported. This is a
> > > > > bit of a problem if we would like to backport the operators from
> > > > > Airflow 2.0 to Airflow 1.10 in a way that will be
> > > > > forward-compatible. We really want users who started using
> > > > > backported operators in 1.10.* not to have to change imports in
> > > > > their DAGs to run them in Airflow 2.0.
> > > > >
> > > > > We discussed it internally in our team and considered several
> > > > > options, but we think the best way will be to go straight to
> > > > > "namespaces" in Airflow 2.0 and to have the integrations (as
> > > > > discussed in the AIP-21 discussion) in a separate
> > > > > "*airflow_integrations*" package. It might be even more of a step
> > > > > towards the AIP-8 implementation and plays together very well with
> > > > > the "stewardship" discussed in AIP-21 now. But we will still keep
> > > > > (for now) a single release process for all packages for 2.0 (except
> > > > > for the backporting which can be done per-provider before the 2.0
> > > > > release) and provide a foundation for more complex release cycles in
> > > > > future versions.
> > > > >
> > > > > Here is how the new Airflow 2.0 repository could look (I only show a
> > > > > subset of dirs but they are representative). For those whose
> > > > > fixed-width/colour font gets corrupted in email, here is an image of
> > > > > this structure: https://pasteboard.co/IEesTih.png
> > > > >
> > > >
> > > > > |-- airflow
> > > > > |   |- __init__.py
> > > > > |   |- operators -> fundamental operators are here
> > > > > |-- tests -> tests for core airflow are here (optionally we can move them under "airflow")
> > > > > |-- setup.py -> setup.py for the "apache-airflow" package
> > > > > |-- airflow_integrations
> > > > > |   |-providers
> > > > > |   | |-google
> > > > > |   |   |-setup.py -> setup.py for the "apache-airflow-integrations-providers-google" package
> > > > > |   |   |-airflow_integrations
> > > > > |   |     |-__init__.py
> > > > > |   |     |-providers
> > > > > |   |       |-__init__.py
> > > > > |   |       |-google
> > > > > |   |         |-__init__.py
> > > > > |   |         | tests -> tests for the "apache-airflow-integrations-providers-google" package
> > > > > |   | |-__init__.py
> > > > > |   |-protocols
> > > > > |     |-setup.py -> setup.py for the "apache-airflow-integrations-protocols" package
> > > > > |     |-airflow_integrations
> > > > > |        |-protocols
> > > > > |          |-__init__.py
> > > > > |          |-tests -> tests for the "apache-airflow-integrations-protocols" package
> > > >
> > > > > There are a number of pros for this solution:
> > > > >
> > > > >    - We could use the standard namespaces feature of python to build
> > > > >    multiple packages:
> > > > >    https://packaging.python.org/guides/packaging-namespace-packages/
> > > > >    - Installation for users will be the same as previously. We could
> > > > >    install the needed packages automatically when particular extras
> > > > >    are used (pip install apache-airflow[google] could install both
> > > > >    "apache-airflow" and
> > > > >    "apache-airflow-integrations-providers-google")
> > > > >    - We could have a custom setup.py installation process for
> > > > >    developers that could install all the packages in development
> > > > >    ("-e ." mode) in a single operation.
> > > > >    - In case of transfer packages we could have nice error messages
> > > > >    informing that the other package needs to be installed (for
> > > > >    example the S3->GCS operator would import
> > > > >    "airflow_integrations.providers.amazon.*" and if that fails it
> > > > >    could raise "Please install [amazon] extra to use me.")
> > > > >    - We could implement numerous optimisations in the way we run
> > > > >    tests in CI (for example run all the "providers" tests only with
> > > > >    sqlite, run tests in parallel etc.)
> > > > >    - We could implement it gradually - we do not have to have a "big
> > > > >    bang" approach - we can implement it in a "provider-by-provider"
> > > > >    way and test it with one provider (Google) first to make sure that
> > > > >    all the mechanisms are working
> > > > >    - For now we could have the monorepo approach where all the
> > > > >    packages will be developed in concert - for now avoiding the
> > > > >    dependency problems (but allowing for back-portability to 1.10).
> > > > >    - We will have clear boundaries between packages and the ability
> > > > >    to test for some unwanted/hidden dependencies between packages.
> > > > >    - We could switch to the (much better) sphinx-apidoc package to
> > > > >    continue building single documentation for all of those (sphinx
> > > > >    apidoc has support for namespaces).
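
The "nice error message" idea for transfer operators from the list above could look roughly like the sketch below. The helper name `require_provider` and the module path used in the usage example are hypothetical and only serve the illustration.

```python
import importlib


def require_provider(module_name, extra):
    """Import an optional provider module or fail with an actionable hint."""
    try:
        return importlib.import_module(module_name)
    except ImportError as err:
        # Re-raise with a message that names the extra the user should install.
        raise ImportError(
            "Please install the [{}] extra to use this operator "
            "(could not import {}).".format(extra, module_name)
        ) from err


# Usage: a module that is not installed produces the friendly message.
try:
    require_provider("demo_missing_integrations.providers.amazon", "amazon")
except ImportError as error:
    print(error)
```

Chaining with `from err` keeps the original import failure visible in the traceback, which helps when the real cause is a broken dependency rather than a missing extra.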
> > > >
> > > > > As we are working on the GCP move from contrib to core, we could
> > > > > make every effort to test it and try it before we merge it to master
> > > > > so that it will be ready for others (and we could help with most of
> > > > > the moves afterwards). It seems complex, but in fact in most cases
> > > > > it will be a very simple move between the packages and can be done
> > > > > incrementally so there is little risk in doing this I think.
> > > >
> > > > J.
> > > >
> > > >
> > > > On Mon, Oct 28, 2019 at 11:45 PM Kevin Yang <yr...@gmail.com>
> > > > wrote:
> > > >
> > > > > Tomasz and Ash have good points about the overhead of having
> > > > > separate repos. But as we grow bigger and more mature, I would
> > > > > prefer to have what was described in AIP-8. It shouldn't be
> > > > > extremely hard for us to come up with good strategies to handle the
> > > > > overhead. AIP-8 already talked about how it can benefit us. IMO, on
> > > > > a high level, having a clear separation of core vs. hooks/operators
> > > > > would make the project much more scalable and the gains would
> > > > > outweigh the cost we pay.
> > > > >
> > > > > That being said, I'm supportive of this approach of moving towards
> > > > > AIP-8 while learning; it is quite a good practice for tackling a big
> > > > > project. Looking forward to reading the AIP.
> > > >
> > > >
> > > > Cheers,
> > > > Kevin Y
> > > >
> > > > On Mon, Oct 28, 2019 at 6:21 AM Jarek Potiuk <
> > > > Jarek.Potiuk@polidea.com
> > > >
> > > > wrote:
> > > >
> > > > > We are checking how we can use namespaces in a back-portable way and
> > > > > we will have a POC soon so that we will all be able to see what it
> > > > > will look like.
> > > >
> > > > J.
> > > >
> > > > On Mon, Oct 28, 2019 at 1:24 PM Ash Berlin-Taylor <
> > > > ash@apache.org>
> > > > wrote:
> > > >
> > > > > I'll have to read your proposal in detail (sorry, no time right
> > > > > now!), but I'm broadly in favour of this approach, and I think
> > > > > keeping them _in_ the same repo is the best plan -- that makes
> > > > > writing and testing cross-cutting changes easier.
> > > >
> > > > -a
> > > >
> > > > On 28 Oct 2019, at 12:14, Tomasz Urbaszek <
> > > > tomasz.urbaszek@polidea.com
> > > >
> > > > wrote:
> > > >
> > > > > I think utilizing namespaces should reduce a lot of the problems
> > > > > raised by using separate repos (who will manage them? how to
> > > > > release? where should the repo be?).
> > > >
> > > > Bests,
> > > > Tomek
> > > >
> > > > On Sun, Oct 27, 2019 at 11:54 AM Jarek Potiuk <
> > > > Jarek.Potiuk@polidea.com>
> > > > wrote:
> > > >
> > > > Thanks Bas for comments! Let me share my thoughts
> > > > below.
> > > >
> > > > On Sun, Oct 27, 2019 at 9:23 AM Bas Harenslak <
> > > > basharenslak@godatadriven.com>
> > > > wrote:
> > > >
> > > > > Hi Jarek, I definitely see a future in creating separate installable
> > > > > packages for various operators/hooks/etc (as in AIP-8). This would
> > > > > IMO strip the “core” Airflow to only what’s needed and result in a
> > > > > small package without a ton of dependencies (and make it more
> > > > > maintainable, shorter tests, etc etc etc). Not exactly sure though
> > > > > what you’re proposing in your e-mail, is it a new AIP for an
> > > > > intermediate step towards AIP-8?
> > > >
> > > >
> > > > > It's a new AIP I am proposing. For now it's only for backporting the
> > > > > new 2.0 import paths to the 1.10.* series.
> > > > >
> > > > > It's more of "incrementally going in the direction of AIP-8 and
> > > > > learning some of the difficulties involved" than implementing AIP-8
> > > > > fully. We are taking advantage of the changes in import paths from
> > > > > AIP-21 which make it possible to have both old and new (optional)
> > > > > operators available in the 1.10.* series of Airflow. I think there
> > > > > is a lot more to do for a full implementation of AIP-8: decisions on
> > > > > how to maintain and install those operator groups separately, a
> > > > > stewardship model/organisation for the separate groups, how to
> > > > > manage cross-dependencies, procedures for releasing the packages
> > > > > etc.
> > > > >
> > > > > I think about this new AIP also as a learning effort - we would
> > > > > learn more about how separate packaging works and then we can follow
> > > > > up with the full AIP-8 implementation for "modular" Airflow. Then
> > > > > AIP-8 could be implemented in Airflow 2.1 for example - or 3.0 if we
> > > > > start following semantic versioning - based on those learnings. It's
> > > > > a bit of a good example of having your cake and eating it too. We
> > > > > can try out modularity in 1.10.* while cutting the scope of 2.0 and
> > > > > not implementing the full management/release procedure for AIP-8
> > > > > yet.
> > > >
> > > >
> > > > > Thinking about this, I think there are still a few grey areas (which
> > > > > would be good to discuss in a new AIP, or continue on AIP-8):
> > > > >
> > > > >  *   In your email you speak only about the 3 big cloud providers
> > > > > (btw I made a PR for migrating all AWS components ->
> > > > > https://github.com/apache/airflow/pull/6439). Is there a plan for
> > > > > splitting other components than Google/AWS/Azure?
> > > >
> > > >
> > > > > We could add more groups as part of this new AIP indeed (as an
> > > > > extension to AIP-21 and a pre-requisite to AIP-8). We already see
> > > > > how moving/deprecation works for the providers package - it works
> > > > > for GCP/Google rather nicely. But there is nothing to prevent us
> > > > > from extending it to cover other groups of operators/hooks. If you
> > > > > look at the current structure of the documentation done by Kamil, we
> > > > > can follow the structure there and move the operators/hooks
> > > > > accordingly
> > > > > (https://airflow.readthedocs.io/en/latest/operators-and-hooks-ref.html):
> > > > >
> > > > >      Fundamentals, ASF: Apache Software Foundation, Azure: Microsoft
> > > > >      Azure, AWS: Amazon Web Services, GCP: Google Cloud Platform,
> > > > >      Service integrations, Software integrations, Protocol
> > > > >      integrations.
> > > >
> > > > > I am happy to include that in the AIP - if others agree it's a good
> > > > > idea. Out of those groups, I think only Fundamentals should not be
> > > > > back-ported. Others should be rather easy to port (if we decide
> > > > > to). We already have quite a lot of those in the new GCP operators
> > > > > for 2.0. So starting with the GCP/Google group is a good idea. Also
> > > > > following with Cloud Providers first is a good thing. For example
> > > > > we now have support from the Google Composer team to do this
> > > > > separation for GCP (and we learn from it), and then we can claim
> > > > > the stewardship in our team for releasing the Python 3 / Airflow
> > > > > 1.10-compatible "airflow-google" packages. Possibly other Cloud
> > > > > Providers/teams might follow this (if they see the value in it) and
> > > > > there could be different stewards for those. And then we can do
> > > > > other groups if we decide to. I think this way we can learn whether
> > > > > AIP-8 is manageable and what real problems we are going to face.
> > > >
> > > > >  *   Each “plugin” e.g. GCP would be a separate repo, should we
> > > > > create some sort of blueprint for such packages?
> > > >
> > > >
> > > > > I think we do not need separate repos (at all), but in this new AIP
> > > > > we can test it before we decide to go for AIP-8. IMHO the monorepo
> > > > > approach will work here rather nicely. We could use Python 3 native
> > > > > namespaces
> > > > > <https://packaging.python.org/guides/packaging-namespace-packages/>
> > > > > for the sub-packages when we go full AIP-8. For now we could simply
> > > > > package the new operators in a separate pip package for the Python 3
> > > > > version of the 1.10.* series only. We only need to test whether it
> > > > > works well with another package providing 'airflow.providers.*'
> > > > > after apache-airflow is installed (providing the 'airflow' package).
> > > > > But I think we can make it work. I don't think we really need to
> > > > > split the repos; namespaces will work just fine and make management
> > > > > of cross-repository dependencies easier (but we can learn
> > > > > otherwise). For sure we will not need it for the new proposed AIP
> > > > > of backporting groups to 1.10, and we can defer that decision to
> > > > > AIP-8 implementation time.
> > > >
> > > >
> > > > > *   In which Airflow version do we start raising deprecation
> > > > > warnings and in which version would we remove the original?
> > > >
> > > >
> > > > > I think we should do what we already did in the GCP case. Those old
> > > > > "imports" for operators can be marked as deprecated in Airflow 2.0
> > > > > (and removed in 2.1, or 3.0 if we start following semantic
> > > > > versioning). We can, however, do it earlier, in 1.10.7 or 1.10.8 if
> > > > > we release those (without removing the old operators yet - just
> > > > > raise deprecation warnings and inform users that for Python 3 the
> > > > > new "airflow-google", "airflow-aws" etc. packages can be installed
> > > > > and users can switch to them).
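
To illustrate the "keep the old import working but raise a deprecation
warning" idea described above, here is a minimal sketch. It is not Airflow's
actual code: NewOperator, OldOperator, the deprecated_class helper and the
dotted path in the message are all hypothetical names made up for this
example. It only assumes the standard library `warnings` module.

```python
import warnings


class NewOperator:
    """Stand-in for an operator that has moved to its new location (hypothetical)."""

    def __init__(self, task_id):
        self.task_id = task_id


def deprecated_class(new_cls, old_name, new_path):
    """Return a subclass of new_cls that warns every time it is instantiated."""

    class _Deprecated(new_cls):
        def __init__(self, *args, **kwargs):
            warnings.warn(
                f"{old_name} is deprecated; please use {new_path} instead.",
                DeprecationWarning,
                stacklevel=2,  # point the warning at the caller's line
            )
            super().__init__(*args, **kwargs)

    _Deprecated.__name__ = old_name
    return _Deprecated


# The old import location would expose this alias (the path is an assumption).
OldOperator = deprecated_class(
    NewOperator, "OldOperator", "airflow.providers.example.NewOperator"
)
```

With something along these lines, DAGs that still use the old name keep
working, and the warning text tells users where the operator now lives.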
> > > >
> > > > J.
> > > >
> > > >
> > > >
> > > > Cheers,
> > > > Bas
> > > >
> > > > On 27 Oct 2019, at 08:33, Jarek Potiuk <
> > > > Jarek.Potiuk@polidea.com
> > > > <mailto:
> > > > Jarek.Potiuk@polidea.com>> wrote:
> > > >
> > > > Hello - any comments on that? I am happy to make it
> > > > into an
> > > > AIP
> > > > :)?
> > > >
> > > > On Sun, Oct 13, 2019 at 5:53 PM Jarek Potiuk <
> > > > Jarek.Potiuk@polidea.com
> > > > <ma...@polidea.com>>
> > > > wrote:
> > > >
> > > > *Motivation*
> > > >
> > > > > I think we really should start thinking about making it easier to
> > > > > migrate to 2.0 for our users. After implementing some recent
> > > > > changes related to AIP-21 - Changes in import paths
> > > > > <https://cwiki.apache.org/confluence/display/AIRFLOW/AIP-21%3A+Changes+in+import+paths>
> > > > > I think I have an idea that might help with it.
> > > >
> > > > *Proposal*
> > > >
> > > > > We could package some of the new and improved 2.0 operators (moved
> > > > > to the "providers" package) and let them be used in a Python 3
> > > > > environment of Airflow 1.10.x.
> > > > >
> > > > > This can be done case-by-case per "cloud provider". It should not be
> > > > > obligatory and should be largely driven by each provider. It's not
> > > > > yet the full AIP-8 Split Hooks/Operators into separate packages
> > > > > <https://cwiki.apache.org/confluence/pages/viewpage.action?pageId=100827303>.
> > > > > It's merely backporting some operators/hooks to get them to work in
> > > > > 1.10. But by doing it we might try out the concept of splitting,
> > > > > learn about maintenance problems, and maybe implement the full
> > > > > *AIP-8* approach in 2.1 consistently across the board.
> > > >
> > > > *Context*
> > > >
> > > > > Part of AIP-21 was to move import paths for cloud providers to a
> > > > > separate providers/<PROVIDER> package. An example of that (the first
> > > > > provider we have already almost migrated) is the providers/google
> > > > > package (further divided into gcp/gsuite etc).
> > > > >
> > > > > We've done a massive migration of all the Google-related operators,
> > > > > created a few missing ones, and retrofitted some old operators to
> > > > > follow GCP best practices, fixing a number of problems and also
> > > > > implementing Python 3 and Pylint compatibility. Some of these
> > > > > operators/hooks are not backwards compatible. Those that are
> > > > > compatible are still available via the old imports with a
> > > > > deprecation warning.
> > > >
> > > > > We've added missing tests (including system tests) and missing
> > > > > features, improving some of the Google operators, giving the users
> > > > > more capabilities and fixing some issues. Those operators should
> > > > > pretty much "just work" in Airflow 1.10.x (any recent version) for
> > > > > Python 3. We should be able to release a separate pip-installable
> > > > > package for those operators that users can install in Airflow
> > > > > 1.10.x.
> > > > >
> > > > > Any user will be able to install this separate package in their
> > > > > Airflow 1.10.x installation and start using those new "provider"
> > > > > operators in parallel to the old 1.10.x operators. Other providers
> > > > > ("microsoft", "amazon") might follow the same approach if they
> > > > > want. We could even at some point decide to move some of the core
> > > > > operators in similar fashion (for example following the structure
> > > > > proposed in the latest documentation: fundamentals / software /
> > > > > etc.
> > > > > https://airflow.readthedocs.io/en/latest/operators-and-hooks-ref.html)
> > > >
> > > > *Pros and cons*
> > > >
> > > > There are a number of pros:
> > > >
> > > > >  - Users will have an easier migration path if they are deeply
> > > > >    invested in the 1.10.* version
> > > > >  - It's possible to migrate in stages for people who are also
> > > > >    invested in py2: *py2 (1.10) -> py3 (1.10) -> py3 + new operators
> > > > >    (1.10) -> py3 + 2.0*
> > > > >  - Moving to new operators in py3 + new operators can be done
> > > > >    gradually. Old operators will continue to work while new ones can
> > > > >    be used more and more
> > > > >  - People will be incentivised to migrate to Python 3 before 2.0 is
> > > > >    out (by using new operators)
> > > > >  - Each provider "package" can have an independent release schedule
> > > > >    and add functionality in already released Airflow versions.
> > > > >  - We do not take any functionality away from the users - we just
> > > > >    add more options
> > > > >  - The releases can be - similarly to main Airflow releases - voted
> > > > >    on separately by the PMC after "stewards" of the package (per
> > > > >    provider) perform a round of testing on 1.10.* versions.
> > > > >  - Users will start migrating to new operators earlier and have a
> > > > >    smoother switch to 2.0 later
> > > >  - The latest improved operators will start
> > > >
> > > > There are three cons I could think of:
> > > >
> > > > >  - There will be quite a lot of duplication between old and new
> > > > >    operators (they will co-exist in 1.10). That might lead to
> > > > >    confusion among users and problems with cooperation between
> > > > >    different operators/hooks
> > > > >  - Having new operators in 1.10 Python 3 might keep people from
> > > > >    migrating to 2.0
> > > > >  - It will require some maintenance and separate release overhead.
> > > >
> > > > > I already spoke to the Composer team @Google and they are very
> > > > > positive about this. I also spoke to Ash and it seems it might also
> > > > > be OK for the Astronomer team. We have Google's backing and support,
> > > > > and we can provide maintenance and support for those packages -
> > > > > being an example for other providers of how they can do it.
> > > > >
> > > > > Let me know what you think - and whether I should make it into an
> > > > > official AIP maybe?
> > > >
> > > > J.
> > > >
> > > >
> > > >
> > > > --
> > > >
> > > > Jarek Potiuk
> > > > Polidea <https://www.polidea.com/ <https://www.polidea.com/>> |
> > > Principal
> > > > Software
> > > > Engineer
> > > >
> > > > M: +48 660 796 129 <+48660796129>
> > > > [image: Polidea] <https://www.polidea.com/ <https://www.polidea.com/
> >>
> > > >
> > > >
> > > >
> > > > --
> > > >
> > > > Jarek Potiuk
> > > > Polidea <https://www.polidea.com/ <https://www.polidea.com/>> |
> > > Principal
> > > > Software
> > > > Engineer
> > > >
> > > > M: +48 660 796 129 <+48660796129>
> > > > [image: Polidea] <https://www.polidea.com/ <https://www.polidea.com/
> >>
> > > >
> > > >
> > > >
> > > > --
> > > >
> > > > Jarek Potiuk
> > > > Polidea <https://www.polidea.com/ <https://www.polidea.com/>> |
> > > Principal
> > > > Software
> > > > Engineer
> > > >
> > > > M: +48 660 796 129 <+48660796129>
> > > > [image: Polidea] <https://www.polidea.com/ <https://www.polidea.com/
> >>
> > > >
> > > >
> > > >
> > > > --
> > > >
> > > > Tomasz Urbaszek
> > > > Polidea <https://www.polidea.com/ <https://www.polidea.com/>> |
> Junior
> > > Software
> > > > Engineer
> > > >
> > > > M: +48 505 628 493 <+48505628493>
> > > > E: tomasz.urbaszek@polidea.com
> > > > <tomasz.urbaszeki@polidea.com
> > > >
> > > >
> > > > Unique Tech
> > > > Check out our projects!
> > > > <https://www.polidea.com/our-work <https://www.polidea.com/our-work
> >>
> > > >
> > > >
> > > >
> > > > --
> > > >
> > > > Jarek Potiuk
> > > > Polidea <https://www.polidea.com/ <https://www.polidea.com/>> |
> > > Principal Software
> > > > Engineer
> > > >
> > > > M: +48 660 796 129 <+48660796129>
> > > > [image: Polidea] <https://www.polidea.com/ <https://www.polidea.com/
> >>
> > > >
> > > >
> > > >
> > > >
> > > > --
> > > >
> > > > Jarek Potiuk
> > > > Polidea <https://www.polidea.com/ <https://www.polidea.com/>> |
> > > Principal Software
> > > > Engineer
> > > >
> > > > M: +48 660 796 129 <+48660796129>
> > > > [image: Polidea] <https://www.polidea.com/ <https://www.polidea.com/
> >>
> > > >
> > > >
> > > >
> > > >
> > > > --
> > > >
> > > > Jarek Potiuk
> > > > Polidea <https://www.polidea.com/ <https://www.polidea.com/>> |
> > > Principal Software Engineer
> > > >
> > > > M: +48 660 796 129 <+48660796129>
> > > > [image: Polidea] <https://www.polidea.com/ <https://www.polidea.com/
> >>
> > > >
> > > >
> > > >
> > > >
> > > > --
> > > >
> > > > Jarek Potiuk
> > > > Polidea <https://www.polidea.com/ <https://www.polidea.com/>> |
> > > Principal Software Engineer
> > > >
> > > > M: +48 660 796 129 <+48660796129>
> > > > [image: Polidea] <https://www.polidea.com/ <https://www.polidea.com/
> >>
> > > >
> > > >
> > >
> > >
> >
> > --
> >
> > Jarek Potiuk
> > Polidea <https://www.polidea.com/> | Principal Software Engineer
> >
> > M: +48 660 796 129 <+48660796129>
> > [image: Polidea] <https://www.polidea.com/>
> >
>


-- 

Jarek Potiuk
Polidea <https://www.polidea.com/> | Principal Software Engineer

M: +48 660 796 129 <+48660796129>
[image: Polidea] <https://www.polidea.com/>

Re: [PROPOSE] Ease future migration path to 2.0 by provider's operators/hook backporting to 1.10.*

Posted by Kaxil Naik <ka...@gmail.com>.
Yes let's just do (1) for now.



On Tue, Nov 5, 2019, 08:48 Jarek Potiuk <Ja...@polidea.com> wrote:

> Thanks Ash! It might indeed work. I will take it from there and try to make
> a POC PR with airflow.
>
> It's a bit different approach than google-python libraries (they keep all
> the libraries as separate sub-packages/mini projects inside the main
> project). The approach you propose is far less invasive in terms of
> changing structure of the main repo. I like it this way much more. It makes
> it much easier to import project in IDE even if it is less modular in
> nature.
>
> From what I understand with this structure - if it works - we have two
> options:
>
> (1) For Airflow 2.0 we will be able to install Airflow and all
> "integrations" in single (apache-airflow == 2.0.0) package and build
> separate backporting integration packages for 1.10.* only.
> (2) We will split Airflow 2.0 into separate "core" and "integration"
> packages as well while preparing packages.
>
> I think (1) is a bit more reasonable for now, until we work out the full
> AIP-8 solution (including solving dependency hell). Let me know what you
> think (and others as well).
>
> J.
>
> On Mon, Nov 4, 2019 at 9:24 PM Ash Berlin-Taylor <as...@apache.org> wrote:
>
> > https://github.com/ashb/airflow-submodule-test <
> > https://github.com/ashb/airflow-submodule-test>
> >
> > That seems to work in any order things are installed, at least on python
> > 3.7. I've had a stressful few days so I may have missed something. Please
> > tell me if there's a case I've missed, or if this is not a suitable proxy
> > for our situation.
> >
> > -a
> >
> > > On 4 Nov 2019, at 20:08, Ash Berlin-Taylor <as...@apache.org> wrote:
> > >
> > > Pretty hard pass from me on airflow_ext. If it's released by airflow I
> > > want it to live under airflow.* (Anyone else is free to release
> > > packages under any namespace they choose)
> > >
> > > That said I think I've got something that works:
> > >
> > >
> >
> /Users/ash/.virtualenvs/test-providers/lib/python3.7/site-packages/notairflow/__init__.py
> > module level code running
> > >
> >
> /Users/ash/.virtualenvs/test-providers/lib/python3.7/site-packages/notairflow/providers/gcp/__init__.py
> > module level code running
> > >
> > > Let me test it again in a few different cases etc.
> > >
> > > -a
> > >
> > > On 4 November 2019 14:00:24 GMT, Jarek Potiuk <
> Jarek.Potiuk@polidea.com>
> > wrote:
> > > Hey Ash,
> > >
> > > > Thanks for the offer. I must admit pkgutil and package namespaces are
> > > > not the best documented part of Python.
> > > >
> > > > I dug a bit deeper and I found a similar problem -
> > > > https://github.com/pypa/setuptools/issues/895. It seems that even if
> > > > it is not explicitly explained in the pkgutil documentation, this
> > > > comment (assuming it is right) explains everything:
> > >
> > > *"That's right. All parents of a namespace package must also be
> namespace
> > > packages, as they will necessarily share that parent name space (farm
> and
> > > farm.deps in this example)."*
> > >
> > > > There are a few possibilities mentioned in the issue on how this can
> > > > be "workarounded", but those are by far not perfect solutions. They
> > > > would require patching the already installed airflow's __init__.py to
> > > > manipulate the search path. Still, from my tests I do not know if this
> > > > would be possible at all because of the non-trivial __init__.py we
> > > > have (and use) in the *airflow* package.
> > > >
> > > > We have a few PRs now waiting for a decision on that one I think, so
> > > > maybe we can simply agree that we should use another package (I really
> > > > like *"airflow_ext"* :D) and use it from now on? What do you (and
> > > > others) think?
> > >
> > > I'd love to start voting on it soon.
> > >
> > > J.
> > >
> > >
> > >
> > > On Thu, Oct 31, 2019 at 5:37 PM Ash Berlin-Taylor <as...@apache.org>
> > wrote:
> > >
> > > Let me run some tests too - I've used them a bit in the past. I thought
> > > since we only want to make airflow.providers a namespace package it
> might
> > > work for us.
> > >
> > > Will report back next week.
> > >
> > > -ash
> > >
> > > On 31 October 2019 15:58:22 GMT, Jarek Potiuk <
> Jarek.Potiuk@polidea.com>
> > > wrote:
> > > The same repo (so mono-repo approach). All packages would be in
> > > "airflow_integrations" directory. It's mainly about moving the
> > > operators/hooks/sensor files to different directory structure.
> > >
> > > It might be done pretty much without changing the current
> > > installation/development model:
> > >
> > > > 1) We can add a setup.py command to install all the packages in -e
> > > > mode in the main setup.py (to make it easier to install all deps in
> > > > one go).
> > > > 2) We can add dependencies in setup.py extras to install appropriate
> > > > packages. For example the [google] extra will require the
> > > > 'apache-airflow-integrations-providers-google' package - or
> > > > 'apache-airflow-providers-google' if we decide to skip -integrations
> > > > from the package name to make it shorter.
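
As a rough sketch of point 2) above, the extras could be declared as a plain
mapping that the main setup.py hands to setuptools. Only the "google"
distribution name comes from this thread; the "amazon" entry and the "all"
convenience extra are assumptions added for illustration.

```python
# Hypothetical mapping from extras to the integration distributions they pull in.
EXTRAS_REQUIRE = {
    "google": ["apache-airflow-integrations-providers-google"],
    # Assumed name, following the same pattern as the google package.
    "amazon": ["apache-airflow-integrations-providers-amazon"],
}

# Convenience extra that installs every integration listed above.
EXTRAS_REQUIRE["all"] = sorted(
    {package for packages in EXTRAS_REQUIRE.values() for package in packages}
)

# In the main setup.py this would be passed on as:
#     setup(name="apache-airflow", ..., extras_require=EXTRAS_REQUIRE)
```

With such a mapping, "pip install apache-airflow[google]" would install the
core package plus the Google integration package in one go.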
> > >
> > > The only potential drawback I see is a bit more involved setup of the
> > > IDE.
> > >
> > > This way installation method for both dev and prod remains simple.
> > >
> > > > In the future we can have a separate release schedule for the
> > > > packages (AIP-8) but for now we can stick to the same version for the
> > > > 'apache-airflow' and 'apache-airflow-integrations-*' packages (+ a
> > > > separate release schedule for backporting needs).
> > > > Here again is the structure of the repo (we will likely be able to use
> > > > native namespaces so I removed some needless __init__.py files).
> > >
> > > > |-- airflow
> > > > |   |- __init__.py
> > > > |   |- operators -> fundamental operators are here
> > > > |-- tests -> tests for core airflow are here (optionally we can move them under "airflow")
> > > > |-- setup.py -> setup.py for the "apache-airflow" package
> > > > |-- airflow_integrations
> > > > |   |-providers
> > > > |   | |-google
> > > > |   |   |-setup.py -> setup.py for the "apache-airflow-integrations-providers-google" package
> > > > |   |   |-airflow_integrations
> > > > |   |     |-providers
> > > > |   |       |-google
> > > > |   |         |-__init__.py
> > > > |   |         | tests -> tests for the "apache-airflow-integrations-providers-google" package
> > > > |   | |-__init__.py
> > > > |   |-protocols
> > > > |     |-setup.py -> setup.py for the "apache-airflow-integrations-protocols" package
> > > > |     |-airflow_integrations
> > > > |        |-protocols
> > > > |          |-__init__.py
> > > > |          |-tests -> tests for the "apache-airflow-integrations-protocols" package
> > >
> > >
> > > J.
> > >
> > > On Thu, Oct 31, 2019 at 3:38 PM Kaxil Naik <ka...@gmail.com>
> wrote:
> > >
> > > So create another package in a different repo? or the same repo with
> > > a
> > > separate setup.py file that has airflow has dependency?
> > >
> > >
> > >
> > >
> > > On Thu, Oct 31, 2019 at 2:32 PM Jarek Potiuk
> > > <Ja...@polidea.com>
> > > wrote:
> > >
> > > > TL;DR: I did some more testing on how namespaces work. I still
> > > > believe the only way to use namespaces is to have a separate (for
> > > > example "airflow_integrations") package for all backportable packages.
> > > >
> > > > I am not sure if anyone has used namespaces before, but after reading
> > > > and trying them out, the main blocker seems to be that we have
> > > > non-trivial code in airflow's "__init__.py" (including class
> > > > definitions, imported sub-packages and plugin initialisation).
> > >
> > > > Details are in
> > > > https://packaging.python.org/guides/packaging-namespace-packages/
> > > > but it's a long one so let me summarize my findings:
> > > >
> > > >    - In order to use the "airflow.providers" package we would have to
> > > >      declare "airflow" as a namespace
> > > >    - It can be done in three different ways:
> > > >      - omitting __init__.py in this package (native/implicit
> > > >        namespace)
> > > >      - making the __init__.py of the "airflow" package in main
> > > >        airflow (and in the other packages) contain only "*__path__ =
> > > >        __import__('pkgutil').extend_path(__path__, __name__)*"
> > > >        (pkgutil style), or
> > > >      - making it contain only
> > > >        "*__import__('pkg_resources').declare_namespace(__name__)*"
> > > >        (pkg_resources style)
> > >
> > > > The first is not possible (we already have __init__.py in
> > > > "airflow"). The other two are not possible because we already have
> > > > quite a lot in airflow's "__init__.py", and the guide states for both
> > > > the pkgutil and pkg_resources styles:
> > > >
> > > > "*Every* distribution that uses the namespace package must include an
> > > > identical *__init__.py*. If any distribution does not, it will cause
> > > > the namespace logic to fail and the other sub-packages will not be
> > > > importable. *Any additional code in __init__.py will be
> > > > inaccessible."*
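
For reference, here is a small self-contained sketch of the pkgutil style
when it does work: two separate directories each ship the same package name
with an identical one-line __init__.py, and modules from both become
importable. The package name demo_ns_pkg, the module names and the helper
functions are made up for this example; the temporary directories merely
stand in for two separately installed distributions.

```python
import importlib
import sys
import tempfile
from pathlib import Path

# The one-line __init__.py that every distribution must ship unchanged.
NAMESPACE_INIT = "__path__ = __import__('pkgutil').extend_path(__path__, __name__)\n"


def build_distribution(root, package, module, source):
    """Create <root>/<package>/__init__.py plus one module next to it."""
    pkg_dir = Path(root) / package
    pkg_dir.mkdir(parents=True)
    (pkg_dir / "__init__.py").write_text(NAMESPACE_INIT)
    (pkg_dir / f"{module}.py").write_text(source)


def demo_pkgutil_namespace():
    """Import two modules of one package that live in two separate directories."""
    with tempfile.TemporaryDirectory() as dist_a, tempfile.TemporaryDirectory() as dist_b:
        build_distribution(dist_a, "demo_ns_pkg", "core", "NAME = 'core'\n")
        build_distribution(dist_b, "demo_ns_pkg", "google", "NAME = 'google'\n")
        sys.path[:0] = [dist_a, dist_b]
        importlib.invalidate_caches()
        try:
            core = importlib.import_module("demo_ns_pkg.core")
            google = importlib.import_module("demo_ns_pkg.google")
            return core.NAME, google.NAME
        finally:
            sys.path.remove(dist_a)
            sys.path.remove(dist_b)
```

Calling demo_pkgutil_namespace() imports both modules because extend_path
adds the second directory to the package's __path__. Both portions contain
nothing but that one line, which is exactly the constraint that airflow's
current __init__.py cannot satisfy.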
> > >
> > > > I even tried to add those pkgutil/pkg_resources declarations to
> > > > airflow and did some experimenting with it - but it does not work.
> > > > Pip install fails at the plugins_manager as "airflow.plugins" is not
> > > > accessible (kind of expected), but I am sure there will be other
> > > > problems as well. :(
> > > >
> > > > Basically - we cannot turn "airflow" into a namespace because it has
> > > > some "__init__.py" logic :(.
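
The failure mode can be reproduced in isolation with a sketch like the one
below. It assumes the two distributions end up in two different directories
on sys.path; when both are installed into the same site-packages directory
the files are merged on disk, and this sketch says nothing about that case.
The name demo_regular_pkg and the helper function are hypothetical.

```python
import importlib
import sys
import tempfile
from pathlib import Path


def demo_regular_package_blocks_namespace():
    """Show that a regular package hides a sub-package shipped elsewhere."""
    with tempfile.TemporaryDirectory() as core_dist, tempfile.TemporaryDirectory() as extra_dist:
        # "Core" distribution: a regular package whose __init__.py has real code.
        core_pkg = Path(core_dist) / "demo_regular_pkg"
        core_pkg.mkdir()
        (core_pkg / "__init__.py").write_text("VERSION = '1.10'\n")

        # Second distribution: tries to contribute demo_regular_pkg.providers.
        providers = Path(extra_dist) / "demo_regular_pkg" / "providers"
        providers.mkdir(parents=True)
        (providers / "__init__.py").write_text("NAME = 'providers'\n")

        sys.path[:0] = [core_dist, extra_dist]
        importlib.invalidate_caches()
        try:
            core = importlib.import_module("demo_regular_pkg")
            try:
                importlib.import_module("demo_regular_pkg.providers")
                providers_importable = True
            except ModuleNotFoundError:
                providers_importable = False
            return core.VERSION, providers_importable
        finally:
            sys.path.remove(core_dist)
            sys.path.remove(extra_dist)
```

The core package imports fine, but its __path__ only contains the first
directory, so the sub-package from the second directory is never found.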
> > >
> > > > So I think it still holds that if we want to use namespaces, we
> > > > should use another package. *"airflow_integrations"* is the current
> > > > candidate, but we can think of some nicer/shorter one: "airflow_ext",
> > > > "airflow_int", "airflow_x", "airflow_mod", "airflow_next",
> > > > "airflow_xt", "airflow_", "ext_airflow", ....  Interestingly
> > > > "airflow_" is the one suggested by PEP8 to avoid conflicts with Python
> > > > names (which is a different case but kind of close).
> > >
> > > What do you think?
> > >
> > > J.
> > >
> > > On Tue, Oct 29, 2019 at 4:51 PM Kaxil Naik <ka...@gmail.com>
> > > wrote:
> > >
> > > The namespace feature looks promising and from your tests, it
> > > looks
> > > like
> > > it
> > > would work well from Airflow 2.0 and onwards.
> > >
> > > I will look at it in-depth and see if I have more suggestions or
> > > opinion
> > > on
> > > it
> > >
> > > On Tue, Oct 29, 2019 at 3:32 PM Jarek Potiuk
> > > <Jarek.Potiuk@polidea.com
> > >
> > > wrote:
> > >
> > > > TL;DR: We did some testing of namespaces and packaging (and potential
> > > > backporting options for 1.10.* Python 3 Airflows) and we think it's
> > > > best to use namespaces quickly and to use a different package name,
> > > > "airflow-integrations", for all non-fundamental integrations.
> > > >
> > > > Unless we missed some tricks, we cannot use airflow.* sub-packages
> > > > for the 1.10.* backportable packages. Example:
> > > >
> > > >    - "*apache-airflow"* package provides: "airflow.*" (this is what
> > > >      we have today)
> > > >    - "*apache-airflow-providers-google*": provides
> > > >      "airflow.providers.google.*" packages
> > > >
> > > > If we install both packages (old apache-airflow 1.10.6 and new
> > > > apache-airflow-providers-google from 2.0), it seems that the
> > > > "airflow.providers.google.*" package cannot be imported. This is a
> > > > bit of a problem if we would like to backport the operators from
> > > > Airflow 2.0 to Airflow 1.10 in a way that will be forward-compatible.
> > > > We really want users who started using backported operators in 1.10.*
> > > > not to have to change imports in their DAGs to run them in Airflow
> > > > 2.0.
> > >
> > > > We discussed it internally in our team and considered several
> > > > options, but we think the best way will be to go straight to
> > > > "namespaces" in Airflow 2.0 and to have the integrations (as discussed
> > > > in the AIP-21 discussion) in a separate "*airflow_integrations*"
> > > > package. It might even be a step towards the AIP-8 implementation and
> > > > it plays together very well with the "stewardship" discussed in AIP-21
> > > > now. But we will still keep (for now) a single release process for all
> > > > packages for 2.0 (except for the backporting, which can be done
> > > > per-provider before the 2.0 release) and provide a foundation for more
> > > > complex release cycles in future versions.
> > > >
> > > > Here is how the new Airflow 2.0 repository could look (I only show a
> > > > subset of dirs but they are representative). For those whose email
> > > > fixed/color font will get corrupted, here is an image of this
> > > > structure: https://pasteboard.co/IEesTih.png
> > >
> > > > |-- airflow
> > > > |   |- __init__.py
> > > > |   |- operators -> fundamental operators are here
> > > > |-- tests -> tests for core airflow are here (optionally we can move them under "airflow")
> > > > |-- setup.py -> setup.py for the "apache-airflow" package
> > > > |-- airflow_integrations
> > > > |   |-providers
> > > > |   | |-google
> > > > |   |   |-setup.py -> setup.py for the "apache-airflow-integrations-providers-google" package
> > > > |   |   |-airflow_integrations
> > > > |   |     |-__init__.py
> > > > |   |     |-providers
> > > > |   |       |-__init__.py
> > > > |   |       |-google
> > > > |   |         |-__init__.py
> > > > |   |         | tests -> tests for the "apache-airflow-integrations-providers-google" package
> > > > |   | |-__init__.py
> > > > |   |-protocols
> > > > |     |-setup.py -> setup.py for the "apache-airflow-integrations-protocols" package
> > > > |     |-airflow_integrations
> > > > |        |-protocols
> > > > |          |-__init__.py
> > > > |          |-tests -> tests for the "apache-airflow-integrations-protocols" package
> > >
> > > There are a number of pros for this solution:
> > >
> > > >    - We could use the standard namespaces feature of Python to build
> > > >      multiple packages:
> > > >      https://packaging.python.org/guides/packaging-namespace-packages/
> > > >    - Installation for users will be the same as previously. We could
> > > >      install the needed packages automatically when particular extras
> > > >      are used (pip install apache-airflow[google] could install both
> > > >      "apache-airflow" and
> > > >      "apache-airflow-integrations-providers-google")
> > > >    - We could have a custom setup.py installation process for
> > > >      developers that could install all the packages in development
> > > >      ("-e ." mode) in a single operation.
> > > >    - In case of transfer packages we could have nice error messages
> > > >      informing that the other package needs to be installed (for
> > > >      example the S3->GCS operator would import
> > > >      "airflow_integrations.providers.amazon.*" and if it fails it
> > > >      could raise ("Please install [amazon] extra to use me.")
> > > >    - We could implement numerous optimisations in the way we run
> > > >      tests in CI (for example run all the "providers" tests only with
> > > >      sqlite, run tests in parallel etc.)
> > > >    - We could implement it gradually - we do not have to have a "big
> > > >      bang" approach - we can implement it in a "provider-by-provider"
> > > >      way and test it with one provider (Google) first to make sure
> > > >      that all the mechanisms are working
> > > >    - For now we could have the monorepo approach where all the
> > > >      packages will be developed in concert - for now avoiding the
> > > >      dependency problems (but allowing for back-portability to 1.10).
> > > >    - We will have clear boundaries between packages and the ability
> > > >      to test for some unwanted/hidden dependencies between packages.
> > > >    - We could switch to the (much better) sphinx-apidoc package to
> > > >      continue building single documentation for all of those (sphinx
> > > >      apidoc has support for namespaces).
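
The "nice error message" idea for transfer operators from the list above
could look roughly like this. The helper name import_provider and the exact
message wording are assumptions for illustration; the sketch relies only on
the standard library importlib.

```python
import importlib


def import_provider(module_name, extra):
    """Import an optional integration module, or explain which extra is missing."""
    try:
        return importlib.import_module(module_name)
    except ImportError as err:
        raise ImportError(
            f"Could not import '{module_name}'. "
            f"Please install the [{extra}] extra to use this operator."
        ) from err
```

An S3->GCS transfer operator could then call something like
import_provider("airflow_integrations.providers.amazon", "amazon") at the
top of its module, so a missing package produces an actionable message
instead of a bare import failure.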
> > >
> > > > As we are working on the GCP move from contrib to core, we could make
> > > > every effort to test it and try it before we merge it to master so
> > > > that it will be ready for others (and we could help with most of the
> > > > moves afterwards). It seems complex, but in fact in most cases it will
> > > > be a very simple move between the packages and can be done
> > > > incrementally, so there is little risk in doing this I think.
> > >
> > > J.
> > >
> > >
> > > On Mon, Oct 28, 2019 at 11:45 PM Kevin Yang <yr...@gmail.com>
> > > wrote:
> > >
> > > Tomasz and Ash got good points about the overhead of having
> > > separate
> > > repos.
> > > But while we grow bigger and more mature, I would prefer to
> > > have
> > > what
> > > was
> > > described in AIP-8. It shouldn't be extremely hard for us to
> > > come
> > > up
> > > with
> > > good strategies to handle the overhead. AIP-8 already talked
> > > about
> > > how
> > > it
> > > > can benefit us. IMO on a high level, having a clear
> > > > separation of
> > > core
> > > vs.
> > > hooks/operators would make the project much more scalable and
> > > the
> > > gains
> > > would outweigh the cost we pay.
> > >
> > > That being said, I'm supportive to this moving towards AIP-8
> > > while
> > > learning
> > > approach, quite a good practise to tackle a big project.
> > > Looking
> > > forward
> > > to
> > > > reading the AIP.
> > >
> > >
> > > Cheers,
> > > Kevin Y
> > >
> > > On Mon, Oct 28, 2019 at 6:21 AM Jarek Potiuk <
> > > Jarek.Potiuk@polidea.com
> > >
> > > wrote:
> > >
> > > We are checking how we can use namespaces in back-portable
> > > way
> > > and
> > > we
> > > will
> > > > have a POC soon so that we all will be able to see what it
> > > will look
> > > like.
> > >
> > > J.
> > >
> > > On Mon, Oct 28, 2019 at 1:24 PM Ash Berlin-Taylor <
> > > ash@apache.org>
> > > wrote:
> > >
> > > I'll have to read your proposal in detail (sorry, no time
> > > right
> > > now!),
> > > but
> > > I'm broadly in favour of this approach, and I think
> > > keeping
> > > them
> > > _in_
> > > the
> > > same repo is the best plan -- that makes writing and
> > > testing
> > > cross-cutting
> > > changes  easier.
> > >
> > > -a
> > >
> > > On 28 Oct 2019, at 12:14, Tomasz Urbaszek <
> > > tomasz.urbaszek@polidea.com
> > >
> > > wrote:
> > >
> > > I think utilizing namespaces should reduce a lot of
> > > problems
> > > raised
> > > by
> > > using separate repos (who will manage it? how to
> > > release?
> > > where
> > > should
> > > be
> > > the repo?).
> > >
> > > Bests,
> > > Tomek
> > >
> > > On Sun, Oct 27, 2019 at 11:54 AM Jarek Potiuk <
> > > Jarek.Potiuk@polidea.com>
> > > wrote:
> > >
> > > Thanks Bas for comments! Let me share my thoughts
> > > below.
> > >
> > > On Sun, Oct 27, 2019 at 9:23 AM Bas Harenslak <
> > > basharenslak@godatadriven.com>
> > > wrote:
> > >
> > > Hi Jarek, I definitely see a future in creating
> > > separate
> > > installable
> > > packages for various operators/hooks/etc (as in
> > > AIP-8).
> > > This
> > > would
> > > IMO
> > > strip the “core” Airflow to only what’s needed and
> > > result
> > > in
> > > a
> > > small
> > > package without a ton of dependencies (and make it
> > > more
> > > maintainable,
> > > shorter tests, etc etc etc). Not exactly sure though
> > > what
> > > you’re
> > > proposing
> > > in your e-mail, is it a new AIP for an intermediate
> > > step
> > > towards
> > > AIP-8?
> > >
> > >
> > > It's a new AIP I am proposing.  For now it's only for
> > > backporting
> > > the
> > > new
> > > 2.0 import paths to 1.10.* series.
> > >
> > > It's more of "incremental going in direction of AIP-8
> > > and
> > > learning
> > > some
> > > difficulties involved" than implementing AIP-8 fully.
> > > We are
> > > taking
> > > advantage of changes in import paths from AIP-21 which
> > > make
> > > it
> > > possible
> > > to
> > > have both old and new (optional) operators available
> > > in
> > > 1.10.*
> > > series
> > > of
> > > Airflow. I think there is a lot more to do for full
> > > implementation
> > > of
> > > AIP-8: decisions how to maintain, install those
> > > operator
> > > groups
> > > separately,
> > > stewardship model/organisation for the separate
> > > groups, how
> > > to
> > > manage
> > > cross-dependencies, procedures for releasing the
> > > packages
> > > etc.
> > >
> > > I think about this new AIP also as a learning effort -
> > > we
> > > would
> > > learn
> > > more
> > > how separate packaging works and then we can follow up
> > > with
> > > AIP-8
> > > full
> > > implementation for "modular" Airflow. Then AIP-8 could
> > > be
> > > implemented
> > > in
> > > Airflow 2.1 for example - or 3.0 if we start following
> > > semantic
> > > versioning
> > > - based on those learnings. It's a bit of good example
> > > of
> > > having
> > > cake
> > > and
> > > eating it too. We can try out modularity in 1.10.*
> > > while
> > > cutting
> > > the
> > > scope
> > > of 2.0 and not implementing full management/release
> > > procedure
> > > for
> > > AIP-8
> > > yet.
> > >
> > >
> > > Thinking about this, I think there are still a few
> > > grey
> > > areas
> > > (which
> > > would
> > > be good to discuss in a new AIP, or continue on
> > > AIP-8):
> > >
> > > >  *   In your email you speak only about the 3
> > > big
> > > cloud
> > > providers
> > > (btw I made a PR for migrating all AWS components ->
> > > > https://github.com/apache/airflow/pull/6439). Is
> > > there a
> > > plan
> > > for
> > > splitting other components than Google/AWS/Azure?
> > >
> > >
> > > We could add more groups as part of this new AIP
> > > indeed (as
> > > an
> > > extension to
> > > AIP-21 and pre-requisite to AIP-8). We already see how
> > > moving/deprecation
> > > works for the providers package - it works for
> > > GCP/Google
> > > rather
> > > nicely.
> > > But there is nothing to prevent us from extending it
> > > to
> > > cover
> > > other
> > > groups
> > > of operators/hooks. If you look at the current
> > > structure of
> > > documentation
> > > done by Kamil, we can follow the structure there and
> > > move
> > > the
> > > operators/hooks accordingly (
> > >
> > >
> > >
> > > > https://airflow.readthedocs.io/en/latest/operators-and-hooks-ref.html
> > > ):
> > >
> > >      Fundamentals, ASF: Apache Software Foundation,
> > > Azure:
> > > Microsoft
> > > Azure, AWS: Amazon Web Services, GCP: Google Cloud
> > > Platform,
> > > Service
> > > integrations, Software integrations, Protocol
> > > integrations.
> > >
> > > I am happy to include that in the AIP - if others
> > > agree
> > > it's a
> > > good
> > > idea.
> > > Out of those groups -  I think only Fundamentals
> > > should not
> > > be
> > > back-ported.
> > > Others should be rather easy to port (if we decide
> > > to). We
> > > already
> > > have
> > > quite a lot of those in the new GCP operators for 2.0.
> > > So
> > > starting
> > > with
> > > GCP/Google group is a good idea. Also following with
> > > Cloud
> > > Providers
> > > first
> > > is a good thing. For example we have now support from
> > > Google
> > > Composer
> > > team
> > > to do this separation for GCP (and we learn from it)
> > > and
> > > then
> > > we
> > > can
> > > claim
> > > the stewardship in our team for releasing the python
> > > 3/
> > > Airflow
> > > 1.10-compatible "airflow-google" packages. Possibly
> > > other
> > > Cloud
> > > Providers/teams might follow this (if they see the
> > > value in
> > > it)
> > > and
> > > there
> > > could be different stewards for those. And then we can
> > > do
> > > other
> > > groups
> > > if
> > > we decide to. I think this way we can learn whether
> > > AIP-8 is
> > > manageable
> > > and
> > > what real problems we are going to face.
> > >
> > >  *   Each “plugin” e.g. GCP would be a separate repo,
> > > should
> > > we
> > > create
> > > some sort of blueprint for such packages?
> > >
> > >
> > > I think we do not need separate repos (at all) but in
> > > this
> > > new
> > > AIP
> > > we
> > > can
> > > test it before we decide to go for AIP-8. IMHO -
> > > monorepo
> > > approach
> > > will
> > > work here rather nicely. We could use python-3 native
> > > namespaces
> > > <
> > >
> > > > https://packaging.python.org/guides/packaging-namespace-packages/>
> > > for
> > > the
> > > sub-packages when we go full AIP-8. For now we could
> > > simply
> > > package
> > > the
> > > new
> > > operators in separate pip package for Python 3 version
> > > 1.10.*
> > > series
> > > only.
> > > We only need to test if it works well with another
> > > package
> > > providing
> > > 'airflow.providers.*' after apache-airflow is
> > > installed
> > > (providing
> > > 'airflow' package). But I think we can make it work. I
> > > don't
> > > think
> > > we
> > > really need to split the repos, namespaces will work
> > > just
> > > fine
> > > and
> > > > have
> > > easier management of cross-repository dependencies
> > > (but we
> > > can
> > > learn
> > > otherwise). For sure we will not need it for the new
> > > proposed
> > > AIP
> > > of
> > > backporting groups to 1.10 and we can defer that
> > > decision to
> > > AIP-8
> > > implementation time.
> > >
> > >
> > > *   In which Airflow version do we start raising
> > > deprecation
> > > warnings
> > > and in which version would we remove the original?
> > >
> > >
> > > I think we should do what we did in GCP case already.
> > > Those
> > > old
> > > "imports"
> > > for operators can be made as deprecated in Airflow 2.0
> > > (and
> > > removed
> > > in
> > > 2.1
> > > or 3.0 if we start following semantic versioning). We
> > > can
> > > however
> > > do
> > > it
> > > before in 1.10.7 or 1.10.8 if we release those
> > > (without
> > > removing
> > > the
> > > old
> > > operators yet - just raise deprecation warnings and
> > > inform
> > > that
> > > for
> > > python3
> > > the new "airflow-google", "airflow-aws" etc. packages
> > > can be
> > > installed
> > > and
> > > users can switch to it).
> > >
> > > J.
> > >
> > >
> > >
> > > Cheers,
> > > Bas
> > >
> > > On 27 Oct 2019, at 08:33, Jarek Potiuk <
> > > Jarek.Potiuk@polidea.com
> > > <mailto:
> > > Jarek.Potiuk@polidea.com>> wrote:
> > >
> > > Hello - any comments on that? I am happy to make it
> > > into an
> > > AIP
> > > :)?
> > >
> > > On Sun, Oct 13, 2019 at 5:53 PM Jarek Potiuk <
> > > Jarek.Potiuk@polidea.com
> > > <ma...@polidea.com>>
> > > wrote:
> > >
> > > *Motivation*
> > >
> > > I think we really should start thinking about making
> > > it
> > > easier
> > > to
> > > migrate
> > > to 2.0 for our users. After implementing some recent
> > > changes
> > > related
> > > to
> > > AIP-21-
> > > Changes in import paths
> > > <
> > >
> > >
> > >
> > >
> > >
> > >
> > >
> > >
> > >
> > >
> > >
> >
> https://cwiki.apache.org/confluence/display/AIRFLOW/AIP-21%3A+Changes+in+import+paths
> > <
> >
> https://cwiki.apache.org/confluence/display/AIRFLOW/AIP-21%3A+Changes+in+import+paths
> > >
> > >
> > > I
> > > think I have an idea that might help with it.
> > >
> > > *Proposal*
> > >
> > > We could package some of the new and improved 2.0
> > > operators
> > > (moved
> > > to
> > > "providers" package) and let them be used in Python 3
> > > environment
> > > of
> > > airflow 1.10.x.
> > >
> > > This can be done case-by-case per "cloud provider".
> > > It
> > > should
> > > not
> > > be
> > > obligatory, should be largely driven by each
> > > provider. It's
> > > not
> > > yet
> > > full
> > > AIP-8
> > > Split Hooks/Operators into separate packages
> > > <
> > >
> > >
> > >
> > >
> > >
> > >
> > >
> > >
> > >
> > >
> > >
> >
> https://cwiki.apache.org/confluence/pages/viewpage.action?pageId=100827303
> > <
> >
> https://cwiki.apache.org/confluence/pages/viewpage.action?pageId=100827303
> > >
> > > .
> > > It's
> > > > merely backporting of some operators/hooks to get them to
> > > > work
> > > in
> > > 1.10.
> > > But
> > > by
> > > doing it we might try out the concept of splitting,
> > > learn
> > > about
> > > maintenance
> > > problems and maybe implement full *AIP-8 *approach in
> > > 2.1
> > > consistently
> > > across the board.
> > >
> > > *Context*
> > >
> > > Part of the AIP-21 was to move import paths for Cloud
> > > providers
> > > to
> > > separate providers/<PROVIDER> package. An example for
> > > that
> > > (the
> > > first
> > > provider we already almost migrated) was
> > > providers/google
> > > package
> > > (further
> > > divided into gcp/gsuite etc).
> > >
> > > We've done a massive migration of all the
> > > Google-related
> > > operators,
> > > created a few missing ones and retrofitted some old
> > > operators
> > > to
> > > follow
> > > GCP
> > > best practices and fixing a number of problems - also
> > > implementing
> > > Python3
> > > and Pylint compatibility. Some of these
> > > operators/hooks are
> > > not
> > > backwards
> > > compatible. Those that are compatible are still
> > > available
> > > via
> > > the
> > > old
> > > imports with deprecation warning.
> > >
> > > We've added missing tests (including system tests)
> > > and
> > > missing
> > > features -
> > > improving some of the Google operators - giving the
> > > users
> > > more
> > > capabilities
> > > and fixing some issues. Those operators should pretty
> > > much
> > > "just
> > > work"
> > > in
> > > Airflow 1.10.x (any recent version) for Python 3. We
> > > should
> > > be
> > > able
> > > to
> > > release a separate pip-installable package for those
> > > operators
> > > that
> > > users
> > > should be able to install in Airflow 1.10.x.
> > >
> > > Any user will be able to install this separate
> > > package in
> > > their
> > > Airflow
> > > 1.10.x installation and start using those new
> > > "provider"
> > > operators
> > > in
> > > parallel to the old 1.10.x operators. Other providers
> > > ("microsoft",
> > > "amazon") might follow the same approach if they
> > > want. We
> > > could
> > > even
> > > at
> > > some point decide to move some of the core operators
> > > in
> > > similar
> > > fashion
> > > (for example following the structure proposed in the
> > > latest
> > > documentation:
> > > fundamentals / software / etc.
> > >
> > >
> > >
> > > > https://airflow.readthedocs.io/en/latest/operators-and-hooks-ref.html)
> > >
> > > *Pros and cons*
> > >
> > > There are a number of pros:
> > >
> > >  - Users will have an easier migration path if they
> > > are
> > > deeply
> > > > invested
> > > > in the 1.10.* version
> > > - It's possible to migrate in stages for people who
> > > are
> > > also
> > > > invested
> > > in
> > > py2: *py2 (1.10) -> py3 (1.10) -> py3 + new
> > > operators
> > > (1.10)
> > > ->
> > > py3
> > > +
> > > 2.0*
> > > - Moving to new operators in py3 + new operators can
> > > be
> > > done
> > > gradually. Old operators will continue to work while
> > > new
> > > can
> > > be
> > > used
> > > more
> > > and more
> > > - People will get incentivised to migrate to python
> > > 3
> > > before
> > > 2.0
> > > is
> > > out (by using new operators)
> > > - Each provider "package" can have independent
> > > release
> > > schedule
> > > -
> > > and
> > > add functionality in already released Airflow
> > > versions.
> > > - We do not take out any functionality from the
> > > users - we
> > > just
> > > add
> > > more options
> > > - The releases can be - similarly as main airflow
> > > releases -
> > > voted
> > > separately by PMC after "stewards" of the package
> > > (per
> > > provider)
> > > perform
> > > round of testing on 1.10.* versions.
> > > - Users will start migrating to new operators
> > > earlier and
> > > have
> > >  smoother switch to 2.0 later
> > >  - The latest improved operators will start
> > >
> > > There are three cons I could think of:
> > >
> > >  - There will be quite a lot of duplication between
> > > old and
> > > new
> > > operators (they will co-exist in 1.10). That might
> > > lead to
> > > confusion
> > > of
> > > users and problems with cooperation between
> > > different
> > > operators/hooks
> > > - Having new operators in 1.10 python 3 might keep
> > > people
> > > from
> > > migrating to 2.0
> > > - It will require some maintenance and separate
> > > release
> > > overhead.
> > >
> > > I already spoke to Composer team @Google and they are
> > > very
> > > positive
> > > about
> > > this. I also spoke to Ash and seems it might also be
> > > OK for
> > > Astronomer
> > > team. We have Google's backing and support, and we
> > > can
> > > provide
> > > maintenance
> > > and support for those packages - being an example for
> > > other
> > > providers
> > > how
> > > they can do it.
> > >
> > > Let me know what you think - and whether I should
> > > make it
> > > into
> > > an
> > > official
> > > AIP maybe?
> > >
> > > J.
> > >
> > >
> > >
> > > --
> > >
> > > Jarek Potiuk
> > > > Polidea <https://www.polidea.com/> | Principal Software Engineer
> > > >
> > > > M: +48 660 796 129 <+48660796129>
> > > > [image: Polidea] <https://www.polidea.com/>
> > >
> > >
> > >
> > > --
> > >
> > > Jarek Potiuk
> > > Polidea <https://www.polidea.com/ <https://www.polidea.com/>> |
> > Principal
> > > Software
> > > Engineer
> > >
> > > M: +48 660 796 129 <+48660796129>
> > > [image: Polidea] <https://www.polidea.com/ <https://www.polidea.com/>>
> > >
> > >
> > >
> > > --
> > >
> > > Jarek Potiuk
> > > Polidea <https://www.polidea.com/ <https://www.polidea.com/>> |
> > Principal
> > > Software
> > > Engineer
> > >
> > > M: +48 660 796 129 <+48660796129>
> > > [image: Polidea] <https://www.polidea.com/ <https://www.polidea.com/>>
> > >
> > >
> > >
> > > --
> > >
> > > Tomasz Urbaszek
> > > Polidea <https://www.polidea.com/ <https://www.polidea.com/>> | Junior
> > Software
> > > Engineer
> > >
> > > M: +48 505 628 493 <+48505628493>
> > > E: tomasz.urbaszek@polidea.com
> > > <tomasz.urbaszeki@polidea.com
> > >
> > >
> > > Unique Tech
> > > Check out our projects!
> > > <https://www.polidea.com/our-work <https://www.polidea.com/our-work>>
> > >
> > >
> > >
> > > --
> > >
> > > Jarek Potiuk
> > > Polidea <https://www.polidea.com/ <https://www.polidea.com/>> |
> > Principal Software
> > > Engineer
> > >
> > > M: +48 660 796 129 <+48660796129>
> > > [image: Polidea] <https://www.polidea.com/ <https://www.polidea.com/>>
> > >
> > >
> > >
> > >
> > > --
> > >
> > > Jarek Potiuk
> > > Polidea <https://www.polidea.com/ <https://www.polidea.com/>> |
> > Principal Software
> > > Engineer
> > >
> > > M: +48 660 796 129 <+48660796129>
> > > [image: Polidea] <https://www.polidea.com/ <https://www.polidea.com/>>
> > >
> > >
> > >
> > >
> > > --
> > >
> > > Jarek Potiuk
> > > Polidea <https://www.polidea.com/ <https://www.polidea.com/>> |
> > Principal Software Engineer
> > >
> > > M: +48 660 796 129 <+48660796129>
> > > [image: Polidea] <https://www.polidea.com/ <https://www.polidea.com/>>
> > >
> > >
> > >
> > >
> > > --
> > >
> > > Jarek Potiuk
> > > Polidea <https://www.polidea.com/ <https://www.polidea.com/>> |
> > Principal Software Engineer
> > >
> > > M: +48 660 796 129 <+48660796129>
> > > [image: Polidea] <https://www.polidea.com/ <https://www.polidea.com/>>
> > >
> > >
> >
> >
>
> --
>
> Jarek Potiuk
> Polidea <https://www.polidea.com/> | Principal Software Engineer
>
> M: +48 660 796 129 <+48660796129>
> [image: Polidea] <https://www.polidea.com/>
>

Re: [PROPOSE] Ease future migration path to 2.0 by provider's operators/hook backporting to 1.10.*

Posted by Jarek Potiuk <Ja...@polidea.com>.
Thanks Ash! It might indeed work. I will take it from there and try to make
a POC PR with airflow.

It's a slightly different approach from the google-python libraries (they
keep all the libraries as separate sub-packages/mini projects inside the main
project). The approach you propose is far less invasive in terms of
changing the structure of the main repo. I like it this way much more. It
makes it much easier to import the project into an IDE even if it is less
modular in nature.
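
For anyone who wants to try the idea without installing anything, here is a
minimal sketch of the kind of check involved. It is not the code from Ash's
repository: the package name demo_pkg, the two directory names and both
helper functions are made up for illustration. It assumes the pkgutil-style
line can sit next to ordinary module-level code in the top-level
__init__.py, and that the second distribution ships only the leaf package
(no __init__.py for demo_pkg or demo_pkg.providers).

```python
import importlib
import pathlib
import sys
import tempfile
import textwrap


def build_layout(root: pathlib.Path) -> list:
    """Create two separate 'distributions' that both contribute to demo_pkg."""
    core = root / "core_dist"       # hypothetical stand-in for the main package
    extra = root / "provider_dist"  # hypothetical stand-in for a backport package

    # Core distribution: a regular package whose __init__.py has real code in it.
    (core / "demo_pkg").mkdir(parents=True)
    (core / "demo_pkg" / "__init__.py").write_text(textwrap.dedent("""\
        # pkgutil-style line: lets other sys.path entries add sub-packages
        __path__ = __import__("pkgutil").extend_path(__path__, __name__)
        CORE_LOADED = True  # stands in for non-trivial module-level code
        """))

    # Provider distribution: only the leaf package has an __init__.py.
    leaf = extra / "demo_pkg" / "providers" / "gcp"
    leaf.mkdir(parents=True)
    (leaf / "__init__.py").write_text("NAME = 'gcp'\n")
    return [str(core), str(extra)]


def import_with_order(paths: list) -> tuple:
    """Import core and leaf with the given sys.path order; return what was seen."""
    # Forget any earlier import of demo_pkg so each order starts clean.
    for name in [m for m in sys.modules if m.split(".")[0] == "demo_pkg"]:
        del sys.modules[name]
    importlib.invalidate_caches()
    sys.path[:0] = paths
    try:
        core = importlib.import_module("demo_pkg")
        leaf = importlib.import_module("demo_pkg.providers.gcp")
        return core.CORE_LOADED, leaf.NAME
    finally:
        for entry in paths:
            sys.path.remove(entry)


with tempfile.TemporaryDirectory() as tmp:
    paths = build_layout(pathlib.Path(tmp))
    result_core_first = import_with_order(paths)
    result_provider_first = import_with_order(list(reversed(paths)))
    print(result_core_first, result_provider_first)
```

If both orders give the same result, the core package's own __init__.py code
still runs and the separately located leaf package is importable too. This
only exercises separate sys.path entries on Python 3; it says nothing about
Python 2 or about what pip does when both packages land in one site-packages
directory.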

From what I understand with this structure - if it works - we have two
options:

(1) For Airflow 2.0 we will be able to install Airflow and all
"integrations" in single (apache-airflow == 2.0.0) package and build
separate backporting integration packages for 1.10.* only.
(2) We will split Airflow 2.0 into separate "core" and "integration"
packages as well while preparing packages.

I think (1) is a bit more reasonable for now, until we work out the full
AIP-8 solution (including solving dependency hell). Let me know what you
think (and others as well).
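
To make option (1) a bit more concrete, below is a rough sketch of the
metadata a per-provider backport package might carry. Everything in it is an
assumption for illustration: the helper name, the distribution name pattern,
the version string and the version pins are not decided anywhere. In a real
setup.py the dictionary would be passed to setuptools.setup(**kwargs).

```python
def backport_setup_kwargs(provider: str, version: str) -> dict:
    """Return hypothetical setup() keyword arguments for one provider backport."""
    return {
        # Distribution name pattern is an assumption, not an agreed convention.
        "name": f"apache-airflow-providers-{provider}",
        "version": version,
        # Only the provider's package is listed (sub-packages would be added
        # too); the top-level "airflow" package stays owned by apache-airflow.
        "packages": [f"airflow.providers.{provider}"],
        # Backports target Python 3 only; the exact minimum is assumed.
        "python_requires": ">=3.6",
        # Assumed pin: usable with the 1.10.* series, not with 2.0,
        # where the same code would ship inside apache-airflow itself.
        "install_requires": ["apache-airflow>=1.10,<2.0"],
    }


kwargs = backport_setup_kwargs("google", "2019.11.0")
print(kwargs["name"], kwargs["packages"])
```

The point of the upper bound is that under option (1) the backport packages
would exist for 1.10.* only, so installing one next to 2.0 should fail early
rather than shadow the bundled operators.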

J.

On Mon, Nov 4, 2019 at 9:24 PM Ash Berlin-Taylor <as...@apache.org> wrote:

> https://github.com/ashb/airflow-submodule-test
>
> That seems to work in any order things are installed, at least on python
> 3.7. I've had a stressful few days so I may have missed something. Please
> tell me if there's a case I've missed, or if this is not a suitable proxy
> for our situation.
>
> -a
>
> > On 4 Nov 2019, at 20:08, Ash Berlin-Taylor <as...@apache.org> wrote:
> >
> > Pretty hard pass from me on airflow_ext. If it's released by airflow I
> > want it to live under airflow.* (Anyone else is free to release packages
> > under any namespace they choose)
> >
> > That said I think I've got something that works:
> >
> >
> > /Users/ash/.virtualenvs/test-providers/lib/python3.7/site-packages/notairflow/__init__.py module level code running
> > /Users/ash/.virtualenvs/test-providers/lib/python3.7/site-packages/notairflow/providers/gcp/__init__.py module level code running
> >
> > Let me test it again in a few different cases etc.
> >
> > -a
> >
> > On 4 November 2019 14:00:24 GMT, Jarek Potiuk <Ja...@polidea.com>
> wrote:
> > Hey Ash,
> >
> > Thanks for the offer. I must admit pkgutil and package namespaces are not
> > the best documented part of python.
> >
> > I dug a bit deeper and I found a similar problem -
> > https://github.com/pypa/setuptools/issues/895. Seems that even if it is
> > not explicitly explained in pkgutil documentation, this comment (assuming
> > it is right) explains everything:
> >
> > *"That's right. All parents of a namespace package must also be namespace
> > packages, as they will necessarily share that parent name space (farm and
> > farm.deps in this example)."*
> >
> > There are a few possibilities mentioned in the issue on how this can be
> > "workarounded", but those are by far not perfect solutions. They would
> > require patching already installed airflow's __init__.py to work - to
> > manipulate the search path. Still, from my tests I do not know if this
> > would be possible at all because of the non-trivial __init__.py we have
> > (and use) in the *airflow* package.
> >
> > We have a few PRs now waiting for decision on that one I think, so maybe
> > we can simply agree that we should use another package (I really like
> > *"airflow_ext"* :D) and use it from now on? What do you (and others) think?
> >
> > I'd love to start voting on it soon.
> >
> > J.
> >
> >
> >
> > On Thu, Oct 31, 2019 at 5:37 PM Ash Berlin-Taylor <as...@apache.org>
> wrote:
> >
> > Let me run some tests too - I've used them a bit in the past. I thought
> > since we only want to make airflow.providers a namespace package it might
> > work for us.
> >
> > Will report back next week.
> >
> > -ash
> >
> > On 31 October 2019 15:58:22 GMT, Jarek Potiuk <Ja...@polidea.com>
> > wrote:
> > The same repo (so mono-repo approach). All packages would be in
> > "airflow_integrations" directory. It's mainly about moving the
> > operators/hooks/sensor files to different directory structure.
> >
> > It might be done pretty much without changing the current
> > installation/development model:
> >
> > 1) We can add setup.py command to install all the packages in -e mode
> > in
> > the main setup.py (to make it easier to install all deps in one go).
> > 2) We can add dependencies in setup.py extras to install appropriate
> > packages. For example [google] extra will 'require
> > apache-airflow-integrations-providers-google' package - or
> > apache-airflow-providers-google if we decide to skip -integrations from
> > the
> > package name to make it shorter.
> >
> > The only potential drawback I see is a bit more involved setup of the
> > IDE.
> >
> > This way installation method for both dev and prod remains simple.
> >
> > In the future we can have separate release schedule for the packages
> > (AIP-8) but for now we can stick to the same version for
> > 'apache-airflow'
> > and 'apache-airflow-integrations-*' package (+ separate release
> > schedule
> > for backporting needs).
> > Here again is the structure of the repo (we will likely be able to use
> > native namespaces so I removed some needless __init__.py).
> >
> > |-- airflow
> > |   |- __init__.py
> > |   |- operators -> fundamental operators are here
> > |-- tests -> tests for core airflow are here (optionally we can move them under "airflow")
> > |-- setup.py -> setup.py for the "apache-airflow" package
> > |-- airflow_integrations
> > |   |-providers
> > |   | |-google
> > |   |   |-setup.py -> setup.py for the "apache-airflow-integrations-providers-google" package
> > |   |   |-airflow_integrations
> > |   |     |-providers
> > |   |       |-google
> > |   |         |-__init__.py
> > |   |         |-tests -> tests for the "apache-airflow-integrations-providers-google" package
> > |   | |-__init__.py
> > |   |-protocols
> > |     |-setup.py -> setup.py for the "apache-airflow-integrations-protocols" package
> > |     |-airflow_integrations
> > |        |-protocols
> > |          |-__init__.py
> > |          |-tests -> tests for the "apache-airflow-integrations-protocols" package
> >
> >
> > J.
> >
> > On Thu, Oct 31, 2019 at 3:38 PM Kaxil Naik <ka...@gmail.com> wrote:
> >
> > So create another package in a different repo? or the same repo with
> > a
> > separate setup.py file that has airflow as a dependency?
> >
> >
> >
> >
> > On Thu, Oct 31, 2019 at 2:32 PM Jarek Potiuk
> > <Ja...@polidea.com>
> > wrote:
> >
> > TL;DR; I did some more testing on how namespaces work. I still
> > believe
> > the
> > only way to use namespaces is to have separate (for example
> > "airflow_integrations") package for all backportable packages.
> >
> > I am not sure if someone used namespaces before, but after reading
> > and
> > trying out, the main blocker seems to be that we have non-trivial
> > code
> > in
> > airflow's "__init__.py"  (including class definitions, imported
> > sub-packages and plugin initialisation).
> >
> > Details are in
> > https://packaging.python.org/guides/packaging-namespace-packages/
> > but
> > it's
> > a long one so let me summarize my findings:
> >
> >    - In order to use the "airflow.providers" package we would have to
> >      declare "airflow" as a namespace
> >    - It can be done in three different ways:
> >       - omitting __init__.py in this package (native/implicit namespace)
> >       - making the __init__.py of the "airflow" package in main airflow
> >         (and in the other packages) contain only
> >         "*__path__ = __import__('pkgutil').extend_path(__path__, __name__)*"
> >         (pkgutil style) or
> >         "*__import__('pkg_resources').declare_namespace(__name__)*"
> >         (pkg_resources style)
> >
> > The first is not possible (we already have __init__.py in
> > "airflow").
> > The second case is not possible because we already have quite a lot
> > in
> > the
> > airflow's "__init__.py" and both pkgutil and pkg_resources style
> > state:
> >
> > "*Every* distribution that uses the namespace package must include
> > an
> > identical *__init__.py*. If any distribution does not, it will
> > cause the
> > namespace logic to fail and the other sub-packages will not be
> > importable.
> > *Any
> > additional code in __init__.py will be inaccessible."*
> >
> > I even tried to add those pkgutil/pkg_resources to airflow and do
> > some
> > experimenting with it - but it does not work. Pip install fails at
> > the
> > plugins_manager as "airflow.plugins" is not accessible (kind of
> > expected),
> > but I am sure there will be other problems as well. :(
> >
> > Basically - we cannot turn "airflow" into namespace because it has
> > some
> > "__init__.py" logic :(.
> >
> > So I think it still holds that if we want to use namespaces, we
> > should
> > use
> > another package. The *"airflow_integrations"* is current candidate,
> > but
> > we
> > can think of some nicer/shorter one: "airflow_ext", "airflow_int",
> > "airflow_x", "airflow_mod", "airflow_next", "airflow_xt",
> > "airflow_",
> > "ext_airflow", ....  Interestingly "airflow_" is the one suggested
> > by
> > PEP8
> > to avoid conflicts with Python names (which is a different case but
> > kind
> > of
> > close).
> >
> > What do you think?
> >
> > J.
> >
> > On Tue, Oct 29, 2019 at 4:51 PM Kaxil Naik <ka...@gmail.com>
> > wrote:
> >
> > The namespace feature looks promising and from your tests, it
> > looks
> > like
> > it
> > would work well from Airflow 2.0 and onwards.
> >
> > I will look at it in-depth and see if I have more suggestions or
> > opinion
> > on
> > it
> >
> > On Tue, Oct 29, 2019 at 3:32 PM Jarek Potiuk
> > <Jarek.Potiuk@polidea.com
> >
> > wrote:
> >
> > TL;DR; We did some testing about namespaces and packaging (and
> > potential
> > backporting options for 1.10.* python3 Airflows) and we think
> > it's
> > best
> > to
> > use namespaces quickly and use different package name
> > "airflow-integrations" for all non-fundamental integrations.
> >
> > Unless we missed some tricks, we cannot use airflow.*
> > sub-packages
> > for
> > the
> > 1.10.* backportable packages. Example:
> >
> >    - "*apache-airflow"* package provides: "airflow.*" (this is
> > what
> > we
> > have
> >    today)
> >    - "*apache-airflow-providers-google*": provides
> >    "airflow.providers.google.*" packages
> >
> > If we install both packages (old apache-airflow 1.10.6  and new
> > apache-airflow-providers-google from 2.0) - it seems that
> > the "airflow.providers.google.*" package cannot be imported.
> > This is
> > a
> > bit
> > of a problem if we would like to backport the operators from
> > Airflow
> > 2.0
> > to
> > Airflow 1.10 in a way that will be forward-compatible. We really
> > want
> > users
> > who started using backported operators in 1.10.* not to have to
> > change
> > imports in their DAGs to run them in Airflow 2.0.
> >
> > We discussed it internally in our team and considered several
> > options,
> > but
> > we think the best way will be to go straight to "namespaces" in
> > Airflow
> > 2.0
> > and to have the integrations (as discussed in AIP-21
> > discussion) to
> > be
> > in a
> > separate "*airflow_integrations*" package.  It might be even
> > more
> > towards
> > the AIP-8 implementation and plays together very well in terms
> > of
> > "stewardship" discussed in AIP-21 now. But we will still keep
> > (for
> > now)
> > single release process for all packages for 2.0 (except for the
> > backporting
> > which can be done per-provider before 2.0 release) and provide
> > a
> > foundation
> > for future more complex release cycles in future versions.
> >
> > Here is how the new Airflow 2.0 repository could look (I only show a
> > subset of dirs but they are representative). For those whose email
> > fixed/color font will get corrupted, here is an image of this
> > structure: https://pasteboard.co/IEesTih.png
> >
> > |-- airflow
> > |   |- __init__.py
> > |   |- operators -> fundamental operators are here
> > |-- tests -> tests for core airflow are here (optionally we can move them under "airflow")
> > |-- setup.py -> setup.py for the "apache-airflow" package
> > |-- airflow_integrations
> > |   |-providers
> > |   | |-google
> > |   |   |-setup.py -> setup.py for the "apache-airflow-integrations-providers-google" package
> > |   |   |-airflow_integrations
> > |   |     |-__init__.py
> > |   |     |-providers
> > |   |       |-__init__.py
> > |   |       |-google
> > |   |         |-__init__.py
> > |   |         |-tests -> tests for the "apache-airflow-integrations-providers-google" package
> > |   | |-__init__.py
> > |   |-protocols
> > |     |-setup.py -> setup.py for the "apache-airflow-integrations-protocols" package
> > |     |-airflow_integrations
> > |        |-protocols
> > |          |-__init__.py
> > |          |-tests -> tests for the "apache-airflow-integrations-protocols" package
> >
> > There are a number of pros for this solution:
> >
> >    - We could use the standard namespaces feature of Python to build
> >      multiple packages:
> >      https://packaging.python.org/guides/packaging-namespace-packages/
> >    - Installation for users will be the same as previously. We could
> >      install the needed packages automatically when particular extras
> >      are used (pip install apache-airflow[google] could install both
> >      "apache-airflow" and
> >      "apache-airflow-integrations-providers-google").
> >    - We could have a custom setup.py installation process for
> >      developers that could install all the packages in development
> >      ("-e ." mode) in a single operation.
> >    - In the case of transfer packages we could have nice error
> >      messages informing that the other package needs to be installed
> >      (for example, the S3->GCS operator would import
> >      "airflow_integrations.providers.amazon.*" and if that fails it
> >      could raise "Please install [amazon] extra to use me.").
> >    - We could implement numerous optimisations in how we run tests in
> >      CI (for example run all the "providers" tests only with sqlite,
> >      run tests in parallel etc.).
> >    - We could implement it gradually. We do not need a "big bang"
> >      approach: we can implement it provider by provider and test it
> >      with one provider (Google) first to make sure that all the
> >      mechanisms are working.
> >    - For now we could have the monorepo approach where all the
> >      packages will be developed in concert, avoiding the dependency
> >      problems (but allowing for back-portability to 1.10).
> >    - We will have clear boundaries between packages and the ability
> >      to test for unwanted/hidden dependencies between packages.
> >    - We could switch to the (much better) sphinx-apidoc package to
> >      continue building single documentation for all of those
> >      (sphinx-apidoc has support for namespaces).
> >
> > As we are working on GCP move from contrib to core, we could
> > make all
> > the
> > effort to test it and try it before we merge it to master so
> > that it
> > will
> > be ready for others (and we could help with most of the moves
> > afterwards).
> > It seems complex, but in fact in most cases it will be very
> > simple
> > move
> > between the packages and can be done incrementally so there is
> > little
> > risk
> > in doing this I think.
> >
> > J.
> >
> >
> > On Mon, Oct 28, 2019 at 11:45 PM Kevin Yang <yr...@gmail.com>
> > wrote:
> >
> > Tomasz and Ash made good points about the overhead of having separate
> > repos. But as we grow bigger and more mature, I would prefer to have
> > what was described in AIP-8. It shouldn't be extremely hard for us to
> > come up with good strategies to handle the overhead. AIP-8 already
> > talked about how it can benefit us. IMO, at a high level, having a
> > clear separation of core vs. hooks/operators would make the project
> > much more scalable and the gains would outweigh the cost we pay.
> >
> > That being said, I'm supportive of this approach of moving towards
> > AIP-8 while learning; it is quite a good practice for tackling a big
> > project. Looking forward to reading the AIP.
> >
> >
> > Cheers,
> > Kevin Y
> >
> > On Mon, Oct 28, 2019 at 6:21 AM Jarek Potiuk <Jarek.Potiuk@polidea.com>
> > wrote:
> >
> > We are checking how we can use namespaces in a back-portable way, and
> > we will have a POC soon so that we will all be able to see what it
> > will look like.
> >
> > J.
> >
> > On Mon, Oct 28, 2019 at 1:24 PM Ash Berlin-Taylor <
> > ash@apache.org>
> > wrote:
> >
> > I'll have to read your proposal in detail (sorry, no time
> > right
> > now!),
> > but
> > I'm broadly in favour of this approach, and I think
> > keeping
> > them
> > _in_
> > the
> > same repo is the best plan -- that makes writing and
> > testing
> > cross-cutting
> > changes  easier.
> >
> > -a
> >
> > On 28 Oct 2019, at 12:14, Tomasz Urbaszek <tomasz.urbaszek@polidea.com>
> > wrote:
> >
> > I think utilizing namespaces should reduce a lot of
> > problems
> > raised
> > by
> > using separate repos (who will manage it? how to
> > release?
> > where
> > should
> > be
> > the repo?).
> >
> > Bests,
> > Tomek
> >
> > On Sun, Oct 27, 2019 at 11:54 AM Jarek Potiuk <
> > Jarek.Potiuk@polidea.com>
> > wrote:
> >
> > Thanks Bas for comments! Let me share my thoughts
> > below.
> >
> > On Sun, Oct 27, 2019 at 9:23 AM Bas Harenslak <
> > basharenslak@godatadriven.com>
> > wrote:
> >
> > Hi Jarek, I definitely see a future in creating
> > separate
> > installable
> > packages for various operators/hooks/etc (as in
> > AIP-8).
> > This
> > would
> > IMO
> > strip the “core” Airflow to only what’s needed and
> > result
> > in
> > a
> > small
> > package without a ton of dependencies (and make it
> > more
> > maintainable,
> > shorter tests, etc etc etc). Not exactly sure though
> > what
> > you’re
> > proposing
> > in your e-mail, is it a new AIP for an intermediate
> > step
> > towards
> > AIP-8?
> >
> >
> > It's a new AIP I am proposing.  For now it's only for
> > backporting
> > the
> > new
> > 2.0 import paths to 1.10.* series.
> >
> > It's more of "incremental going in direction of AIP-8
> > and
> > learning
> > some
> > difficulties involved" than implementing AIP-8 fully.
> > We are
> > taking
> > advantage of changes in import paths from AIP-21 which
> > make
> > it
> > possible
> > to
> > have both old and new (optional) operators available
> > in
> > 1.10.*
> > series
> > of
> > Airflow. I think there is a lot more to do for full
> > implementation
> > of
> > AIP-8: decisions how to maintain, install those
> > operator
> > groups
> > separately,
> > stewardship model/organisation for the separate
> > groups, how
> > to
> > manage
> > cross-dependencies, procedures for releasing the
> > packages
> > etc.
> >
> > I think about this new AIP also as a learning effort -
> > we
> > would
> > learn
> > more
> > how separate packaging works and then we can follow up
> > with
> > AIP-8
> > full
> > implementation for "modular" Airflow. Then AIP-8 could
> > be
> > implemented
> > in
> > Airflow 2.1 for example - or 3.0 if we start following
> > semantic
> > versioning
> > - based on those learnings. It's a bit of good example
> > of
> > having
> > cake
> > and
> > eating it too. We can try out modularity in 1.10.*
> > while
> > cutting
> > the
> > scope
> > of 2.0 and not implementing full management/release
> > procedure
> > for
> > AIP-8
> > yet.
> >
> >
> > Thinking about this, I think there are still a few
> > grey
> > areas
> > (which
> > would
> > be good to discuss in a new AIP, or continue on
> > AIP-8):
> >
> >  *   In your email you speak only about the 3 big cloud providers
> > (btw I made a PR for migrating all AWS components ->
> > https://github.com/apache/airflow/pull/6439). Is there a plan for
> > splitting components other than Google/AWS/Azure?
> >
> >
> > We could add more groups as part of this new AIP
> > indeed (as
> > an
> > extension to
> > AIP-21 and pre-requisite to AIP-8). We already see how
> > moving/deprecation
> > works for the providers package - it works for
> > GCP/Google
> > rather
> > nicely.
> > But there is nothing to prevent us from extending it
> > to
> > cover
> > other
> > groups
> > of operators/hooks. If you look at the current
> > structure of
> > documentation
> > done by Kamil, we can follow the structure there and
> > move
> > the
> > operators/hooks accordingly (
> > https://airflow.readthedocs.io/en/latest/operators-and-hooks-ref.html
> > ):
> >
> >      Fundamentals, ASF: Apache Software Foundation,
> > Azure:
> > Microsoft
> > Azure, AWS: Amazon Web Services, GCP: Google Cloud
> > Platform,
> > Service
> > integrations, Software integrations, Protocol
> > integrations.
> >
> > I am happy to include that in the AIP - if others
> > agree
> > it's a
> > good
> > idea.
> > Out of those groups -  I think only Fundamentals
> > should not
> > be
> > back-ported.
> > Others should be rather easy to port (if we decide
> > to). We
> > already
> > have
> > quite a lot of those in the new GCP operators for 2.0.
> > So
> > starting
> > with
> > GCP/Google group is a good idea. Also following with
> > Cloud
> > Providers
> > first
> > is a good thing. For example we have now support from
> > Google
> > Composer
> > team
> > to do this separation for GCP (and we learn from it)
> > and
> > then
> > we
> > can
> > claim
> > the stewardship in our team for releasing the python
> > 3/
> > Airflow
> > 1.10-compatible "airflow-google" packages. Possibly
> > other
> > Cloud
> > Providers/teams might follow this (if they see the
> > value in
> > it)
> > and
> > there
> > could be different stewards for those. And then we can
> > do
> > other
> > groups
> > if
> > we decide to. I think this way we can learn whether
> > AIP-8 is
> > manageable
> > and
> > what real problems we are going to face.
> >
> >  *   Each “plugin” e.g. GCP would be a separate repo,
> > should
> > we
> > create
> > some sort of blueprint for such packages?
> >
> >
> > I think we do not need separate repos (at all) but in
> > this
> > new
> > AIP
> > we
> > can
> > test it before we decide to go for AIP-8. IMHO -
> > monorepo
> > approach
> > will
> > work here rather nicely. We could use Python 3 native namespaces
> > <https://packaging.python.org/guides/packaging-namespace-packages/>
> > for the sub-packages when we go full AIP-8. For now we could
> > simply
> > package
> > the
> > new
> > operators in separate pip package for Python 3 version
> > 1.10.*
> > series
> > only.
> > We only need to test if it works well with another
> > package
> > providing
> > 'airflow.providers.*' after apache-airflow is
> > installed
> > (providing
> > 'airflow' package). But I think we can make it work. I
> > don't
> > think
> > we
> > really need to split the repos: namespaces will work just fine and
> > have easier management of cross-repository dependencies (but we can
> > learn otherwise). For sure we will not need it for the new
> > proposed
> > AIP
> > of
> > backporting groups to 1.10 and we can defer that
> > decision to
> > AIP-8
> > implementation time.
> >
> >
> > *   In which Airflow version do we start raising
> > deprecation
> > warnings
> > and in which version would we remove the original?
> >
> >
> > I think we should do what we already did in the GCP case. Those old
> > "imports" for operators can be marked as deprecated in Airflow 2.0
> > (and removed in 2.1, or 3.0 if we start following semantic
> > versioning). We can, however, do it earlier, in 1.10.7 or 1.10.8 if
> > we release those (without removing the old operators yet: just raise
> > deprecation warnings and inform users that for Python 3 the new
> > "airflow-google", "airflow-aws" etc. packages can be installed and
> > that they can switch to them).
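
A minimal sketch of the deprecation approach described above, assuming a
small helper called from the old module. The helper name `warn_moved` and
the module paths in the usage note are hypothetical illustrations, not
actual Airflow code; only the "airflow-google" package name comes from the
thread.

```python
import warnings


def warn_moved(old_path: str, new_path: str, package: str) -> str:
    """Emit a DeprecationWarning pointing from an old import path to a new one.

    All three arguments are plain strings supplied by the caller; nothing
    here inspects or imports the modules they name.
    """
    message = (
        f"{old_path} is deprecated; use {new_path} instead "
        f"(on Python 3 it can be installed from the '{package}' package)."
    )
    # stacklevel=2 attributes the warning to the caller of this helper,
    # i.e. to the old module (or the user code importing it).
    warnings.warn(message, DeprecationWarning, stacklevel=2)
    return message
```

An old module would call something like
`warn_moved("airflow.contrib.operators.example", "airflow.providers.google.example", "airflow-google")`
at import time while still re-exporting the old names, so existing DAGs
keep working until the old path is removed.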
> >
> > J.
> >
> >
> >
> > Cheers,
> > Bas
> >
> > On 27 Oct 2019, at 08:33, Jarek Potiuk <Jarek.Potiuk@polidea.com>
> > wrote:
> >
> > Hello - any comments on that? I am happy to make it
> > into an
> > AIP
> > :)?
> >
> > On Sun, Oct 13, 2019 at 5:53 PM Jarek Potiuk <Jarek.Potiuk@polidea.com>
> > wrote:
> >
> > *Motivation*
> >
> > I think we really should start thinking about making it easier for
> > our users to migrate to 2.0. After implementing some recent changes
> > related to AIP-21 - Changes in import paths
> > <https://cwiki.apache.org/confluence/display/AIRFLOW/AIP-21%3A+Changes+in+import+paths>
> > I think I have an idea that might help with it.
> >
> > *Proposal*
> >
> > We could package some of the new and improved 2.0
> > operators
> > (moved
> > to
> > "providers" package) and let them be used in Python 3
> > environment
> > of
> > airflow 1.10.x.
> >
> > This can be done case-by-case per "cloud provider".
> > It
> > should
> > not
> > be
> > obligatory, should be largely driven by each provider. It's not yet
> > the full AIP-8 Split Hooks/Operators into separate packages
> > <https://cwiki.apache.org/confluence/pages/viewpage.action?pageId=100827303>.
> > It's merely backporting of some operators/hooks to get them to work
> > in 1.10. But by doing it we might try out the concept of splitting,
> > learn about maintenance problems and maybe implement the full *AIP-8*
> > approach in 2.1 consistently across the board.
> >
> > *Context*
> >
> > Part of the AIP-21 was to move import paths for Cloud
> > providers
> > to
> > separate providers/<PROVIDER> package. An example for
> > that
> > (the
> > first
> > provider we already almost migrated) was
> > providers/google
> > package
> > (further
> > divided into gcp/gsuite etc).
> >
> > We've done a massive migration of all the Google-related operators,
> > created a few missing ones and retrofitted some old operators to
> > follow GCP best practices, fixing a number of problems and also
> > implementing Python 3 and Pylint compatibility. Some of these
> > operators/hooks are not backwards compatible. Those that are
> > compatible are still available via the old imports with a
> > deprecation warning.
> >
> > We've added missing tests (including system tests)
> > and
> > missing
> > features -
> > improving some of the Google operators - giving the
> > users
> > more
> > capabilities
> > and fixing some issues. Those operators should pretty
> > much
> > "just
> > work"
> > in
> > Airflow 1.10.x (any recent version) for Python 3. We
> > should
> > be
> > able
> > to
> > release a separate pip-installable package for those
> > operators
> > that
> > users
> > should be able to install in Airflow 1.10.x.
> >
> > Any user will be able to install this separate
> > package in
> > their
> > Airflow
> > 1.10.x installation and start using those new
> > "provider"
> > operators
> > in
> > parallel to the old 1.10.x operators. Other providers
> > ("microsoft",
> > "amazon") might follow the same approach if they
> > want. We
> > could
> > even
> > at
> > some point decide to move some of the core operators in similar
> > fashion (for example following the structure proposed in the latest
> > documentation: fundamentals / software / etc.
> > https://airflow.readthedocs.io/en/latest/operators-and-hooks-ref.html)
> >
> > *Pros and cons*
> >
> > There are a number of pros:
> >
> >  - Users will have an easier migration path if they are deeply
> >    invested in the 1.10.* version
> >  - It's possible to migrate in stages for people who are also
> >    invested in py2: *py2 (1.10) -> py3 (1.10) -> py3 + new operators
> >    (1.10) -> py3 + 2.0*
> >  - Moving to new operators in py3 + new operators can be done
> >    gradually. Old operators will continue to work while new ones can
> >    be used more and more
> >  - People will get incentivised to migrate to Python 3 before 2.0 is
> >    out (by using new operators)
> >  - Each provider "package" can have an independent release schedule
> >    and add functionality in already released Airflow versions.
> >  - We do not take any functionality away from the users - we just add
> >    more options
> >  - The releases can be voted on separately by the PMC (similarly to
> >    main Airflow releases) after the "stewards" of the package (per
> >    provider) perform a round of testing on 1.10.* versions.
> >  - Users will start migrating to new operators earlier and have a
> >    smoother switch to 2.0 later
> >  - The latest improved operators will start
> >
> > There are three cons I could think of:
> >
> >  - There will be quite a lot of duplication between old and new
> >    operators (they will co-exist in 1.10). That might lead to
> >    confusion among users and problems with cooperation between
> >    different operators/hooks
> >  - Having new operators in 1.10 on Python 3 might keep people from
> >    migrating to 2.0
> >  - It will require some maintenance and separate release overhead.
> >
> > I already spoke to Composer team @Google and they are
> > very
> > positive
> > about
> > this. I also spoke to Ash and it seems it might also be OK for the
> > Astronomer team. We have Google's backing and support, and we
> > can
> > provide
> > maintenance
> > and support for those packages - being an example for
> > other
> > providers
> > how
> > they can do it.
> >
> > Let me know what you think - and whether I should
> > make it
> > into
> > an
> > official
> > AIP maybe?
> >
> > J.
> >
> >
> >
> > --
> >
> > Jarek Potiuk
> > Polidea <https://www.polidea.com/> | Principal Software Engineer
> >
> > M: +48 660 796 129 <+48660796129>
> > [image: Polidea] <https://www.polidea.com/>
> >
> >
> >
> >
> >
> >
> > --
> >
> > Tomasz Urbaszek
> > Polidea <https://www.polidea.com/> | Junior Software Engineer
> >
> > M: +48 505 628 493 <+48505628493>
> > E: tomasz.urbaszek@polidea.com
> >
> > Unique Tech
> > Check out our projects! <https://www.polidea.com/our-work>
> >
> >
> >
> >
> >
>
>

-- 

Jarek Potiuk
Polidea <https://www.polidea.com/> | Principal Software Engineer

M: +48 660 796 129 <+48660796129>
[image: Polidea] <https://www.polidea.com/>

Re: [PROPOSE] Ease future migration path to 2.0 by provider's operators/hook backporting to 1.10.*

Posted by Ash Berlin-Taylor <as...@apache.org>.
https://github.com/ashb/airflow-submodule-test

That seems to work in any order things are installed, at least on python 3.7. I've had a stressful few days so I may have missed something. Please tell me if there's a case I've missed, or if this is not a suitable proxy for our situation.

-a 

> On 4 Nov 2019, at 20:08, Ash Berlin-Taylor <as...@apache.org> wrote:
> 
> Pretty hard pass from me on airflow_ext. If it's released by airflow I want it to live under airflow.* (anyone else is free to release packages under any namespace they choose).
> 
> That said I think I've got something that works:
> 
> /Users/ash/.virtualenvs/test-providers/lib/python3.7/site-packages/notairflow/__init__.py module level code running
> /Users/ash/.virtualenvs/test-providers/lib/python3.7/site-packages/notairflow/providers/gcp/__init__.py module level code running
> 
> Let me test it again in a few different cases etc.
> 
> -a
> 
> On 4 November 2019 14:00:24 GMT, Jarek Potiuk <Ja...@polidea.com> wrote:
> Hey Ash,
> 
> Thanks for the offer. I must admit pkgutil and package namespaces are not
> the best documented part of Python.
> 
> I dug a bit deeper and I found a similar problem -
> https://github.com/pypa/setuptools/issues/895. It seems that even if it is
> not explicitly explained in the pkgutil documentation, this comment
> (assuming it is right) explains everything:
> 
> *"That's right. All parents of a namespace package must also be namespace
> packages, as they will necessarily share that parent name space (farm and
> farm.deps in this example)."*
> 
> There are a few possibilities mentioned in the issue on how this can be
> worked around, but those are far from perfect solutions. They would
> require patching the already installed airflow's __init__.py to
> manipulate the search path. Still, from my tests I do not know if this
> would be possible at all because of the non-trivial __init__.py we have
> (and use) in the *airflow* package.
> 
> We have a few PRs now waiting for a decision on that one I think, so maybe
> we can simply agree that we should use another package (I really like
> *"airflow_ext"* :D) and use it from now on? What do you (and others) think?
> 
> I'd love to start voting on it soon.
> 
> J.
> 
> 
> 
> On Thu, Oct 31, 2019 at 5:37 PM Ash Berlin-Taylor <as...@apache.org> wrote:
> 
> Let me run some tests too - I've used them a bit in the past. I thought
> since we only want to make airflow.providers a namespace package it might
> work for us.
> 
> Will report back next week.
> 
> -ash
> 
> On 31 October 2019 15:58:22 GMT, Jarek Potiuk <Ja...@polidea.com>
> wrote:
> The same repo (so mono-repo approach). All packages would be in
> "airflow_integrations" directory. It's mainly about moving the
> operators/hooks/sensor files to different directory structure.
> 
> It might be done pretty much without changing the current
> installation/development model:
> 
> 1) We can add a setup.py command to install all the packages in -e mode
> in the main setup.py (to make it easier to install all deps in one go).
> 2) We can add dependencies in setup.py extras to install the appropriate
> packages. For example, the [google] extra will require the
> 'apache-airflow-integrations-providers-google' package - or
> 'apache-airflow-providers-google' if we decide to skip -integrations from
> the package name to make it shorter.
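
As a rough illustration of point 2, the extras could be expressed as a
mapping like the one below. This is a sketch under assumptions: only the
"google" entry is named in this thread, and the helper `packages_for` is a
hypothetical stand-in that mimics what pip resolves, not part of any real
setup.py. In a real setup.py such a mapping would be passed to setuptools
via its `extras_require` argument.

```python
# Hypothetical extras table; only the "google" entry is named in the thread.
EXTRAS_REQUIRE = {
    "google": ["apache-airflow-integrations-providers-google"],
}


def packages_for(requirement: str) -> list:
    """Expand e.g. 'apache-airflow[google]' into the distributions it pulls in."""
    name, _, extras = requirement.partition("[")
    packages = [name]
    for extra in extras.rstrip("]").split(","):
        extra = extra.strip()
        if extra:  # a requirement without extras yields one empty string
            packages.extend(EXTRAS_REQUIRE[extra])
    return packages


print(packages_for("apache-airflow[google]"))
```

With this shape, users keep typing `pip install apache-airflow[google]`
and the split into separate distributions stays an implementation detail.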
> 
> The only potential drawback I see is a bit more involved setup of the
> IDE.
> 
> This way installation method for both dev and prod remains simple.
> 
> In the future we can have a separate release schedule for the packages
> (AIP-8) but for now we can stick to the same version for the
> 'apache-airflow' and 'apache-airflow-integrations-*' packages (+ a
> separate release schedule for backporting needs).
> Here again is the structure of the repo (we will likely be able to use
> native namespaces so I removed some needless __init__.py files).
> 
> |-- airflow
> |   |- __init__.py
> |   |- operators -> fundamental operators are here
> |-- tests -> tests for core airflow are here (optionally we can move them under "airflow")
> |-- setup.py -> setup.py for the "apache-airflow" package
> |-- airflow_integrations
> |   |-providers
> |   | |-google
> |   |   |-setup.py -> setup.py for the "apache-airflow-integrations-providers-google" package
> |   |   |-airflow_integrations
> |   |     |-providers
> |   |       |-google
> |   |         |-__init__.py
> |   |         | tests -> tests for the "apache-airflow-integrations-providers-google" package
> |   | |-__init__.py
> |   |-protocols
> |     |-setup.py -> setup.py for the "apache-airflow-integrations-protocols" package
> |     |-airflow_integrations
> |        |-protocols
> |          |-__init__.py
> |          |-tests -> tests for the "apache-airflow-integrations-protocols" package
> 
> 
> J.
> 
> On Thu, Oct 31, 2019 at 3:38 PM Kaxil Naik <ka...@gmail.com> wrote:
> 
> So create another package in a different repo? Or the same repo with a
> separate setup.py file that has airflow as a dependency?
> 
> 
> 
> 
> On Thu, Oct 31, 2019 at 2:32 PM Jarek Potiuk
> <Ja...@polidea.com>
> wrote:
> 
> TL;DR; I did some more testing on how namespaces work. I still
> believe
> the
> only way to use namespaces is to have separate (for example
> "airflow_integrations") package for all backportable packages.
> 
> I am not sure if anyone has used namespaces before, but after reading
> and trying things out, the main blocker seems to be that we have
> non-trivial code in airflow's "__init__.py" (including class definitions,
> imported sub-packages and plugin initialisation).
> 
> Details are in
> https://packaging.python.org/guides/packaging-namespace-packages/
> but it's a long one so let me summarize my findings:
> 
> - In order to use the "airflow.providers" package we would have to
>   declare "airflow" as a namespace package.
> - It can be done in three different ways:
>   - omitting __init__.py in this package (native/implicit namespace)
>   - making the __init__.py of the "airflow" package in main airflow (and
>     in the other packages) contain only
>     "*__path__ = __import__('pkgutil').extend_path(__path__, __name__)*"
>     (pkgutil style)
>   - or making it contain only
>     "*__import__('pkg_resources').declare_namespace(__name__)*"
>     (pkg_resources style)
> 
> The first is not possible (we already have an __init__.py in "airflow").
> The other two are not possible because we already have quite a lot in
> airflow's "__init__.py" and the guide states, for both the pkgutil and
> pkg_resources styles:
> 
> "*Every* distribution that uses the namespace package must include
> an
> identical *__init__.py*. If any distribution does not, it will
> cause the
> namespace logic to fail and the other sub-packages will not be
> importable.
> *Any
> additional code in __init__.py will be inaccessible."*
> 
> I even tried to add those pkgutil/pkg_resources to airflow and do
> some
> experimenting with it - but it does not work. Pip install fails at
> the
> plugins_manager as "airflow.plugins" is not accessible (kind of
> expected),
> but I am sure there will be other problems as well. :(
> 
> Basically - we cannot turn "airflow" into namespace because it has
> some
> "__init__.py" logic :(.
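
To make the pkgutil-style option concrete, here is a small self-contained
sketch. It builds two throwaway "distributions" in a temporary directory
that both ship the identical one-line __init__.py quoted above, then
imports a module from each. The package name `demo_pkgutil_ns` and the
module names are made up for the demo; this is not a test of airflow
itself, whose __init__.py contains additional code.

```python
import importlib
import sys
import tempfile
from pathlib import Path

# The one-line pkgutil-style __init__.py from the packaging guide.
PKGUTIL_INIT = "__path__ = __import__('pkgutil').extend_path(__path__, __name__)\n"


def write_module(root: Path, relative: str, source: str) -> None:
    """Write `source` to root/relative, creating parent directories."""
    target = root / relative
    target.parent.mkdir(parents=True, exist_ok=True)
    target.write_text(source)


def build_and_import() -> tuple:
    base = Path(tempfile.mkdtemp())
    dist_a, dist_b = base / "dist_a", base / "dist_b"
    # Both hypothetical distributions ship the identical namespace __init__.py.
    write_module(dist_a, "demo_pkgutil_ns/__init__.py", PKGUTIL_INIT)
    write_module(dist_a, "demo_pkgutil_ns/core.py", "NAME = 'core'\n")
    write_module(dist_b, "demo_pkgutil_ns/__init__.py", PKGUTIL_INIT)
    write_module(dist_b, "demo_pkgutil_ns/providers.py", "NAME = 'providers'\n")

    # Simulate both distributions being installed in separate locations.
    sys.path[:0] = [str(dist_a), str(dist_b)]
    importlib.invalidate_caches()
    core = importlib.import_module("demo_pkgutil_ns.core")
    providers = importlib.import_module("demo_pkgutil_ns.providers")
    return core.NAME, providers.NAME


NAMES = build_and_import()
print(NAMES)
```

The point of the sketch is only that extend_path merges the two
directories into one package path. The blocker described above is about
what else lives in the real __init__.py, which this demo does not model.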
> 
> So I think it still holds that if we want to use namespaces, we should
> use another package. The *"airflow_integrations"* is the current
> candidate, but we can think of some nicer/shorter one: "airflow_ext",
> "airflow_int", "airflow_x", "airflow_mod", "airflow_next", "airflow_xt",
> "airflow_", "ext_airflow", .... Interestingly, "airflow_" is the one
> suggested by PEP8 to avoid conflicts with Python names (which is a
> different case but kind of close).
> 
> What do you think?
> 
> J.
> 
> On Tue, Oct 29, 2019 at 4:51 PM Kaxil Naik <ka...@gmail.com>
> wrote:
> 
> The namespace feature looks promising and from your tests, it
> looks
> like
> it
> would work well from Airflow 2.0 and onwards.
> 
> I will look at it in-depth and see if I have more suggestions or
> opinion
> on
> it
> 
> On Tue, Oct 29, 2019 at 3:32 PM Jarek Potiuk <Jarek.Potiuk@polidea.com>
> wrote:
> 
> TL;DR; We did some testing of namespaces and packaging (and potential
> backporting options for 1.10.* Python 3 Airflows) and we think it's best
> to start using namespaces quickly and to use a different package name,
> "airflow-integrations", for all non-fundamental integrations.
> 
> Unless we missed some tricks, we cannot use airflow.* sub-packages for
> the 1.10.* backportable packages. Example:
> 
>    - "*apache-airflow*" package provides: "airflow.*" (this is what we
>      have today)
>    - "*apache-airflow-providers-google*" package provides:
>      "airflow.providers.google.*" packages
> 
> If we install both packages (old apache-airflow 1.10.6 and new
> apache-airflow-providers-google from 2.0), it seems that the
> "airflow.providers.google.*" package cannot be imported. This is a bit
> of a problem if we would like to backport the operators from Airflow 2.0
> to Airflow 1.10 in a way that will be forward-compatible. We really want
> users who started using backported operators in 1.10.* not to have to
> change imports in their DAGs to run them in Airflow 2.0.
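
The import failure described above can be reproduced without airflow at
all. The sketch below is an illustration with made-up package names
(`demo_regular`, `demo_ns`): it compares a top-level package that ships an
ordinary __init__.py with one that has none (a native namespace package),
each split across two separately "installed" directories.

```python
import importlib
import sys
import tempfile
from pathlib import Path


def write_init(root: Path, dotted: str) -> None:
    """Create directories for `dotted` under `root`; only the leaf gets __init__.py."""
    leaf = root.joinpath(*dotted.split("."))
    leaf.mkdir(parents=True, exist_ok=True)
    (leaf / "__init__.py").write_text("")


def importable(name: str) -> bool:
    """Return True if `name` can be imported with the current sys.path."""
    importlib.invalidate_caches()
    try:
        importlib.import_module(name)
    except ImportError:
        return False
    return True


def run_demo() -> dict:
    base = Path(tempfile.mkdtemp())
    core, addon = base / "core_dist", base / "addon_dist"

    # Case 1: the core distribution ships a regular top-level package (it has
    # an __init__.py). Its __path__ is fixed to one directory, so the
    # sub-package shipped by the add-on distribution is never searched.
    write_init(core, "demo_regular")
    write_init(addon, "demo_regular.providers.google")

    # Case 2: no top-level __init__.py in either distribution, so "demo_ns"
    # is a native namespace package merged across both directories.
    write_init(core, "demo_ns.operators")
    write_init(addon, "demo_ns.providers.google")

    sys.path[:0] = [str(core), str(addon)]
    return {
        "regular": importable("demo_regular.providers.google"),
        "namespace": importable("demo_ns.providers.google"),
    }


RESULTS = run_demo()
print(RESULTS)
```

Case 1 corresponds to the situation described for apache-airflow 1.10.*
plus a separately installed providers package; case 2 is why a fresh
top-level name without an __init__.py avoids the problem.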
> 
> We discussed it internally in our team and considered several options,
> but we think the best way will be to go straight to "namespaces" in
> Airflow 2.0 and to have the integrations (as discussed in the AIP-21
> discussion) in a separate "*airflow_integrations*" package. It might be
> an even bigger step towards the AIP-8 implementation and it plays
> together very well with the "stewardship" discussed in AIP-21 now. But
> we will still keep (for now) a single release process for all packages
> for 2.0 (except for the backporting, which can be done per-provider
> before the 2.0 release) and provide a foundation for more complex
> release cycles in future versions.
> 
> Here is how the new Airflow 2.0 repository could look (I only show a
> subset of dirs, but they are representative). For those whose email
> client corrupts the fixed-width/color font, here is an image of this
> structure: https://pasteboard.co/IEesTih.png
> 
> |-- airflow
> |   |- __init__.py
> |   |- operators -> fundamental operators are here
> |-- tests -> tests for core airflow are here (optionally we can move them under "airflow")
> |-- setup.py -> setup.py for the "apache-airflow" package
> |-- airflow_integrations
> |   |-providers
> |   | |-google
> |   |   |-setup.py -> setup.py for the "apache-airflow-integrations-providers-google" package
> |   |   |-airflow_integrations
> |   |     |-__init__.py
> |   |     |-providers
> |   |       |-__init__.py
> |   |       |-google
> |   |         |-__init__.py
> |   |         | tests -> tests for the "apache-airflow-integrations-providers-google" package
> |   | |-__init__.py
> |   |-protocols
> |     |-setup.py -> setup.py for the "apache-airflow-integrations-protocols" package
> |     |-airflow_integrations
> |        |-protocols
> |          |-__init__.py
> |          |-tests -> tests for the "apache-airflow-integrations-protocols" package
> 
> There are a number of pros for this solution:
> 
>    - We could use the standard namespaces feature of Python to build
>      multiple packages:
>      https://packaging.python.org/guides/packaging-namespace-packages/
>    - Installation for users will be the same as previously. We could
>      install the needed packages automatically when particular extras
>      are used (pip install apache-airflow[google] could install both
>      "apache-airflow" and
>      "apache-airflow-integrations-providers-google").
>    - We could have a custom setup.py installation process for
>      developers that could install all the packages in development
>      ("-e ." mode) in a single operation.
>    - In the case of transfer packages we could have nice error messages
>      informing that the other package needs to be installed (for
>      example, the S3->GCS operator would import
>      "airflow_integrations.providers.amazon.*" and if that fails it
>      could raise "Please install [amazon] extra to use me.").
>    - We could implement numerous optimisations in how we run tests in
>      CI (for example run all the "providers" tests only with sqlite,
>      run tests in parallel etc.).
>    - We could implement it gradually. We do not need a "big bang"
>      approach: we can implement it provider by provider and test it
>      with one provider (Google) first to make sure that all the
>      mechanisms are working.
>    - For now we could have the monorepo approach where all the
>      packages will be developed in concert, avoiding the dependency
>      problems (but allowing for back-portability to 1.10).
>    - We will have clear boundaries between packages and the ability to
>      test for unwanted/hidden dependencies between packages.
>    - We could switch to the (much better) sphinx-apidoc package to
>      continue building single documentation for all of those
>      (sphinx-apidoc has support for namespaces).
> 
> As we are working on GCP move from contrib to core, we could
> make all
> the
> effort to test it and try it before we merge it to master so
> that it
> will
> be ready for others (and we could help with most of the moves
> afterwards).
> It seems complex, but in fact in most cases it will be very
> simple
> move
> between the packages and can be done incrementally so there is
> little
> risk
> in doing this I think.
> 
> J.
> 
> 
> On Mon, Oct 28, 2019 at 11:45 PM Kevin Yang <yr...@gmail.com>
> wrote:
> 
> Tomasz and Ash got good points about the overhead of having
> separate
> repos.
> But while we grow bigger and more mature, I would prefer to
> have
> what
> was
> described in AIP-8. It shouldn't be extremely hard for us to
> come
> up
> with
> good strategies to handle the overhead. AIP-8 already talked
> about
> how
> it
> can benefit us. IMO on a high level, having a clear separation of core
> vs. hooks/operators would make the project much more scalable and the
> gains would outweigh the cost we pay.
> 
> That being said, I'm supportive to this moving towards AIP-8
> while
> learning
> approach, quite a good practise to tackle a big project.
> Looking
> forward
> to
> read the AIP.
> 
> 
> Cheers,
> Kevin Y
> 
> On Mon, Oct 28, 2019 at 6:21 AM Jarek Potiuk <
> Jarek.Potiuk@polidea.com
> 
> wrote:
> 
> We are checking how we can use namespaces in back-portable
> way
> and
> we
> will
> have POC soon so that we all will be able to see how it
> will look
> like.
> 
> J.
> 
> On Mon, Oct 28, 2019 at 1:24 PM Ash Berlin-Taylor <
> ash@apache.org>
> wrote:
> 
> I'll have to read your proposal in detail (sorry, no time
> right
> now!),
> but
> I'm broadly in favour of this approach, and I think
> keeping
> them
> _in_
> the
> same repo is the best plan -- that makes writing and
> testing
> cross-cutting
> changes  easier.
> 
> -a
> 
> On 28 Oct 2019, at 12:14, Tomasz Urbaszek <
> tomasz.urbaszek@polidea.com
> 
> wrote:
> 
> I think utilizing namespaces should reduce a lot of
> problems
> raised
> by
> using separate repos (who will manage it? how to
> release?
> where
> should
> be
> the repo?).
> 
> Bests,
> Tomek
> 
> On Sun, Oct 27, 2019 at 11:54 AM Jarek Potiuk <
> Jarek.Potiuk@polidea.com>
> wrote:
> 
> Thanks Bas for comments! Let me share my thoughts
> below.
> 
> On Sun, Oct 27, 2019 at 9:23 AM Bas Harenslak <
> basharenslak@godatadriven.com>
> wrote:
> 
> Hi Jarek, I definitely see a future in creating
> separate
> installable
> packages for various operators/hooks/etc (as in
> AIP-8).
> This
> would
> IMO
> strip the “core” Airflow to only what’s needed and
> result
> in
> a
> small
> package without a ton of dependencies (and make it
> more
> maintainable,
> shorter tests, etc etc etc). Not exactly sure though
> what
> you’re
> proposing
> in your e-mail, is it a new AIP for an intermediate
> step
> towards
> AIP-8?
> 
> 
> It's a new AIP I am proposing.  For now it's only for
> backporting
> the
> new
> 2.0 import paths to 1.10.* series.
> 
> It's more of "incremental going in direction of AIP-8
> and
> learning
> some
> difficulties involved" than implementing AIP-8 fully.
> We are
> taking
> advantage of changes in import paths from AIP-21 which
> make
> it
> possible
> to
> have both old and new (optional) operators available
> in
> 1.10.*
> series
> of
> Airflow. I think there is a lot more to do for full
> implementation
> of
> AIP-8: decisions how to maintain, install those
> operator
> groups
> separately,
> stewardship model/organisation for the separate
> groups, how
> to
> manage
> cross-dependencies, procedures for releasing the
> packages
> etc.
> 
> I think about this new AIP also as a learning effort -
> we
> would
> learn
> more
> how separate packaging works and then we can follow up
> with
> AIP-8
> full
> implementation for "modular" Airflow. Then AIP-8 could
> be
> implemented
> in
> Airflow 2.1 for example - or 3.0 if we start following
> semantic
> versioning
> - based on those learnings. It's a bit of good example
> of
> having
> cake
> and
> eating it too. We can try out modularity in 1.10.*
> while
> cutting
> the
> scope
> of 2.0 and not implementing full management/release
> procedure
> for
> AIP-8
> yet.
> 
> 
> Thinking about this, I think there are still a few
> grey
> areas
> (which
> would
> be good to discuss in a new AIP, or continue on
> AIP-8):
> 
>  *   In your email you only speak about the 3 big cloud providers
> (btw I made a PR for migrating all AWS components ->
> https://github.com/apache/airflow/pull/6439). Is there a plan for
> splitting other components than Google/AWS/Azure?
> 
> 
> We could add more groups as part of this new AIP
> indeed (as
> an
> extension to
> AIP-21 and pre-requisite to AIP-8). We already see how
> moving/deprecation
> works for the providers package - it works for
> GCP/Google
> rather
> nicely.
> But there is nothing to prevent us from extending it
> to
> cover
> other
> groups
> of operators/hooks. If you look at the current
> structure of
> documentation
> done by Kamil, we can follow the structure there and
> move
> the
> operators/hooks accordingly
> (https://airflow.readthedocs.io/en/latest/operators-and-hooks-ref.html):
> 
>      Fundamentals, ASF: Apache Software Foundation,
> Azure:
> Microsoft
> Azure, AWS: Amazon Web Services, GCP: Google Cloud
> Platform,
> Service
> integrations, Software integrations, Protocol
> integrations.
> 
> I am happy to include that in the AIP - if others
> agree
> it's a
> good
> idea.
> Out of those groups -  I think only Fundamentals
> should not
> be
> back-ported.
> Others should be rather easy to port (if we decide
> to). We
> already
> have
> quite a lot of those in the new GCP operators for 2.0.
> So
> starting
> with
> GCP/Google group is a good idea. Also following with
> Cloud
> Providers
> first
> is a good thing. For example we have now support from
> Google
> Composer
> team
> to do this separation for GCP (and we learn from it)
> and
> then
> we
> can
> claim
> the stewardship in our team for releasing the python
> 3/
> Airflow
> 1.10-compatible "airflow-google" packages. Possibly
> other
> Cloud
> Providers/teams might follow this (if they see the
> value in
> it)
> and
> there
> could be different stewards for those. And then we can
> do
> other
> groups
> if
> we decide to. I think this way we can learn whether
> AIP-8 is
> manageable
> and
> what real problems we are going to face.
> 
>  *   Each “plugin” e.g. GCP would be a separate repo,
> should
> we
> create
> some sort of blueprint for such packages?
> 
> 
> I think we do not need separate repos (at all) but in
> this
> new
> AIP
> we
> can
> test it before we decide to go for AIP-8. IMHO -
> monorepo
> approach
> will
> work here rather nicely. We could use python-3 native namespaces
> <https://packaging.python.org/guides/packaging-namespace-packages/> for the
> sub-packages when we go full AIP-8. For now we could
> simply
> package
> the
> new
> operators in separate pip package for Python 3 version
> 1.10.*
> series
> only.
> We only need to test if it works well with another
> package
> providing
> 'airflow.providers.*' after apache-airflow is
> installed
> (providing
> 'airflow' package). But I think we can make it work. I
> don't
> think
> we
> really need to split the repos, namespaces will work
> just
> fine
> and
> has
> easier management of cross-repository dependencies
> (but we
> can
> learn
> otherwise). For sure we will not need it for the new
> proposed
> AIP
> of
> backporting groups to 1.10 and we can defer that
> decision to
> AIP-8
> implementation time.
> 
> 
> *   In which Airflow version do we start raising
> deprecation
> warnings
> and in which version would we remove the original?
> 
> 
> I think we should do what we did in GCP case already.
> Those
> old
> "imports"
> for operators can be made as deprecated in Airflow 2.0
> (and
> removed
> in
> 2.1
> or 3.0 if we start following semantic versioning). We
> can
> however
> do
> it
> before in 1.10.7 or 1.10.8 if we release those
> (without
> removing
> the
> old
> operators yet - just raise deprecation warnings and
> inform
> that
> for
> python3
> the new "airflow-google", "airflow-aws" etc. packages
> can be
> installed
> and
> users can switch to it).
> 
> J.
> 
> 
> 
> Cheers,
> Bas
> 
> On 27 Oct 2019, at 08:33, Jarek Potiuk <
> Jarek.Potiuk@polidea.com
> <mailto:
> Jarek.Potiuk@polidea.com>> wrote:
> 
> Hello - any comments on that? I am happy to make it
> into an
> AIP
> :)?
> 
> On Sun, Oct 13, 2019 at 5:53 PM Jarek Potiuk <
> Jarek.Potiuk@polidea.com
> <ma...@polidea.com>>
> wrote:
> 
> *Motivation*
> 
> I think we really should start thinking about making
> it
> easier
> to
> migrate
> to 2.0 for our users. After implementing some recent
> changes
> related
> to
> AIP-21: Changes in import paths
> <https://cwiki.apache.org/confluence/display/AIRFLOW/AIP-21%3A+Changes+in+import+paths>,
> I think I have an idea that might help with it.
> 
> *Proposal*
> 
> We could package some of the new and improved 2.0
> operators
> (moved
> to
> "providers" package) and let them be used in Python 3
> environment
> of
> airflow 1.10.x.
> 
> This can be done case-by-case per "cloud provider".
> It
> should
> not
> be
> obligatory, should be largely driven by each
> provider. It's
> not
> yet
> full
> AIP-8: Split Hooks/Operators into separate packages
> <https://cwiki.apache.org/confluence/pages/viewpage.action?pageId=100827303>.
> It's
> merely backporting of some operators/hooks to get it
> work
> in
> 1.10.
> But
> by
> doing it we might try out the concept of splitting,
> learn
> about
> maintenance
> problems and maybe implement full *AIP-8 *approach in
> 2.1
> consistently
> across the board.
> 
> *Context*
> 
> Part of the AIP-21 was to move import paths for Cloud
> providers
> to
> separate providers/<PROVIDER> package. An example for
> that
> (the
> first
> provider we already almost migrated) was
> providers/google
> package
> (further
> divided into gcp/gsuite etc).
> 
> We've done a massive migration of all the
> Google-related
> operators,
> created a few missing ones and retrofitted some old
> operators
> to
> follow
> GCP
> best practices and fixing a number of problems - also
> implementing
> Python3
> and Pylint compatibility. Some of these
> operators/hooks are
> not
> backwards
> compatible. Those that are compatible are still
> available
> via
> the
> old
> imports with deprecation warning.
> 
> We've added missing tests (including system tests)
> and
> missing
> features -
> improving some of the Google operators - giving the
> users
> more
> capabilities
> and fixing some issues. Those operators should pretty
> much
> "just
> work"
> in
> Airflow 1.10.x (any recent version) for Python 3. We
> should
> be
> able
> to
> release a separate pip-installable package for those
> operators
> that
> users
> should be able to install in Airflow 1.10.x.
> 
> Any user will be able to install this separate
> package in
> their
> Airflow
> 1.10.x installation and start using those new
> "provider"
> operators
> in
> parallel to the old 1.10.x operators. Other providers
> ("microsoft",
> "amazon") might follow the same approach if they
> want. We
> could
> even
> at
> some point decide to move some of the core operators
> in
> similar
> fashion
> (for example following the structure proposed in the
> latest
> documentation:
> fundamentals / software / etc.,
> https://airflow.readthedocs.io/en/latest/operators-and-hooks-ref.html)
> 
> *Pros and cons*
> 
> There are a number of pros:
> 
>  - Users will have an easier migration path if they
> are
> deeply
> vested
> into 1.10.* version
> - It's possible to migrate in stages for people who
> are
> also
> vested
> in
> py2: *py2 (1.10) -> py3 (1.10) -> py3 + new
> operators
> (1.10)
> ->
> py3
> +
> 2.0*
> - Moving to new operators in py3 + new operators can
> be
> done
> gradually. Old operators will continue to work while
> new
> can
> be
> used
> more
> and more
> - People will get incentivised to migrate to python
> 3
> before
> 2.0
> is
> out (by using new operators)
> - Each provider "package" can have independent
> release
> schedule
> -
> and
> add functionality in already released Airflow
> versions.
> - We do not take out any functionality from the
> users - we
> just
> add
> more options
> - The releases can be - similarly as main airflow
> releases -
> voted
> separately by PMC after "stewards" of the package
> (per
> provider)
> perform
> round of testing on 1.10.* versions.
> - Users will start migrating to new operators
> earlier and
> have
>  smoother switch to 2.0 later
>  - The latest improved operators will start
> 
> There are three cons I could think of:
> 
>  - There will be quite a lot of duplication between
> old and
> new
> operators (they will co-exist in 1.10). That might
> lead to
> confusion
> of
> users and problems with cooperation between
> different
> operators/hooks
> - Having new operators in 1.10 python 3 might keep
> people
> from
> migrating to 2.0
> - It will require some maintenance and separate
> release
> overhead.
> 
> I already spoke to Composer team @Google and they are
> very
> positive
> about
> this. I also spoke to Ash and seems it might also be
> OK for
> Astronomer
> team. We have Google's backing and support, and we
> can
> provide
> maintenance
> and support for those packages - being an example for
> other
> providers
> how
> they can do it.
> 
> Let me know what you think - and whether I should
> make it
> into
> an
> official
> AIP maybe?
> 
> J.
> 
> 
> 
> --
> 
> Jarek Potiuk
> Polidea <https://www.polidea.com/> | Principal Software Engineer
>
> M: +48 660 796 129 <+48660796129>
> [image: Polidea] <https://www.polidea.com/>
> 
> 
> 
> --
> 
> Tomasz Urbaszek
> Polidea <https://www.polidea.com/ <https://www.polidea.com/>> | Junior Software
> Engineer
> 
> M: +48 505 628 493 <+48505628493>
> E: tomasz.urbaszek@polidea.com
> <tomasz.urbaszeki@polidea.com
> 
> 
> Unique Tech
> Check out our projects!
> <https://www.polidea.com/our-work <https://www.polidea.com/our-work>>
> 
> 
> 


Re: [PROPOSE] Ease future migration path to 2.0 by provider's operators/hook backporting to 1.10.*

Posted by Ash Berlin-Taylor <as...@apache.org>.
Pretty hard pass from me on airflow_ext. If it's released by airflow I want it to live under airflow.* (anyone else is free to release packages under any namespace they choose).

That said I think I've got something that works:

/Users/ash/.virtualenvs/test-providers/lib/python3.7/site-packages/notairflow/__init__.py module level code running
/Users/ash/.virtualenvs/test-providers/lib/python3.7/site-packages/notairflow/providers/gcp/__init__.py module level code running

Let me test it again in a few different cases etc.

-a

On 4 November 2019 14:00:24 GMT, Jarek Potiuk <Ja...@polidea.com> wrote:
Hey Ash,

Thanks for the offer. I must admit pkgutil and package namespaces are not
the best documented part of python.

I dug a bit deeper and I found a similar problem -
https://github.com/pypa/setuptools/issues/895. It seems that even if it is
not explicitly explained in the pkgutil documentation, this comment (assuming
it is right) explains everything:

*"That's right. All parents of a namespace package must also be namespace
packages, as they will necessarily share that parent name space (farm and
farm.deps in this example)."*

There are a few possibilities mentioned in the issue on how this can be
worked around, but those are far from perfect solutions. They would
require patching the already installed airflow's __init__.py to manipulate
the search path. Still, from my tests I do not know if this would
be possible at all because of the non-trivial __init__.py we have (and use)
in the *airflow* package.

We have a few PRs now waiting for a decision on that one I think, so maybe we
can simply agree that we should use another package (I really like
*"airflow_ext"* :D) and use it from now on? What do you (and others) think?

I'd love to start voting on it soon.

J.



On Thu, Oct 31, 2019 at 5:37 PM Ash Berlin-Taylor <as...@apache.org> wrote:

 Let me run some tests too - I've used them a bit in the past. I thought
 since we only want to make airflow.providers a namespace package it might
 work for us.

 Will report back next week.

 -ash

 On 31 October 2019 15:58:22 GMT, Jarek Potiuk <Ja...@polidea.com>
 wrote:
The same repo (so mono-repo approach). All packages would be in
"airflow_integrations" directory. It's mainly about moving the
operators/hooks/sensor files to different directory structure.

It might be done pretty much without changing the current
installation/development model:

1) We can add a setup.py command to install all the packages in -e mode in
the main setup.py (to make it easier to install all deps in one go).
2) We can add dependencies in setup.py extras to install appropriate
packages. For example the [google] extra will require the
'apache-airflow-integrations-providers-google' package - or
'apache-airflow-providers-google' if we decide to skip -integrations from the
package name to make it shorter.
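
A minimal sketch of point 2, assuming the package names used in this thread. The `PROVIDER_EXTRAS` table, the "protocols" and "all_integrations" extras and the helper function are hypothetical; `extras_require` is the standard setuptools keyword the resulting dict would be passed to.

```python
# Hypothetical extras table for the main "apache-airflow" setup.py.
# Package names are taken from this thread; nothing here is final.
PROVIDER_EXTRAS = {
    "google": ["apache-airflow-integrations-providers-google"],
    "protocols": ["apache-airflow-integrations-protocols"],
}


def build_extras_require(provider_extras):
    """Return a dict suitable for setuptools.setup(extras_require=...)."""
    extras = {name: list(reqs) for name, reqs in provider_extras.items()}
    # Hypothetical convenience extra that pulls in every integration package.
    extras["all_integrations"] = sorted(
        {req for reqs in provider_extras.values() for req in reqs}
    )
    return extras


EXTRAS_REQUIRE = build_extras_require(PROVIDER_EXTRAS)
# In setup.py this would be used as: setup(..., extras_require=EXTRAS_REQUIRE)
```

With something like this, "pip install apache-airflow[google]" would resolve the integration package as an ordinary dependency of the extra.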

The only potential drawback I see is a bit more involved setup of the
IDE.

This way installation method for both dev and prod remains simple.

In the future we can have a separate release schedule for the packages
(AIP-8), but for now we can stick to the same version for 'apache-airflow'
and the 'apache-airflow-integrations-*' packages (+ a separate release
schedule for backporting needs).

Here again is the structure of the repo (we will likely be able to use native
namespaces so I removed some needless __init__.py).

|-- airflow
|   |- __init__.py
|   |- operators -> fundamental operators are here
|-- tests -> tests for core airflow are here (optionally we can move them under "airflow")
|-- setup.py -> setup.py for the "apache-airflow" package
|-- airflow_integrations
|   |-providers
|   | |-google
|   |   |-setup.py -> setup.py for the "apache-airflow-integrations-providers-google" package
|   |   |-airflow_integrations
|   |     |-providers
|   |       |-google
|   |         |-__init__.py
|   |         | tests -> tests for the "apache-airflow-integrations-providers-google" package
|   | |-__init__.py
|   |-protocols
|     |-setup.py -> setup.py for the "apache-airflow-integrations-protocols" package
|     |-airflow_integrations
|        |-protocols
|          |-__init__.py
|          |-tests -> tests for the "apache-airflow-integrations-protocols" package


J.

On Thu, Oct 31, 2019 at 3:38 PM Kaxil Naik <ka...@gmail.com> wrote:

So create another package in a different repo? Or the same repo with a
separate setup.py file that has airflow as a dependency?




 On Thu, Oct 31, 2019 at 2:32 PM Jarek Potiuk
<Ja...@polidea.com>
 wrote:

TL;DR; I did some more testing on how namespaces work. I still believe the
only way to use namespaces is to have a separate (for example
"airflow_integrations") package for all backportable packages.

I am not sure if someone used namespaces before, but after reading and
trying them out, the main blocker seems to be that we have non-trivial code in
airflow's "__init__.py" (including class definitions, imported
sub-packages and plugin initialisation).

 Details are in
 https://packaging.python.org/guides/packaging-namespace-packages/ <https://packaging.python.org/guides/packaging-namespace-packages/>
but
it's
 a long one so let me summarize my findings:

   - In order to use the "airflow.providers" package we would have to declare
     "airflow" as a namespace
   - It can be done in three different ways:
      - omitting __init__.py in this package (native/implicit namespace)
      - making the __init__.py of the "airflow" package in main airflow (and
        other packages) contain only (pkgutil style):
        "*__path__ = __import__('pkgutil').extend_path(__path__, __name__)*"
      - or (pkg_resources style):
        "*__import__('pkg_resources').declare_namespace(__name__)*"

The first is not possible (we already have __init__.py in "airflow").
The other two are not possible because we already have quite a lot in
airflow's "__init__.py", and both the pkgutil and pkg_resources styles state:

"*Every* distribution that uses the namespace package must include an
identical *__init__.py*. If any distribution does not, it will cause the
namespace logic to fail and the other sub-packages will not be importable.
*Any additional code in __init__.py will be inaccessible."*

 I even tried to add those pkgutil/pkg_resources to airflow and do
some
experimenting with it - but it does not work. Pip install fails at
the
plugins_manager as "airflow.plugins" is not accessible (kind of
expected),
 but I am sure there will be other problems as well. :(

 Basically - we cannot turn "airflow" into namespace because it has
some
 "__init__.py" logic :(.
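
The behaviour described above can be reproduced outside of Airflow. The following is only a sketch under stated assumptions: the package names ("demo_ns_ok", "demo_ns_plain"), the two throw-away "distributions" and the helper function are made up for illustration, and it says nothing about what a patched airflow/__init__.py would do. It splits one package over two directories and checks whether both halves import, once with the pkgutil-style line in __init__.py and once with an ordinary __init__.py.

```python
import importlib
import sys
import tempfile
from pathlib import Path

# The single line a pkgutil-style namespace __init__.py is expected to hold.
EXTEND = "__path__ = __import__('pkgutil').extend_path(__path__, __name__)\n"


def can_import_split_package(pkg, init_body):
    """Split `pkg` over two directories and try to import both halves."""
    with tempfile.TemporaryDirectory() as tmp:
        roots = []
        for dist, sub in (("dist_core", "core"), ("dist_providers", "providers")):
            root = Path(tmp) / dist
            (root / pkg / sub).mkdir(parents=True)
            (root / pkg / "__init__.py").write_text(init_body)
            (root / pkg / sub / "__init__.py").write_text("")
            roots.append(str(root))
        sys.path[:0] = roots  # "dist_core" comes first, like installed airflow
        importlib.invalidate_caches()
        try:
            importlib.import_module(pkg + ".core")
            importlib.import_module(pkg + ".providers")
            return True
        except ModuleNotFoundError:
            return False
        finally:
            for root in roots:
                sys.path.remove(root)


# pkgutil-style __init__.py in both distributions: both halves are found.
namespace_works = can_import_split_package("demo_ns_ok", EXTEND)
# Ordinary __init__.py with its own code: only the first directory is searched.
plain_works = can_import_split_package("demo_ns_plain", "VALUE = 1\n")
```

In the second case the import of the ".providers" half fails because the package's __path__ only contains the first directory found on sys.path, which matches the problem seen with "airflow.providers".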

So I think it still holds that if we want to use namespaces, we should use
another package. *"airflow_integrations"* is the current candidate, but we
can think of some nicer/shorter one: "airflow_ext", "airflow_int",
"airflow_x", "airflow_mod", "airflow_next", "airflow_xt", "airflow_",
"ext_airflow", ....  Interestingly "airflow_" is the one suggested by PEP8
to avoid conflicts with Python names (which is a different case but kind of
close).

 What do you think?

 J.

 On Tue, Oct 29, 2019 at 4:51 PM Kaxil Naik <ka...@gmail.com>
wrote:

The namespace feature looks promising and from your tests, it
looks
like
it
 would work well from Airflow 2.0 and onwards.

 I will look at it in-depth and see if I have more suggestions or
opinion
on
 it

 On Tue, Oct 29, 2019 at 3:32 PM Jarek Potiuk
<Jarek.Potiuk@polidea.com

 wrote:

TL;DR; We did some testing about namespaces and packaging (and
potential
backporting options for 1.10.* python3 Airflows) and we think
it's
best
to
 use namespaces quickly and use different package name
 "airflow-integrations" for all non-fundamental integrations.

 Unless we missed some tricks, we cannot use airflow.*
sub-packages
for
the
 1.10.* backportable packages. Example:

    - "*apache-airflow"* package provides: "airflow.*" (this is
what
we
have
    today)
    - "*apache-airflow-providers-google*": provides
    "airflow.providers.google.*" packages

If we install both packages (old apache-airflow 1.10.6 and new
apache-airflow-providers-google from 2.0), it seems that
the "airflow.providers.google.*" package cannot be imported. This is a bit
of a problem if we would like to backport the operators from Airflow 2.0 to
Airflow 1.10 in a way that will be forward-compatible. We really want users
who started using backported operators in 1.10.* not to have to change
imports in their DAGs to run them in Airflow 2.0.

 We discussed it internally in our team and considered several
options,
but
we think the best way will be to go straight to "namespaces" in
Airflow
2.0
and to have the integrations (as discussed in AIP-21
discussion) to
be
in a
separate "*airflow_integrations*" package.  It might be even
more
towards
the AIP-8 implementation and plays together very well in terms
of
"stewardship" discussed in AIP-21 now. But we will still keep
(for
now)
single release process for all packages for 2.0 (except for the
backporting
which can be done per-provider before 2.0 release) and provide
a
foundation
 for future more complex release cycles in future versions.

Here is how the new Airflow 2.0 repository could look (I only show a
subset of dirs, but they are representative). For those whose email
client corrupts the fixed-width font, here is an image of this
structure: https://pasteboard.co/IEesTih.png

|-- airflow
|   |- __init__.py
|   |- operators -> fundamental operators are here
|-- tests -> tests for core airflow are here (optionally we can move them under "airflow")
|-- setup.py -> setup.py for the "apache-airflow" package
|-- airflow_integrations
|   |-providers
|   | |-google
|   |   |-setup.py -> setup.py for the "apache-airflow-integrations-providers-google" package
|   |   |-airflow_integrations
|   |     |-__init__.py
|   |     |-providers
|   |       |-__init__.py
|   |       |-google
|   |         |-__init__.py
|   |         | tests -> tests for the "apache-airflow-integrations-providers-google" package
|   | |-__init__.py
|   |-protocols
|     |-setup.py -> setup.py for the "apache-airflow-integrations-protocols" package
|     |-airflow_integrations
|        |-protocols
|          |-__init__.py
|          |-tests -> tests for the "apache-airflow-integrations-protocols" package

 There are a number of pros for this solution:

   - We could use the standard namespaces feature of python to build
     multiple packages:
     https://packaging.python.org/guides/packaging-namespace-packages/
   - Installation for users will be the same as previously. We could
     install the needed packages automatically when particular extras are
     used (pip install apache-airflow[google] could install both
     "apache-airflow" and "apache-airflow-integrations-providers-google").
   - We could have a custom setup.py installation process for developers
     that could install all the packages in development ("-e ." mode) in a
     single operation.
   - In case of transfer packages we could have nice error messages
     informing that the other package needs to be installed (for example
     the S3->GCS operator would import
     "airflow-integrations.providers.amazon.*" and if it fails it could
     raise "Please install [amazon] extra to use me.").
   - We could implement numerous optimisations in the way we run tests
     in CI (for example run all the "providers" tests only with sqlite,
     run tests in parallel etc.).
   - We could implement it gradually - we do not have to have a "big bang"
     approach - we can implement it "provider-by-provider" and test it
     with one provider (Google) first to make sure that all the mechanisms
     are working.
   - For now we could have the monorepo approach where all the packages
     will be developed in concert - for now avoiding the dependency
     problems (but allowing for back-portability to 1.10).
   - We will have clear boundaries between packages and the ability to
     test for some unwanted/hidden dependencies between packages.
   - We could switch to the (much better) sphinx-apidoc package to
     continue building single documentation for all of those (sphinx
     apidoc has support for namespaces).
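
The "nice error message" idea for transfer operators from the list above could look roughly like the sketch below. The `import_provider` helper is hypothetical, and the module path in the usage comment only assumes the "airflow_integrations" layout discussed in this thread.

```python
import importlib


def import_provider(module_name, extra):
    """Import an optional provider module or fail with an install hint."""
    try:
        return importlib.import_module(module_name)
    except ImportError as err:
        # Re-raise with a message that tells the user which extra is missing.
        raise ImportError(
            "Could not import '{}'. Please install the [{}] extra to use "
            "this operator.".format(module_name, extra)
        ) from err


# Hypothetical usage inside an S3->GCS transfer operator module:
# amazon = import_provider("airflow_integrations.providers.amazon", "amazon")
```

The original ImportError stays attached as the cause, so the underlying reason is still visible in the traceback.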

 As we are working on GCP move from contrib to core, we could
make all
the
effort to test it and try it before we merge it to master so
that it
will
be ready for others (and we could help with most of the moves
afterwards).
It seems complex, but in fact in most cases it will be very
simple
move
between the packages and can be done incrementally so there is
little
risk
 in doing this I think.

 J.


 On Mon, Oct 28, 2019 at 11:45 PM Kevin Yang <yr...@gmail.com>
wrote:

Tomasz and Ash got good points about the overhead of having
separate
repos.
But while we grow bigger and more mature, I would prefer to
have
what
was
described in AIP-8. It shouldn't be extremely hard for us to
come
up
with
good strategies to handle the overhead. AIP-8 already talked
about
how
it
can benefit us. IMO on a high level, having a clear separation of core
vs. hooks/operators would make the project much more scalable and the
gains would outweigh the cost we pay.

 That being said, I'm supportive to this moving towards AIP-8
while
learning
approach, quite a good practise to tackle a big project.
Looking
forward
to
 read the AIP.


 Cheers,
 Kevin Y

 On Mon, Oct 28, 2019 at 6:21 AM Jarek Potiuk <
Jarek.Potiuk@polidea.com

 wrote:

We are checking how we can use namespaces in back-portable
way
and
we
will
have POC soon so that we all will be able to see how it
will look
like.

 J.

 On Mon, Oct 28, 2019 at 1:24 PM Ash Berlin-Taylor <
ash@apache.org>
wrote:

I'll have to read your proposal in detail (sorry, no time
right
now!),
but
I'm broadly in favour of this approach, and I think
keeping
them
_in_
the
same repo is the best plan -- that makes writing and
testing
cross-cutting
 changes  easier.

 -a

On 28 Oct 2019, at 12:14, Tomasz Urbaszek <
tomasz.urbaszek@polidea.com

 wrote:

 I think utilizing namespaces should reduce a lot of
problems
raised
by
using separate repos (who will manage it? how to
release?
where
should
be
 the repo?).

 Bests,
 Tomek

On Sun, Oct 27, 2019 at 11:54 AM Jarek Potiuk <Jarek.Potiuk@polidea.com>
wrote:

Thanks Bas for comments! Let me share my thoughts below.

On Sun, Oct 27, 2019 at 9:23 AM Bas Harenslak
<basharenslak@godatadriven.com> wrote:

Hi Jarek, I definitely see a future in creating separate installable
packages for various operators/hooks/etc (as in AIP-8). This would IMO
strip the “core” Airflow to only what’s needed and result in a small
package without a ton of dependencies (and make it more maintainable,
shorter tests, etc etc etc). Not exactly sure though what you’re proposing
in your e-mail, is it a new AIP for an intermediate step towards AIP-8?


It's a new AIP I am proposing. For now it's only for backporting the new
2.0 import paths to the 1.10.* series.

It's more of "incrementally going in the direction of AIP-8 and learning
about some of the difficulties involved" than implementing AIP-8 fully. We
are taking advantage of changes in import paths from AIP-21 which make it
possible to have both old and new (optional) operators available in the
1.10.* series of Airflow. I think there is a lot more to do for a full
implementation of AIP-8: decisions on how to maintain and install those
operator groups separately, a stewardship model/organisation for the
separate groups, how to manage cross-dependencies, procedures for
releasing the packages etc.

I think about this new AIP also as a learning effort - we would learn more
about how separate packaging works and then we can follow up with the full
AIP-8 implementation for "modular" Airflow. Then AIP-8 could be
implemented in Airflow 2.1 for example - or 3.0 if we start following
semantic versioning - based on those learnings. It's a bit of a good
example of having your cake and eating it too. We can try out modularity
in 1.10.* while cutting the scope of 2.0 and not implementing the full
management/release procedure for AIP-8 yet.


Thinking about this, I think there are still a few grey areas (which would
be good to discuss in a new AIP, or continue on AIP-8):

  *   In your email you speak only about the 3 big cloud providers (btw I
      made a PR for migrating all AWS components ->
      https://github.com/apache/airflow/pull/6439). Is there a plan for
      splitting other components than Google/AWS/Azure?


We could add more groups as part of this new AIP indeed (as an extension
to AIP-21 and pre-requisite to AIP-8). We already see how
moving/deprecation works for the providers package - it works for
GCP/Google rather nicely. But there is nothing to prevent us from
extending it to cover other groups of operators/hooks. If you look at the
current structure of documentation done by Kamil, we can follow the
structure there and move the operators/hooks accordingly
(https://airflow.readthedocs.io/en/latest/operators-and-hooks-ref.html):

      Fundamentals, ASF: Apache Software Foundation, Azure: Microsoft
      Azure, AWS: Amazon Web Services, GCP: Google Cloud Platform, Service
      integrations, Software integrations, Protocol integrations.

I am happy to include that in the AIP - if others agree it's a good idea.
Out of those groups - I think only Fundamentals should not be back-ported.
Others should be rather easy to port (if we decide to). We already have
quite a lot of those in the new GCP operators for 2.0. So starting with
the GCP/Google group is a good idea. Also following with Cloud Providers
first is a good thing. For example we now have support from the Google
Composer team to do this separation for GCP (and we learn from it) and
then we can claim the stewardship in our team for releasing the Python 3 /
Airflow 1.10-compatible "airflow-google" packages. Possibly other Cloud
Providers/teams might follow this (if they see the value in it) and there
could be different stewards for those. And then we can do other groups if
we decide to. I think this way we can learn whether AIP-8 is manageable
and what real problems we are going to face.

  *   Each “plugin” e.g. GCP would be a separate repo, should we create
      some sort of blueprint for such packages?


I think we do not need separate repos (at all) but in this new AIP we can
test it before we decide to go for AIP-8. IMHO - a monorepo approach will
work here rather nicely. We could use Python 3 native namespaces
<https://packaging.python.org/guides/packaging-namespace-packages/> for
the sub-packages when we go full AIP-8. For now we could simply package
the new operators in a separate pip package for the Python 3 version of
the 1.10.* series only. We only need to test if it works well with another
package providing 'airflow.providers.*' after apache-airflow is installed
(providing the 'airflow' package). But I think we can make it work. I
don't think we really need to split the repos, namespaces will work just
fine and make cross-repository dependencies easier to manage (but we can
learn otherwise). For sure we will not need it for the newly proposed AIP
of backporting groups to 1.10 and we can defer that decision to AIP-8
implementation time.
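
The kind of check described above (a second distribution providing a
sub-package of an already installed top-level package) can be tried out in
isolation. Below is a minimal sketch, not a test of Airflow itself: the
package names, the directory layout and the helper function are all made
up for illustration. It builds two throwaway "distributions" in a temp
directory, one with a regular `<pkg>/__init__.py` and one that ships only
`<pkg>/providers`, and reports whether `<pkg>.providers` can be imported,
with and without a pkgutil-style `extend_path` call in the parent
`__init__.py`.

```python
import importlib
import sys
import tempfile
from pathlib import Path


def can_import_split_package(pkg: str, extend_path: bool) -> bool:
    """Return True if '<pkg>.providers' is importable when it lives in a
    different directory than '<pkg>/__init__.py' (hypothetical layout)."""
    with tempfile.TemporaryDirectory() as tmp:
        root = Path(tmp)

        # Distribution 1: a regular package with its own __init__.py.
        core = root / "dist_core" / pkg
        core.mkdir(parents=True)
        init_src = ""
        if extend_path:
            # pkgutil-style namespace declaration in the parent package.
            init_src = (
                "from pkgutil import extend_path\n"
                "__path__ = extend_path(__path__, __name__)\n"
            )
        (core / "__init__.py").write_text(init_src)

        # Distribution 2: ships only '<pkg>/providers', no '<pkg>/__init__.py'.
        providers = root / "dist_extra" / pkg / "providers"
        providers.mkdir(parents=True)
        (providers / "__init__.py").write_text("ORIGIN = 'dist_extra'\n")

        added = [str(root / "dist_core"), str(root / "dist_extra")]
        sys.path[:0] = added
        importlib.invalidate_caches()
        try:
            importlib.import_module(pkg + ".providers")
            return True
        except ImportError:
            return False
        finally:
            # Undo the sys.path and sys.modules changes made by this check.
            for entry in added:
                sys.path.remove(entry)
            stale = [n for n in sys.modules if n == pkg or n.startswith(pkg + ".")]
            for name in stale:
                del sys.modules[name]


print(can_import_split_package("demo_plain", extend_path=False))
print(can_import_split_package("demo_extended", extend_path=True))
```

In this toy layout the parent package has to opt in (here via
`pkgutil.extend_path`) before the second directory is searched. Whether
the same is feasible for a real, non-trivial `__init__.py` is exactly what
would need testing.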


  *   In which Airflow version do we start raising deprecation warnings
      and in which version would we remove the original?


I think we should do what we did in the GCP case already. Those old
"imports" for operators can be marked as deprecated in Airflow 2.0 (and
removed in 2.1 or 3.0 if we start following semantic versioning). We can
however do it earlier, in 1.10.7 or 1.10.8 if we release those (without
removing the old operators yet - just raise deprecation warnings and
inform users that for Python 3 the new "airflow-google", "airflow-aws"
etc. packages can be installed and users can switch to them).
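
For readers unfamiliar with the pattern, here is a minimal sketch of what
"keep the old import working but raise a deprecation warning" can look
like. It is an illustration only: the class names, the helper and the
import path in the message are hypothetical and are not taken from the
Airflow code base.

```python
import warnings


class NewBucketOperator:
    """Stand-in for an operator living under the new import path (made up)."""

    def __init__(self, bucket: str) -> None:
        self.bucket = bucket


def deprecated_alias(new_cls: type, new_path: str) -> type:
    """Build a subclass that behaves like new_cls but warns when instantiated."""

    class _Deprecated(new_cls):
        def __init__(self, *args, **kwargs):
            warnings.warn(
                f"This class is deprecated. Please use `{new_path}`.",
                DeprecationWarning,
                stacklevel=2,  # point the warning at the caller's line
            )
            super().__init__(*args, **kwargs)

    _Deprecated.__name__ = new_cls.__name__
    return _Deprecated


# What a module at the old import path could export (hypothetical names).
OldBucketOperator = deprecated_alias(
    NewBucketOperator, "demo.providers.storage.NewBucketOperator"
)
```

Existing code that instantiates `OldBucketOperator` keeps working and gets
a `DeprecationWarning`; removing the alias in a later release is then a
one-line change.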

 J.



 Cheers,
 Bas

On 27 Oct 2019, at 08:33, Jarek Potiuk <Jarek.Potiuk@polidea.com> wrote:

Hello - any comments on that? I am happy to make it into an AIP :)?

On Sun, Oct 13, 2019 at 5:53 PM Jarek Potiuk <Jarek.Potiuk@polidea.com>
wrote:

 *Motivation*

I think we really should start thinking about making it easier to migrate
to 2.0 for our users. After implementing some recent changes related to
AIP-21 - Changes in import paths
<https://cwiki.apache.org/confluence/display/AIRFLOW/AIP-21%3A+Changes+in+import+paths>
I think I have an idea that might help with it.

 *Proposal*

We could package some of the new and improved 2.0 operators (moved to the
"providers" package) and let them be used in a Python 3 environment of
Airflow 1.10.x.

This can be done case-by-case per "cloud provider". It should not be
obligatory, and should be largely driven by each provider. It's not yet
full AIP-8 - Split Hooks/Operators into separate packages
<https://cwiki.apache.org/confluence/pages/viewpage.action?pageId=100827303>.
It's merely backporting of some operators/hooks to get them working in
1.10. But by doing it we might try out the concept of splitting, learn
about maintenance problems and maybe implement the full *AIP-8* approach
in 2.1 consistently across the board.

 *Context*

Part of AIP-21 was to move import paths for Cloud providers to a separate
providers/<PROVIDER> package. An example of that (the first provider we
have already almost migrated) was the providers/google package (further
divided into gcp/gsuite etc).

We've done a massive migration of all the Google-related operators,
created a few missing ones and retrofitted some old operators to follow
GCP best practices, fixing a number of problems - also implementing
Python 3 and Pylint compatibility. Some of these operators/hooks are not
backwards compatible. Those that are compatible are still available via
the old imports with a deprecation warning.

We've added missing tests (including system tests) and missing features -
improving some of the Google operators - giving the users more
capabilities and fixing some issues. Those operators should pretty much
"just work" in Airflow 1.10.x (any recent version) for Python 3. We should
be able to release a separate pip-installable package for those operators
that users should be able to install in Airflow 1.10.x.

Any user will be able to install this separate package in their Airflow
1.10.x installation and start using those new "provider" operators in
parallel to the old 1.10.x operators. Other providers ("microsoft",
"amazon") might follow the same approach if they want. We could even at
some point decide to move some of the core operators in similar fashion
(for example following the structure proposed in the latest documentation:
fundamentals / software / etc.
https://airflow.readthedocs.io/en/latest/operators-and-hooks-ref.html)
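
As an illustration of using new and old operators side by side, DAG code
could prefer the new import path when the separate package is installed
and fall back to the old one otherwise. The helper below is a hedged
sketch of that idea and not part of Airflow: the function name is made up,
and the demonstration uses standard-library module names as stand-ins for
a new "providers" path and an old 1.10.x path.

```python
import importlib
from types import ModuleType
from typing import Sequence


def import_first_available(candidates: Sequence[str]) -> ModuleType:
    """Import and return the first module in `candidates` that can be imported."""
    for name in candidates:
        try:
            return importlib.import_module(name)
        except ImportError:
            continue  # not installed, try the next candidate
    raise ImportError(f"none of these modules could be imported: {list(candidates)}")


# Stand-in demonstration: the first name plays the role of the new path
# (separate package not installed), the second the old, always-present path.
module = import_first_available(["backport_package_that_is_not_installed", "json"])
print(module.__name__)  # prints "json"
```

Note that catching ImportError this broadly also hides import errors raised
inside an installed module, so a real DAG might want a narrower check.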

 *Pros and cons*

There are a number of pros:

  - Users will have an easier migration path if they are deeply invested
    in the 1.10.* version
  - It's possible to migrate in stages for people who are also invested in
    py2: *py2 (1.10) -> py3 (1.10) -> py3 + new operators (1.10) -> py3 +
    2.0*
  - Moving to the new operators in the "py3 + new operators" stage can be
    done gradually. Old operators will continue to work while new ones can
    be used more and more
  - People will get incentivised to migrate to Python 3 before 2.0 is out
    (by using new operators)
  - Each provider "package" can have an independent release schedule - and
    add functionality in already released Airflow versions.
  - We do not take out any functionality from the users - we just add more
    options
  - The releases can be - similarly to main Airflow releases - voted on
    separately by the PMC after "stewards" of the package (per provider)
    perform a round of testing on 1.10.* versions.
  - Users will start migrating to new operators earlier and have a
    smoother switch to 2.0 later
  - The latest improved operators will start

There are three cons I could think of:

  - There will be quite a lot of duplication between old and new operators
    (they will co-exist in 1.10). That might lead to confusion of users
    and problems with cooperation between different operators/hooks
  - Having new operators in 1.10 python 3 might keep people from migrating
    to 2.0
  - It will require some maintenance and separate release overhead.

I already spoke to the Composer team @Google and they are very positive
about this. I also spoke to Ash and it seems it might also be OK for the
Astronomer team. We have Google's backing and support, and we can provide
maintenance and support for those packages - being an example for other
providers of how they can do it.

Let me know what you think - and whether I should make it into an official
AIP maybe?

 J.



 --

Jarek Potiuk
Polidea <https://www.polidea.com/> | Principal Software Engineer

M: +48 660 796 129 <+48660796129>
[image: Polidea] <https://www.polidea.com/>





 --

Tomasz Urbaszek
Polidea <https://www.polidea.com/> | Junior Software Engineer

M: +48 505 628 493 <+48505628493>
E: tomasz.urbaszek@polidea.com

Unique Tech
Check out our projects! <https://www.polidea.com/our-work>


